1. Introduction
Serverless computing promises rapid development and deployment of applications and services [
1,
2]. Instead of purchasing and managing backend infrastructure, service providers can focus on developing application logic, while cloud providers handle resource provisioning, service deployment, hardware fault tolerance, and similar operational concerns. Additionally, serverless platforms offer high availability and automatic scaling, enhancing service robustness. The serverless market is already substantial, at nearly $8 billion, and is projected to exceed $20 billion by the end of 2026 [
3]. As one CEO said, “Serverless has demonstrated that it is the operational model of the future” [
4].
Existing serverless platforms largely follow a
disaggregated approach [
5,
6,
7,
8,
9,
10,
11,
12,
13]. For example, consider AWS Lambda, the most popular serverless platform today. To deploy a service, application developers upload custom executables or scripts to AWS Lambda, which then executes them in virtualized environments upon user invocation [
14]. However, these virtualized environments lack mechanisms to store data persistently. To reliably store application data for the desired functionality, developers must rely on external storage services such as Amazon S3 [
15].
The separation of compute and storage enables independent scaling of each component and works well for certain non-interactive applications. However, service providers quickly encounter challenges when building more complex, data-intensive applications [
16,
17]. Recent surveys have found that many serverless functions run for only a few seconds, are invoked infrequently, and operate on datasets smaller than 10 MB [
18,
19]. These use cases highlight the limitations of current platforms: serverless execution in its current form suffers from degraded performance under these workloads because of the high latency of interacting with external storage systems [
20].
In addition, most serverless platforms provide
at-least-once semantics and lack transactional guarantees. The serverless functions may execute more than once on a single request, and partial results of an ongoing execution may be observed by other function calls. Consequently, serverless platforms often require functions to be idempotent to accommodate these semantics [
21], and they place the burden of concurrency control on the application itself if needed—directly contradicting one of the core tenets of serverless systems: reducing complexity for the application developer.
To address these challenges, we identify six desirable properties for modern serverless platforms, which we abbreviate as
RAISED: Responsiveness, Atomicity, Isolation granularity, Serializable workflows, Elasticity, and Durable storage. Based on these properties, we introduce
LambdaStore: a serverless execution system with an integrated storage engine designed for low-latency cloud applications. In
LambdaStore, serverless functions execute close to their persisted data, reducing storage access cost, while the storage engine provides transactional guarantees for the function executions. Such a co-design aligns with the end-to-end argument [
22]: rather than enforcing consistency and fault tolerance separately at the storage and compute layers, a unified architecture significantly reduces the overhead associated with replication and concurrency control.
The design of
LambdaStore addresses three key challenges: identifying the affinity between serverless functions and application data to enable compute–storage colocation, efficiently supporting transactional workflows, and enabling colocation without sacrificing the elasticity of the stateless model. To discover opportunities for colocation,
LambdaStore introduces
LambdaObjects, an object-oriented data model that associates functions with their application data (
Section 3). To support transactional workflows,
LambdaStore embeds a transactional key-value store and encapsulates each workflow within its own transaction (
Section 4.3). This storage layer is tailored for serverless workloads, minimizing transaction conflict rates. To preserve the scalability and elasticity essential in cloud environments,
LambdaStore leverages two mechanisms: light replication (
Section 4.4.2) and object migration (
Section 4.4.3). It employs microsharding (
Section 4.4.1) to efficiently migrate or replicate individual objects, enabling rapid adaptation to changing workloads.
LambdaStore outperforms the conventional disaggregated serverless design in both throughput and latency while providing stronger guarantees. We compare it against OpenWhisk [
5], OpenLambda [
23], Faasm [
11], and Apiary [
24] using microbenchmarks to demonstrate its performance benefits. Our experimental results show that, compared with other serverless systems based on WebAssembly,
LambdaStore delivers orders of magnitude higher throughput while maintaining low end-to-end latency for stateful workloads. We also evaluate the system with two applications, an online message board and a Twitter-like microblog, to demonstrate its scalability. We observe that
LambdaStore scales almost linearly with the number of nodes. With a cluster of 36 machines, it can process up to 600 k transactions per second.
In this work, we make the following contributions:
We identify six desirable properties for modern serverless platforms, summarized as RAISED: Responsiveness, Atomicity, Isolation granularity, Serializable workflows, Elasticity, and Durable storage.
We present the LambdaObjects abstraction that associates serverless functions with their application data to enable compute–storage colocation.
We present the design and implementation of LambdaStore, a serverless execution system with an integrated storage layer designed for serverless workloads. With the LambdaObjects abstraction, LambdaStore achieves the RAISED properties.
We evaluate LambdaStore with microbenchmarks and application workloads, demonstrating its performance and scalability benefits over existing serverless platforms.
The rest of the paper is organized as follows. We first provide background on serverless computing and introduce the RAISED properties (
Section 2). We then explain the
LambdaObjects abstraction with a sample application (
Section 3). Next, we present the design and implementation of
LambdaStore (
Section 4). We evaluate
LambdaStore with microbenchmarks and application workloads to demonstrate the effectiveness of our approach (
Section 5). Next, we discuss limitations in the current prototype and their mitigations as future directions (
Section 6). Finally, we discuss related work (
Section 7) and conclude (
Section 8).
2. Background and Motivation
Serverless platforms allow mutually distrusting applications to share cloud infrastructure through virtualization and dynamic resource assignment. Virtualization isolates individual running serverless function instances (or serverless jobs) from one another, ensuring that jobs fail independently and do not compete for resources. This isolation provides fault resilience, allowing multiple applications to safely colocate on the same hardware. Dynamic resource allocation, in turn, enables the (re-)allocation of resources to serverless jobs at runtime. It improves resource efficiency and supports scalability by adjusting resource allocations based on each application’s current workload.
Serverless systems provide an abstraction that allows developers to build and deploy services without provisioning or managing machines. In this model, applications are implemented as a set of functions, and a cloud provider (e.g., AWS Lambda, Microsoft Azure Functions, Google Cloud Functions) manages their instantiation and execution. Each job can invoke other serverless functions, forming a serverless
workflow. In some system designs, each job performs a small task, and workflows consist of many such jobs [
25]. However, the complexity and runtime of a job can vary depending on the application.
RAISED
Our goal is to design a cloud computing infrastructure suitable for stateful applications. After evaluating existing platforms, we identified six desirable properties for modern cloud computing systems, which we abbreviate as RAISED:
Responsiveness: Serverless applications should be able to respond to user requests quickly, even during cold starts.
Atomicity: The cloud platform should guarantee exactly-once execution semantics to simplify application logic.
Isolation Granularity: The platform should offer fine-grained isolation to ensure that jobs fail independently.
Serializable Workflows: For applications requiring strong consistency, the platform should ensure that workflows are serializable and that applications always observe a consistent state.
Elasticity: The platform should dynamically adjust resource allocations based on current workloads.
Durable Storage: The platform must be able to reliably persist application data.
A conventional stateful serverless system architecture typically consists of three components: a coordination layer, a compute layer, and a storage layer. We describe this architecture using OpenWhisk [
5] as a representative example. OpenWhisk relies on Apache Kafka [
26] to track outstanding jobs, which are then delegated to a compute layer (e.g., Kubernetes [
27]). Serverless applications interact with a dedicated storage layer, such as a DBMS or key-value store, to persist state across function invocations.
This disaggregated compute–storage architecture enables high elasticity, as scaling a stateless compute layer is straightforward. However, it also incurs frequent data transmissions, leading to high latencies and reduced responsiveness. For example, in a simple read-modify-write workload, a job must first fetch data from the remote storage system, perform the update, and then write the result back. While the update itself may take only a few microseconds, the data fetch and store operations dominate latency. This problem is worsened during cold starts, which require establishing new TCP connections between the compute and storage layers, and by the high software overhead imposed by the storage system on every data access.
Furthermore, because most serverless platforms do not guarantee exactly-once execution semantics, they cannot provide transactional guarantees at the workflow level—even if the storage layer supports transactions. This is due to the lack of coordination: serverless functions must explicitly commit transactions. If a transaction is committed but the workflow fails and is retried, the ACID properties can be violated. As a result, conventional cloud computing platforms, including OpenWhisk, OpenLambda [
23], AWS Lambda [
10], Google Cloud Functions [
6], and Azure Functions [
9], suffer from these limitations.
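The retry problem described above can be illustrated with a minimal sketch. It assumes a toy in-memory store and a platform that simply reruns a job until it reports success; `Store`, `run_at_least_once`, `naive_deposit`, and `idempotent_deposit` are hypothetical names introduced only for this illustration and do not correspond to any of the platforms cited. The first workflow commits its update and then fails, so the retry applies the update twice; the second deduplicates by request identifier, which is the kind of logic that at-least-once platforms push into application code.

```rust
use std::collections::{HashMap, HashSet};

/// Toy stand-in for an external transactional store (hypothetical).
#[derive(Default)]
pub struct Store {
    balances: HashMap<String, i64>,
    /// Request ids whose effects have already been committed.
    applied: HashSet<u64>,
}

impl Store {
    /// Commits an increment unconditionally.
    pub fn add(&mut self, key: &str, delta: i64) {
        *self.balances.entry(key.to_string()).or_insert(0) += delta;
    }

    /// Commits an increment at most once per request id.
    pub fn add_once(&mut self, request_id: u64, key: &str, delta: i64) {
        if self.applied.insert(request_id) {
            self.add(key, delta);
        }
    }

    pub fn get(&self, key: &str) -> i64 {
        *self.balances.get(key).unwrap_or(&0)
    }
}

/// Reruns `job` until it reports success, as an at-least-once platform would.
pub fn run_at_least_once(mut job: impl FnMut(u32) -> Result<(), ()>) {
    let mut attempt = 0;
    while job(attempt).is_err() {
        attempt += 1;
    }
}

/// Commits, then fails on the first attempt: the retry double-applies.
pub fn naive_deposit(store: &mut Store) {
    run_at_least_once(|attempt| {
        store.add("alice", 10);
        if attempt == 0 { Err(()) } else { Ok(()) }
    });
}

/// Same failure pattern, but the commit is deduplicated by request id.
pub fn idempotent_deposit(store: &mut Store, request_id: u64) {
    run_at_least_once(|attempt| {
        store.add_once(request_id, "alice", 10);
        if attempt == 0 { Err(()) } else { Ok(()) }
    });
}
```

Running `naive_deposit` on an empty store leaves a balance of 20 rather than 10, whereas `idempotent_deposit` leaves 10.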
Some research efforts [
7,
8,
11,
12,
13,
28,
29,
30,
31,
32] seek to reduce state access latency by caching persistent state or storing intermediate state locally. Jobs within a workflow that share state can benefit significantly from this approach, as the platform can improve data locality by colocating them on the same machine, avoiding repeated data transmission over the network. Faasm, in particular, further improves responsiveness by adopting
software-fault isolation (SFI) via
WebAssembly to reduce cold-start latency. However, this compute-centric design still relies on remote storage for data persistence. Caching does not improve cold-start performance, since the state is not yet in cache, and it introduces complexity in maintaining consistency across the storage layer.
Other research efforts [
24,
33,
34,
35] propose shipping serverless jobs (or subunits of them) to the storage layer to enable colocation. Apiary, for example, leverages user-defined functions (UDFs) in database systems to achieve colocation through query optimization. This approach not only reduces end-to-end latency, improving responsiveness, but also ensures serializability and atomicity. Nonetheless, a naïve colocation design restricts the elasticity of serverless systems: forcing jobs to run near the data constrains the compute resources they can use, because the resources available near the data are limited by the capacity of the physical machines storing that data. Apiary’s approach additionally introduces problems with fault isolation: UDF failures such as out-of-memory errors and segmentation faults can crash the local database. Stored procedures also expose a large attack surface, as they are not designed to run untrusted code from end users.
Table 1 summarizes whether different systems meet each of the RAISED properties. While existing serverless platforms and research prototypes have made significant strides in improving elasticity, responsiveness, and transactional support, they often face trade-offs between colocation and scalability, or between performance and durability. These limitations highlight the need for a new system architecture that unifies compute and storage without sacrificing the core benefits of serverless computing. In this work, we present
LambdaStore, a serverless execution platform designed to meet the RAISED properties by tightly integrating compute and storage, enabling transactional, low-latency, and scalable execution for stateful cloud applications.
3. The LambdaObjects Abstraction
LambdaStore is a cloud computing system built on a compute–storage co-design. It organizes both data and execution around stateful objects, which we refer to as LambdaObjects. This section introduces the LambdaObjects abstraction and illustrates its effectiveness through a concrete example.
3.1. Data Model
3.1.1. Objects
Applications in
LambdaStore are defined through
Object Types, which specify both the executable code (
functions) and the data entries (
fields) that each object contains.
Objects can then be instantiated from these types, similar to the class-object relationship in object-oriented programming—a paradigm familiar to many developers. By encapsulating both logic and data within objects,
LambdaStore provides an intuitive way to decompose applications into small, independent components, akin to the structure of microservices [
36]. Moreover,
LambdaStore leverages this object abstraction to determine data placement and to colocate function execution with the corresponding data. The object model is designed to be highly flexible, supporting multiple programming languages, variable object sizes, and application-specific data types.
3.1.2. Applications
Each object belongs to an application. The application developer is billed for both the storage of the application’s data and the execution of its functions. Developers define which object functions are part of the application’s public API and which are restricted to internal use. Currently, objects can invoke functions only within the same application. We reserve sharing data and code across application boundaries and more fine-grained access control for future work.
3.1.3. Object Entries
Entries form the smallest unit of data in LambdaStore and are stored as (part of) a field within a specific object. Data fields then define how entries are accessed, stored, and indexed for lookup. LambdaStore currently supports three field types: maps, multi-maps, and cells (unstructured data). Other data types can be implemented in application code. For example, a set can be built on top of the map primitive.
Each entry is a key-value pair that is stored and replicated by LambdaStore and is associated with a specific data field of an object. For instance, an entry might represent a single item in an object’s map or the full content of a cell. The content and semantics of each entry are application-specific and opaque to LambdaStore. Functions interact with entries through a minimal API that supports reading, writing, and basic range queries. Application code is responsible for (de-)serializing data and for providing more complex data operations (e.g., increment or append).
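As a rough illustration of how application code can layer a richer type on the map primitive, the sketch below builds a set of integers on top of a byte-oriented map. It is an assumption-laden stand-in: `MapField` is an in-memory `BTreeMap` rather than LambdaStore's persisted field type, and `U32Set`, `put`, `contains`, and `range_keys` are hypothetical names. Members are serialized as big-endian keys so that byte order matches numeric order, and the values are left empty.

```rust
use std::collections::BTreeMap;
use std::ops::Bound;

/// In-memory stand-in for a map field with opaque byte keys and values.
#[derive(Default)]
pub struct MapField {
    entries: BTreeMap<Vec<u8>, Vec<u8>>,
}

impl MapField {
    pub fn put(&mut self, key: &[u8], value: &[u8]) {
        self.entries.insert(key.to_vec(), value.to_vec());
    }

    pub fn contains(&self, key: &[u8]) -> bool {
        self.entries.contains_key(key)
    }

    /// Returns the keys in the half-open byte range [start, end).
    pub fn range_keys(&self, start: &[u8], end: &[u8]) -> Vec<Vec<u8>> {
        if start >= end {
            return Vec::new();
        }
        self.entries
            .range::<[u8], _>((Bound::Included(start), Bound::Excluded(end)))
            .map(|(k, _)| k.clone())
            .collect()
    }
}

/// A set of u32 values: each member is a map key with an empty value.
#[derive(Default)]
pub struct U32Set {
    map: MapField,
}

impl U32Set {
    /// Inserts `v`; returns true if it was not present before.
    pub fn insert(&mut self, v: u32) -> bool {
        let key = v.to_be_bytes(); // application-side serialization
        let is_new = !self.map.contains(&key);
        self.map.put(&key, &[]);
        is_new
    }

    pub fn contains(&self, v: u32) -> bool {
        self.map.contains(&v.to_be_bytes())
    }

    /// Members in [lo, hi), in ascending order.
    pub fn range(&self, lo: u32, hi: u32) -> Vec<u32> {
        self.map
            .range_keys(&lo.to_be_bytes(), &hi.to_be_bytes())
            .into_iter()
            .map(|k| u32::from_be_bytes([k[0], k[1], k[2], k[3]]))
            .collect()
    }
}
```

The platform only ever sees opaque byte strings here; the set semantics and the key encoding live entirely in application code.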
3.2. Execution Model
3.2.1. Function Calls
Application logic in LambdaStore executes in the form of function calls (or jobs). A job represents the invocation of a specific function. As in conventional serverless systems, more complex application logic can be constructed by composing multiple jobs into a workflow. The structure of the job graph within a workflow is not predefined; instead, it is generated dynamically as functions execute.
Functions in LambdaStore come in several variants, similar to those in object-oriented programming. Constructors are used to create and initialize new objects. Methods operate on existing objects and can access or modify their fields. Static functions are not associated with any specific object and hence do not have direct access to object data.
In addition, LambdaStore provides a mechanism for operations that involve many objects: map calls. For example, consider the task of finding the maximum age among all clients in an application, where each client is represented by a dedicated object. Executing a separate function on each individual object would be highly inefficient. Instead, a map call takes a set of objects O and a function f, and executes at most one instance of f per shard. Each invocation of f then iterates over the subset of O that is located on that shard. This mechanism enables efficient batch processing without incurring additional data movement. The results of the map call are aggregated and returned to the caller.
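The maximum-age example can be sketched as follows. This is a single-process simulation under stated assumptions: `Shard`, `Client`, `map_call`, and `max_age` are hypothetical names, shards are plain in-memory maps, and the per-shard invocations run one after another rather than on separate machines. What it preserves from the description is the shape of the call: f runs at most once per shard, sees only the targets stored on that shard, and the per-shard results are aggregated by the caller.

```rust
use std::collections::BTreeMap;

pub type ObjectId = u64;

pub struct Client {
    pub age: u32,
}

/// One shard holding a subset of the application's objects.
pub struct Shard {
    pub objects: BTreeMap<ObjectId, Client>,
}

impl Shard {
    /// Convenience constructor from (object id, age) pairs.
    pub fn from_ages(pairs: &[(ObjectId, u32)]) -> Self {
        Shard {
            objects: pairs.iter().map(|&(id, age)| (id, Client { age })).collect(),
        }
    }
}

/// Runs `f` at most once per shard, over the targets stored on that shard.
/// Shards that hold none of the targets are skipped.
pub fn map_call<R>(
    shards: &[Shard],
    targets: &[ObjectId],
    f: impl Fn(&[&Client]) -> R,
) -> Vec<R> {
    shards
        .iter()
        .filter_map(|shard| {
            let local: Vec<&Client> = targets
                .iter()
                .filter_map(|id| shard.objects.get(id))
                .collect();
            if local.is_empty() { None } else { Some(f(&local)) }
        })
        .collect()
}

/// Aggregates the per-shard maxima into a global maximum.
pub fn max_age(shards: &[Shard], targets: &[ObjectId]) -> Option<u32> {
    map_call(shards, targets, |clients| {
        clients.iter().map(|c| c.age).max().unwrap_or(0)
    })
    .into_iter()
    .max()
}
```

With three shards and targets that live on only two of them, `map_call` produces two partial results, and only those two values cross shard boundaries.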
Function calls have direct access only to the data of the object(s) they are associated with. To access or modify data in other objects, functions must either invoke constructors or methods on those objects or use map calls. This pattern minimizes data movement.
Functions in LambdaStore have access to a minimal API upon which more complex application logic can be built. This API allows functions to read and write object entries, invoke other functions, manage user sessions, retrieve function arguments, set return values, obtain the current time, and generate randomness. Similar to system calls in an operating system, which only provide low-level functionality, higher-level abstractions are then implemented in user space (or in our context, within the serverless functions).
Functions execute until they either complete successfully or are terminated by the runtime. Termination can occur due to fatal errors (e.g., stack overflows), violations of security policies (e.g., attempting to access data outside the application’s scope), or exceeding the maximum allowed execution time.
3.2.2. Transactions
LambdaStore guarantees strict serializability across the entire workflow. A workflow can be represented as a directed acyclic graph (DAG)—more specifically, as a directed rooted tree—of function calls. In this graph, each vertex corresponds to a job, and each edge represents a function call that spawns another job. Workflows are initiated by clients invoking a public function, which forms the root of the DAG. As functions call other functions, the DAG is dynamically extended at runtime. These execution graphs are not predefined by the application developer but are constructed automatically during execution. Developers can inspect a workflow’s structure by examining its execution trace.
Currently, individual function calls in LambdaStore execute sequentially, but workflows can execute multiple jobs concurrently. LambdaStore allows a function to issue multiple function calls simultaneously and wait for all of them to complete. For example, in a social network application, a function that creates a user’s post might invoke separate functions for each follower to update their respective timelines. While the post creation itself executes sequentially, the timeline updates can proceed in parallel.
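The fan-out pattern from the social network example can be approximated with ordinary threads. The sketch below is an analogy only: OS threads stand in for child jobs, a mutex-protected map stands in for the followers' objects, and `Timelines`, `update_timeline`, and `create_post` are hypothetical names. The parent spawns one child per follower and then waits for all of them, mirroring the spawn-then-join structure described above.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};
use std::thread;

pub type UserId = u32;
pub type Timelines = Arc<Mutex<HashMap<UserId, Vec<String>>>>;

/// Child job: appends the post to a single follower's timeline.
fn update_timeline(timelines: &Timelines, follower: UserId, post: &str) {
    timelines
        .lock()
        .unwrap()
        .entry(follower)
        .or_default()
        .push(post.to_string());
}

/// Parent job: spawns one child per follower, then joins all of them.
/// Returns the number of child jobs that completed.
pub fn create_post(timelines: &Timelines, followers: &[UserId], post: &str) -> usize {
    // `collect` forces every child to be spawned before any join happens.
    let handles: Vec<_> = followers
        .iter()
        .map(|&follower| {
            let timelines = Arc::clone(timelines);
            let post = post.to_string();
            thread::spawn(move || update_timeline(&timelines, follower, &post))
        })
        .collect();

    // The parent does not complete until every child has finished.
    handles
        .into_iter()
        .map(|h| h.join().expect("child job panicked"))
        .count()
}
```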
Each workflow in LambdaObjects is contained within a single LambdaStore transaction. Throughout this paper, we use the terms workflow and transaction interchangeably when referring to LambdaStore.
3.3. Example Application
LambdaObjects support arbitrary applications. To demonstrate their flexibility, we present an example application: CloudForum, an online discussion board similar to Reddit. This section outlines part of its implementation in Rust, with some simplifications made for clarity and space constraints.
Listing 1 shows how object types are declared within an application. In this example, we outline three types.
Accounts represent users of the online forum and contain references to all threads and comments they have created.
Threads store the initial post by the thread creator, along with a sequence of comments. Comments are stored using a custom structure called
Comment, which is not an object type itself but defines how data within the
comments field is structured. Each comment can be identified through the thread ID and its index within the thread. Finally, threads are indexed using the
Community type.
| Listing 1. Object types for an online forum application. |
#[lambda_object]
struct Account {
    name: Cell<String>,
    threads: Set<ObjectId>,
    comments: Set<(ObjectId, u32)>,
}

#[lambda_object]
struct Community {
    by_name: MultiMap<String, ObjectId>,
    by_time: MultiMap<u64, IndexEntry>,
}

#[lambda_object]
struct Thread {
    author_name: Cell<String>,
    community_id: Cell<ObjectId>,
    author_id: Cell<ObjectId>,
    title: Cell<String>,
    text: Cell<String>,
    comment_count: Cell<u32>,
    comments: Map<u32, Comment>,
}
|
Listing 2 illustrates the workflow for adding a new comment to an existing thread. The implementation uses LambdaStore’s Rust bindings, which abstract away much of the boilerplate, such as data serialization. Users initiate the workflow by invoking the create_comment method on an object of type Account. Arguments, such as the content of the comment, are passed in a JSON document.
The function first authenticates the user (omitted for brevity) and looks up the user’s identifier and name. It then calls the add_comment method on the specific thread where the comment should be added. This method executes in a separate, dedicated job. By default, invoked function calls run in the background unless the parent job explicitly calls join, as shown in the example. This API enables concurrency similar to the async/await pattern found in many programming languages: multiple child jobs can be spawned and waited on simultaneously.
The
add_comment method then stores the comment as part of the
Thread object. It attaches a timestamp to the comment using the
get_unix_time host call. Finally, it returns the comment’s identifier, which the
Account object stores in its
comments field.
| Listing 2. Implementation of CloudForum’s comment functionality. Low-level API calls are abstracted behind a higher-level interface. Access control and error handling logic are omitted for brevity. |
#[lambda_functions]
impl Account {
    fn create_comment(&self, app: Application, args: json::Value) {
        args.set("author_id", self.get_identifier());
        args.set("author_name", self.name.get());

        let thread_id = args.get("thread_id");
        let result = app.get_object()
            .call_json("add_comment", &args)
            .join();

        // Store a reference to the comment
        let comment_id = result.get("comment_id");
        self.comments()
            .insert(&(thread_id, comment_id));
    }
}

#[lambda_functions]
impl Thread {
    #[protected]
    fn add_comment(&self, args: json::Value) {
        let comment = Comment {
            author_id: args.get("author_id"),
            author_name: args.get("author_name"),
            text: args.get("text"),
            time: get_unix_time(),
        };

        // Increase the total comment count
        let comment_cnt = self.comment_count().get();
        let comment_id: u32 = comment_cnt + 1;
        self.comment_count().put(&comment_id);

        // Store the comment in the thread's object
        self.comments().put(&comment_id, &comment);
        set_json_result({"comment_id": comment_id});
    }
}
|
4. LambdaStore
This section presents the design of LambdaStore in detail. We begin by outlining the overall system architecture. Next, we discuss the virtualization technique employed by LambdaStore. We then describe how LambdaStore efficiently provides transactional guarantees for workflows and how it enables compute–storage colocation while preserving elasticity. Finally, we describe fault tolerance and recovery mechanisms.
4.1. System Architecture
LambdaStore consists of four types of participants: clients, frontends, worker nodes, and coordinating nodes.
Figure 1 sketches how these components interact within the system.
4.1.1. Worker Nodes and Replica Sets
Worker nodes execute serverless jobs and store LambdaObjects. They are organized into replica sets of nodes. Application data are sharded, and each shard is assigned to a replica set. Nodes within a replica set run replication protocols to ensure data consistency and durability.
Requests for LambdaObjects methods are first sent to the primary node of the replica set that stores the associated object. The primary determines where the method should be executed. Under light workloads, it is most efficient to execute the job directly on the primary. Under heavy workloads, however, the primary can delegate the job to other nodes in the replica set to avoid becoming a bottleneck.
Within each replica set,
LambdaStore uses chain replication [
38] to replicate data across nodes. As the name suggests, nodes in the set are arranged in a chain, with the head serving as the primary and the remaining nodes acting as
secondary replicas. During state changes, the primary communicates the update to only one secondary replica, which then propagates it down the chain. With a replica set of f + 1 nodes, LambdaStore can tolerate up to f simultaneous node failures without losing data. Compared with other replication algorithms, chain replication significantly reduces the load on the primary, enabling high scalability and elasticity even under skewed workloads.
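The propagation pattern of chain replication can be sketched in a few lines. This is a toy, single-process model under stated assumptions: `Node`, `Chain`, `write`, `fail`, and `read_head` are hypothetical names, each node forwards synchronously to its successor, and failure handling is reduced to removing a node from the chain. It is meant only to show that the head sends each update to exactly one successor and that a chain of f + 1 nodes still holds the data after f failures.

```rust
use std::collections::BTreeMap;

/// One replica in the chain with its local copy of the data.
#[derive(Default)]
pub struct Node {
    pub data: BTreeMap<String, String>,
}

/// A replica set: index 0 is the head (primary), the last index is the tail.
pub struct Chain {
    pub nodes: Vec<Node>,
}

impl Chain {
    pub fn new(len: usize) -> Self {
        Chain { nodes: (0..len).map(|_| Node::default()).collect() }
    }

    /// Applies the write at the head and forwards it node by node.
    /// Returns the number of forwarding messages sent along the chain.
    pub fn write(&mut self, key: &str, value: &str) -> usize {
        let mut hops = 0;
        for i in 0..self.nodes.len() {
            self.nodes[i].data.insert(key.to_string(), value.to_string());
            if i + 1 < self.nodes.len() {
                hops += 1; // node i forwards only to node i + 1
            }
        }
        hops
    }

    /// Removes a failed node; the survivors keep their relative order.
    pub fn fail(&mut self, index: usize) {
        if index < self.nodes.len() {
            self.nodes.remove(index);
        }
    }

    /// Reads from the current head of the chain.
    pub fn read_head(&self, key: &str) -> Option<&str> {
        self.nodes
            .first()
            .and_then(|n| n.data.get(key))
            .map(|s| s.as_str())
    }
}
```

In a chain of three nodes, a write costs two forwarding messages in total but only one of them is sent by the primary, and the value remains readable after any two nodes are removed.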
4.1.2. The Coordinating Service
The coordinating service (or coordinator) is responsible for maintaining all metadata. It tracks all participants in the cluster, the configuration of individual shards, and the placement of objects within those shards. When worker nodes, frontends, or clients join the network, they first connect to the coordinating service, which informs them about other worker nodes in the system and the current object placements. In addition, the coordinating service monitors the health and workload status of each worker. It can initiate object migration or create light replicas accordingly to provide elasticity. In our prototype, we only demonstrate the effectiveness of object migration and light replication with manual triggers. We leave automatic elastic store management policies for future exploration.
For reliability, the coordinating service should ideally use a distributed metadata management system such as ZooKeeper [
39]. In our prototype, however, the coordinator is not replicated for simplicity. We argue that this does not significantly impact system performance, as the coordinating service is not involved in most requests. Transactions only reach the coordinator if they create new objects or when there is an ongoing reconfiguration of the cluster. The former can be optimized: workers can asynchronously notify the coordinator of new objects after the transaction commits, and the coordinator can fetch new object placements from the workers when needed. The latter only occurs when the system makes an elastic storage decision (
Section 4.4) or during failures (
Section 4.5).
4.1.3. Clients and Frontends
To avoid accessing the coordinator on every request, clients maintain local caches of system configuration and object placements. However, a large number of short-lived clients can easily overwhelm the coordinating service. To mitigate this issue, LambdaStore introduces Frontends, which maintain up-to-date caches of the cluster configuration. Clients send their requests to a Frontend, which then redirects the request to the appropriate primary worker node based on its cached view.
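A frontend's routing step can be sketched as a cached placement lookup. The sketch makes assumptions that the text does not specify: the cache is filled lazily on a miss, invalidation after migration is ignored, and `Coordinator`, `Frontend`, `lookup`, and `route` are hypothetical names. The `lookups` counter exists only to make the effect of the cache visible.

```rust
use std::collections::HashMap;

pub type ObjectId = u64;
pub type NodeId = u32;

/// Toy coordinator that owns the authoritative object placements.
pub struct Coordinator {
    placements: HashMap<ObjectId, NodeId>,
    /// Number of lookups served, to show how often the coordinator is hit.
    pub lookups: usize,
}

impl Coordinator {
    pub fn new(placements: HashMap<ObjectId, NodeId>) -> Self {
        Coordinator { placements, lookups: 0 }
    }

    pub fn lookup(&mut self, id: ObjectId) -> Option<NodeId> {
        self.lookups += 1;
        self.placements.get(&id).copied()
    }
}

/// Frontend with a cached view of object placements.
#[derive(Default)]
pub struct Frontend {
    cache: HashMap<ObjectId, NodeId>,
}

impl Frontend {
    /// Returns the primary for `id`, asking the coordinator only on a miss.
    pub fn route(&mut self, coordinator: &mut Coordinator, id: ObjectId) -> Option<NodeId> {
        if let Some(&node) = self.cache.get(&id) {
            return Some(node);
        }
        let node = coordinator.lookup(id)?;
        self.cache.insert(id, node);
        Some(node)
    }
}
```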
4.2. Virtualization Layer
LambdaStore enables the execution of untrusted computation through the use of WebAssembly (or
WASM) [
40]. We chose WebAssembly as the virtualization mechanism due to its significantly lower overhead compared to alternatives such as virtual machines (VMs) or containers. Once an application developer compiles their code into WebAssembly, it is registered with the coordinator. The coordinator then compiles the WebAssembly instructions to machine code, injects additional safeguards to protect against misbehaving programs, and distributes the resulting code to all storage nodes.
Upon invocation, nodes directly embed the generated code into their address space. To start a job, a node allocates memory using mmap and performs a context switch by saving register contents and setting the program counter to a location within the function’s code. The function interacts with the host environment via a predefined set of API calls, each of which triggers a context switch back to the storage layer, analogous to how system calls transition from user space to kernel space.
Threat Model
As a serverless platform, LambdaStore must execute untrusted code provided by application developers. WebAssembly offers software-based fault isolation by enforcing memory safety through mechanisms such as bounds checking on memory accesses and control-flow integrity. It also prevents memory leakage between different functions with zero-initialized memory and protects the kernel by disabling system calls.
WebAssembly, however, does not protect against non-terminating functions. LambdaStore uses periodic timer interrupts to abort indefinitely running lambda jobs. When an interrupt occurs, a trap handler checks whether the current function has exceeded its maximum execution time and aborts it if necessary. Tracking function execution time is also essential for implementing an accurate billing model in a cloud environment.
In addition, WebAssembly does not provide resource isolation, and
LambdaStore is prone to many side-channel attacks. We elaborate on providing resource isolation in
Section 6.
4.3. Serverless Transactions
The storage layer in LambdaStore provides a transactional interface. LambdaStore enforces atomicity and strict serializability by encapsulating all data accesses within a workflow into a single transaction. The transaction is committed when the workflow completes. If the commit fails, LambdaStore can re-execute the workflow to ensure that its effects are externalized exactly once. This guarantee holds under the assumption that workflows do not produce side effects on external services outside of LambdaStore’s control. Supporting exactly-once semantics for functions that interact with external services remains an open challenge and is left to future work.
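The commit-or-re-execute behavior can be sketched with a generic optimistic scheme. This is not LambdaStore's protocol: it is plain version validation on a single-threaded toy store, and `Store`, `Txn`, `run_workflow`, and the `between` hook (which stands in for concurrent transactions committing during execution) are hypothetical names. The workflow buffers its writes, the commit validates the versions it read, and a failed commit discards the buffer and reruns the workflow, so its effects become visible exactly once.

```rust
use std::collections::HashMap;

/// Toy versioned store: key -> (version, value).
#[derive(Default)]
pub struct Store {
    data: HashMap<String, (u64, i64)>,
}

/// Read versions and buffered writes of one workflow execution.
#[derive(Default)]
pub struct Txn {
    reads: HashMap<String, u64>,
    writes: HashMap<String, i64>,
}

impl Txn {
    /// Buffers a write; nothing is visible until commit.
    pub fn write(&mut self, key: &str, value: i64) {
        self.writes.insert(key.to_string(), value);
    }
}

impl Store {
    /// Reads a value and records the version that was observed.
    pub fn read(&self, txn: &mut Txn, key: &str) -> i64 {
        let (version, value) = self.data.get(key).copied().unwrap_or((0, 0));
        txn.reads.insert(key.to_string(), version);
        value
    }

    /// Directly committed write, used here to simulate another transaction.
    pub fn put(&mut self, key: &str, value: i64) {
        let entry = self.data.entry(key.to_string()).or_insert((0, 0));
        *entry = (entry.0 + 1, value);
    }

    pub fn get(&self, key: &str) -> i64 {
        self.data.get(key).map_or(0, |e| e.1)
    }

    /// Validates the read set, then applies the buffered writes.
    pub fn commit(&mut self, txn: Txn) -> bool {
        let valid = txn
            .reads
            .iter()
            .all(|(k, v)| self.data.get(k).map_or(0, |e| e.0) == *v);
        if !valid {
            return false;
        }
        for (key, value) in txn.writes {
            let entry = self.data.entry(key).or_insert((0, 0));
            *entry = (entry.0 + 1, value);
        }
        true
    }
}

/// Re-executes `body` until its commit succeeds; returns the attempt count.
pub fn run_workflow(
    store: &mut Store,
    body: impl Fn(&Store, &mut Txn),
    mut between: impl FnMut(&mut Store, u32),
) -> u32 {
    let mut attempt = 0;
    loop {
        attempt += 1;
        let mut txn = Txn::default();
        body(store, &mut txn);
        between(store, attempt); // stand-in for concurrent commits
        if store.commit(txn) {
            return attempt;
        }
    }
}
```

If another writer changes the key between the first execution and its commit, validation fails, the buffered increment is dropped, and the second execution reads the new value before committing.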
Providing transactional guarantees efficiently under serverless workloads is challenging. Since developers can define arbitrary application logic, workflows may vary significantly in duration and read-write patterns, ranging from short-running to long-running executions. Running such workflows concurrently can result in high conflict rates and, in some cases, starvation. Additionally, the elastic nature of serverless platforms allows a large number of workflows to execute simultaneously under heavy workloads, further exacerbating contention.
LambdaStore addresses these challenges by dynamically adjusting lock granularity through entry sets and by employing a variant of Silo’s optimistic concurrency control protocol [
41] with two-phase commit (2PC).
4.3.1. Entry Sets
LambdaStore tracks metadata (e.g., locks and version numbers) for each object to support concurrency control. A naïve approach might maintain a single set of metadata per object, which can be too coarse-grained for large objects, or one set per entry, which introduces excessive bookkeeping overhead. Instead, LambdaStore uses entry sets as the unit of locking and version control. Each object defines a series of guards that determine the boundaries between entry sets. Figure 2 illustrates an example of such a partitioning. Entry sets allow LambdaStore to reduce transaction conflicts by dynamically adjusting the lock granularity. Rather than locking entire objects, transactions only acquire locks on the entry sets they access. In the example shown in Figure 2, one transaction can update the account name while another concurrently creates new threads without conflict.
In LambdaStore, objects initially consist of a single entry set. With some probability, a write to a key will insert a new guard at that key’s position, effectively splitting the entry set in two. The probability of inserting a guard increases for entry sets with higher lock contention. As a result, objects that are written to more frequently or experience more conflicts are more likely to accumulate guards and be partitioned into more entry sets. The intuition behind this design is that only write-heavy or highly contended workloads benefit from finer lock granularity. A similar approach has proven effective in the context of Log-Structured Merge Trees [42]. Because entry sets can only be split during writes, splitting occurs exclusively during transaction commits, when the entry set is already write-locked and inaccessible to other transactions. After the split, the resulting entry sets are assigned version numbers higher than that of the original set. This mechanism ensures that the concurrency control protocol detects and handles the change appropriately. Consequently, entry set splitting does not interfere with function execution or violate system consistency.
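The guard mechanism can be sketched as follows. This is an illustrative model, not the LambdaStore data structure: the class name `GuardedObject`, the `base_prob` parameter, and the linear scaling of the split probability with a conflict counter are all assumptions, since the text only states that the probability grows with contention. The sketch also assumes that a guard is the smallest key of the entry set to its right.

```python
import bisect
import random


class GuardedObject:
    """Partitions an object's key space into entry sets separated by guards."""

    def __init__(self, base_prob=0.05, rng=None):
        self.guards = []        # sorted keys; each guard starts a new entry set
        self.versions = [0]     # one version per entry set (len(guards) + 1 items)
        self.contention = [0]   # assumed per-entry-set conflict counter
        self.base_prob = base_prob
        self.rng = rng or random.Random()

    def entry_set_of(self, key):
        """Index of the entry set that contains `key`."""
        return bisect.bisect_right(self.guards, key)

    def record_conflict(self, key):
        self.contention[self.entry_set_of(key)] += 1

    def split_probability(self, idx):
        # Assumed policy: probability grows linearly with observed conflicts.
        return min(1.0, self.base_prob * (1 + self.contention[idx]))

    def on_committed_write(self, key):
        """Called at commit while the entry set is write-locked.

        Returns True if the write inserted a guard and split the entry set.
        """
        idx = self.entry_set_of(key)
        already_guard = idx > 0 and self.guards[idx - 1] == key
        if not already_guard and self.rng.random() < self.split_probability(idx):
            new_version = self.versions[idx] + 1
            bisect.insort(self.guards, key)  # lands at position idx
            # Both halves get a version higher than the original entry set,
            # so readers of the old entry set fail validation.
            self.versions[idx:idx + 1] = [new_version, new_version]
            self.contention[idx:idx + 1] = [0, 0]
            return True
        self.versions[idx] += 1
        return False
```

With `base_prob=1.0` every first write to a key inserts a guard, which makes the splitting behavior easy to observe; with `base_prob=0.0` the object keeps a single entry set, matching the coarse per-object baseline.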
4.3.2. Concurrency Control
Because serverless workflows can run for arbitrary durations, using pessimistic locking may lead to starvation, where a single long-running workflow blocks all other concurrent ones. To avoid this problem, LambdaStore uses a variant of Silo’s optimistic concurrency control protocol [41] to enforce ACID properties within each shard. Algorithm 1 outlines the protocol. During the execution phase, instead of atomically reading both the value and its version number, LambdaStore leverages the atomic interface of its key-value storage backend to allow transactions to read a slightly stale version number. Specifically, during reads, each transaction first atomically reads the version number of the corresponding entry set, then queries the storage backend for the value. During the commit phase, the transaction must write each updated value back and then atomically update its version number. This design enables reads to proceed without acquiring locks during the execution and prepare phases. We provide a correctness proof in Appendix A.1. The core intuition is that if a transaction reads a mismatched version and value during execution, then either (1) the version number it reads during the prepare phase will differ from the one read during execution, or (2) another concurrent transaction will be updating the entry set, which will thereby be write-locked. In both cases, the transaction will abort, ensuring serializability. Within each replica set, secondary nodes can execute jobs and read from their local storage during the execution phase. However, for each workflow, the secondary must send its read set and write set to the primary node, which is responsible for handling the prepare and commit phases.
Algorithm 1: The Transaction Protocol of LambdaStore.
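As a rough illustration of the protocol described in the text, the following single-shard Python sketch reads the version before the value during execution, validates versions and locks during prepare, and writes values before bumping versions at commit. It is a simplification under stated assumptions, not a transcription of Algorithm 1: the names `Shard`, `Txn`, and `try_commit` are hypothetical, each key is treated as its own entry set, and a transaction that finds a write lock taken aborts rather than waits.

```python
class Shard:
    def __init__(self):
        self.values = {}     # key -> value (stands in for the storage backend)
        self.versions = {}   # entry set -> version number
        self.locked = set()  # entry sets that are currently write-locked

    def entry_set(self, key):
        return key  # assumption: one entry set per key


class Txn:
    def __init__(self, shard):
        self.shard = shard
        self.read_set = {}   # entry set -> version observed during execution
        self.write_set = {}  # key -> buffered value
        self.acquired = []

    def read(self, key):
        if key in self.write_set:
            return self.write_set[key]
        es = self.shard.entry_set(key)
        version = self.shard.versions.get(es, 0)  # 1) read the version first
        value = self.shard.values.get(key)        # 2) then read the value
        self.read_set.setdefault(es, version)
        return value

    def write(self, key, value):
        self.write_set[key] = value

    def _release(self):
        for es in self.acquired:
            self.shard.locked.discard(es)
        self.acquired = []

    def prepare(self):
        s = self.shard
        # Lock the write set in a fixed order.
        for es in sorted({s.entry_set(k) for k in self.write_set}):
            if es in s.locked:
                self._release()
                return False
            s.locked.add(es)
            self.acquired.append(es)
        # Validate the read set: version unchanged and not locked by another txn.
        for es, seen in self.read_set.items():
            locked_by_other = es in s.locked and es not in self.acquired
            if s.versions.get(es, 0) != seen or locked_by_other:
                self._release()
                return False
        return True

    def commit(self):
        s = self.shard
        for key, value in self.write_set.items():
            s.values[key] = value                   # write values back first
        for es in self.acquired:
            s.versions[es] = s.versions.get(es, 0) + 1  # then bump the version
        self._release()


def try_commit(txn):
    if not txn.prepare():
        return False
    txn.commit()
    return True


shard = Shard()
shard.values["x"] = 10
t1, t2 = Txn(shard), Txn(shard)
t1.write("x", t1.read("x") + 1)
t2.write("x", t2.read("x") + 2)
ok1, ok2 = try_commit(t1), try_commit(t2)
```

Both transactions read `x` at the same version, so the second one to reach the prepare phase sees a newer version number and aborts, leaving only the first update applied.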
LambdaStore supports cross-shard transactions by coordinating its concurrency control protocol with two-phase commit (2PC). The worker node that initiates the workflow acts as the transaction manager and is responsible for managing the 2PC process. When a workflow begins, it is associated with a transaction. Jobs spawned by other jobs inherit the transaction context of their caller. Upon completion, each job returns its output to the calling job, along with metadata about the transaction including the identifiers of any additional worker nodes involved and whether any new objects were created. This information is propagated recursively until the initial job (the root of the workflow’s DAG) completes. At that point, the transaction manager begins the 2PC protocol by signaling all involved nodes to enter the prepare phase and report their results. If all nodes agree to commit, the transaction manager then sends a final signal instructing all nodes to commit the transaction.
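The coordination described above can be sketched in a few lines of Python. The sketch assumes a workflow represented as a nested dictionary of jobs and uses hypothetical names (`Participant`, `collect_participants`, `two_phase_commit`); it omits networking, timeouts, and failure handling, and it records the decision in a log before notifying participants, in line with the recovery procedure of Section 4.5.2.

```python
class Participant:
    """A worker node taking part in a cross-shard transaction."""

    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit  # hypothetical switch for a failed validation
        self.state = "executing"

    def prepare(self, txn_id):
        self.state = "prepared" if self.can_commit else "aborted"
        return self.can_commit

    def commit(self, txn_id):
        self.state = "committed"

    def abort(self, txn_id):
        self.state = "aborted"


def collect_participants(job):
    """Gather the nodes involved in a job and, recursively, in the jobs it spawned."""
    nodes = {job["node"]}
    for child in job.get("children", []):
        nodes |= collect_participants(child)
    return nodes


def two_phase_commit(txn_id, participants, log):
    """Run 2PC from the transaction manager and return the decision."""
    votes = [p.prepare(txn_id) for p in participants]   # phase 1: prepare
    decision = "commit" if all(votes) else "abort"
    log[txn_id] = decision                              # record before notifying
    for p in participants:                              # phase 2: finalize
        if decision == "commit":
            p.commit(txn_id)
        else:
            p.abort(txn_id)
    return decision
```

Here `collect_participants` mirrors the recursive propagation of transaction metadata from child jobs to the root of the DAG; in the actual system this information travels with each job's return value rather than being computed from a static tree.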
4.4. Elastic Storage Service
To colocate compute with storage while preserving the elasticity of serverless platforms, the storage layer must be able to adapt to changing workloads. LambdaStore achieves elasticity through two mechanisms: object migration and light replication. To minimize the overhead associated with both, LambdaStore employs microsharding, allowing the system to respond quickly to workload fluctuations.
4.4.1. Microsharding
LambdaStore treats objects as microshards [43], ensuring that each object and all of its associated data are mapped to a single shard. This fine-grained approach enables LambdaStore to quickly respond to workload changes by migrating or creating lightweight replicas of individual objects. While the exact placement policy is beyond the scope of this paper, LambdaStore generally aims to map objects belonging to the same application to as few replica sets as possible in order to maximize data locality.
The coordinator manages microshard assignments by maintaining the mapping from nodes to shards and from objects to shards. Other participants fetch object locations from the coordinator as needed and cache them locally. Node assignments within shards are modified only during failures or when the overall cluster size changes. The coordinator maintains a persistent TCP connection with each node and interprets connection termination as a node failure. Upon detecting a failure, it reconfigures the affected replica set by promoting the next node in the chain to primary and appending a new secondary node to the end of the chain. In contrast to node assignments, the object-to-shard mapping is much larger and changes more frequently, such as when new objects are created or workloads shift. This mapping is sharded and distributed across multiple physical machines to ensure scalability and availability.
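A minimal sketch of the coordinator's two mappings and of the chain reconfiguration step is shown below. The `Coordinator` class, its method names, and the pool of spare nodes are assumptions made for illustration; the sketch keeps both mappings in one process and does not model the sharding of the object-to-shard mapping or the TCP-based failure detection.

```python
class Coordinator:
    def __init__(self, shards, spares):
        self.shards = shards         # shard id -> replica chain, primary first
        self.object_to_shard = {}    # object id -> shard id
        self.spares = list(spares)   # assumed pool of idle nodes

    def locate(self, obj):
        """Return the shard and current primary for an object."""
        shard = self.object_to_shard[obj]
        return shard, self.shards[shard][0]

    def on_disconnect(self, node):
        """Treat a dropped connection as a crash and repair affected chains."""
        for chain in self.shards.values():
            if node in chain:
                # Removing the head promotes the next node in the chain.
                chain.remove(node)
                if self.spares:
                    chain.append(self.spares.pop(0))  # new secondary at the tail
```

For example, with a chain `["a", "b", "c"]` and a spare node `"d"`, a failure of `"a"` yields the chain `["b", "c", "d"]`, and subsequent lookups return `"b"` as the primary. Participants that cached the old location would need to refresh it from the coordinator.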
4.4.2. Light Replication
When nodes in a shard experience high load due to a hot object, LambdaStore can temporarily create light replicas of the object on other worker nodes. These light replicas do not participate in the replication protocol. Instead, during updates, the primary node pushes only the updated version number to the light replicas. The light replica lazily fetches the most recent data as needed during job execution. As with secondary nodes, when a workflow completes, the light replica sends the read set and write set associated with the workflow to the primary for finalization to ensure serializability across all participating nodes.
Light replication is especially effective for compute- or read-heavy workloads. Since reads can be processed in parallel, adding light replicas increases computational capacity without raising conflict rates. However, for write-heavy workloads, light replicas offer little benefit, as they can lead to a high number of conflicting transactions running concurrently, ultimately wasting compute resources. For this reason, LambdaStore currently only creates light replicas for objects that are not write-intensive.
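The version-push and lazy-fetch behavior of light replicas can be sketched as follows. The classes `Primary` and `LightReplica` and the `fetches` counter are hypothetical and exist only to make the data flow visible; the sketch caches whole objects and omits the read-set and write-set handoff to the primary at the end of a workflow.

```python
class Primary:
    def __init__(self):
        self.data = {}      # object id -> value
        self.version = {}   # object id -> version number
        self.light = []     # registered light replicas

    def update(self, obj, value):
        self.data[obj] = value
        self.version[obj] = self.version.get(obj, 0) + 1
        for replica in self.light:
            replica.on_version(obj, self.version[obj])  # push the version only

    def fetch(self, obj):
        return self.version[obj], self.data[obj]


class LightReplica:
    def __init__(self, primary):
        self.primary = primary
        self.latest = {}    # object id -> newest version announced by the primary
        self.cache = {}     # object id -> (version, value)
        self.fetches = 0    # counts round trips to the primary
        primary.light.append(self)

    def on_version(self, obj, version):
        self.latest[obj] = version

    def read(self, obj):
        cached = self.cache.get(obj)
        if cached is None or cached[0] < self.latest.get(obj, 0):
            self.fetches += 1
            cached = self.primary.fetch(obj)  # fetch lazily, only when stale
            self.cache[obj] = cached
        return cached[1]
```

An update at the primary costs the light replica nothing until a job actually reads the object, which is why this scheme suits read-heavy objects and offers little for write-intensive ones.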
4.4.3. Object Migration
Compared to light replication, object migration is more heavyweight as it involves transferring ownership of an object to a different shard. During migration, certain operations such as writes cannot be processed, leading to temporary unavailability. Despite this cost, object migration is essential for effective load balancing, as it allows hot objects to be distributed across multiple shards.
Object migration is managed by the coordinating service. It begins by identifying which microshard needs to be migrated. The coordinator then instructs the primary node currently storing the microshard to initiate the migration. During this process, the object is read-locked, and its data is transferred to the target node. While migration is in progress, read requests can still be served for the object. Once the new node has received all the data, the coordinator updates the microshard’s location. At that point, the original node deletes its local copy and begins rejecting any further requests involving the migrated object.
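The migration steps above can be sketched as a short sequence. `Node`, `migrate`, and the `during_transfer` hook are illustrative assumptions: the hook exists only so that the in-flight state can be observed, and the sketch copies a single value in one step instead of streaming a microshard over the network.

```python
class Node:
    def __init__(self, name):
        self.name = name
        self.objects = {}       # object id -> data
        self.read_only = set()  # objects that are read-locked for migration

    def read(self, obj):
        if obj not in self.objects:
            raise LookupError(f"{obj} is not stored on {self.name}")
        return self.objects[obj]

    def write(self, obj, value):
        if obj not in self.objects:
            raise LookupError(f"{obj} is not stored on {self.name}")
        if obj in self.read_only:
            raise PermissionError(f"{obj} is being migrated")
        self.objects[obj] = value


def migrate(obj, source, target, locations, during_transfer=None):
    source.read_only.add(obj)                  # 1) read-lock the object on the source
    target.objects[obj] = source.objects[obj]  # 2) transfer its data to the target
    if during_transfer is not None:
        during_transfer()                      # hypothetical hook for observation
    locations[obj] = target.name               # 3) coordinator updates the location
    del source.objects[obj]                    # 4) source drops its copy and
    source.read_only.discard(obj)              #    rejects further requests
```

While the transfer is in progress, reads on the source succeed and writes are rejected; after the location update, any request to the source fails and the caller is expected to look up the new location.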
4.5. Fault Tolerance
LambdaStore can tolerate crash failures of individual nodes. We define a failure as a node crashing, losing power, or becoming disconnected from the network. It is important to note that while the platform tolerates crash failures in the system architecture, it also tolerates arbitrary (or Byzantine [44]) failures during function execution through its virtualization mechanism, as discussed in Section 4.2.
Transaction execution in LambdaStore tolerates failures of any node, including the client or frontend that initiated the transaction. Since clients and frontends do not participate in state storage or workflow execution after submitting a request, their failure is non-critical. They can simply re-issue the request upon restart without impacting the correctness or progress of the transaction.
We say that a replica set has failed when any of its nodes fails. Each replica set in LambdaStore has f + 1 nodes, allowing the system to tolerate up to f node failures without data loss. When a node fails, the affected replica set undergoes reconfiguration by restarting or replacing the failed node. The coordinator then notifies all other replica sets of the updated configuration.
4.5.1. Replica Failures
Failures of secondary replicas can be handled by the primary, as it is always responsible for processing transaction prepare and commit phases first. Once the primary is informed of a reconfiguration, it will re-issue any delegated transactions or jobs that were affected by the failure.
When the primary fails, a new primary is selected from the remaining secondary replicas by the coordinator. This choice allows the new primary to take over immediately, without requiring a complex recovery protocol. A new replica is then added to the set and synchronizes its state from the existing nodes. If necessary, the new primary reissues pending commit requests. While the state of in-progress transactions may be lost, atomicity is not violated, as incomplete transactions will not be committed and can be safely retried.
Since light replicas do not participate in the replication protocol, they act merely as a cache for the microshards and do not require explicit recovery. The coordinator can create additional light replicas as needed, and any jobs that were running on a failed light replica can be retried on other nodes within the shard.
4.5.2. Transaction Manager Failures
For multi-shard transactions, the transaction manager must consolidate state across all participating nodes. Some nodes may have already prepared the transaction and are waiting to finalize it to release their locks properly, while others may have already finalized the transaction. To uphold atomicity, it is essential to ensure that the transaction is either committed or aborted consistently across all nodes.
After recovery, all nodes notify the new transaction manager about any prepared transactions that originated from it. If the new manager has previously logged a commit for a transaction, it instructs the involved node(s) to proceed with the commit. If no commit was logged, the transaction either was aborted or had not yet completed its prepare phase, in which case it can be safely aborted without violating correctness.
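The decision rule applied by the recovered transaction manager is small enough to state directly. In the sketch below, `resolve_prepared` and its argument shapes are assumptions: the commit log is modeled as a dictionary from transaction identifiers to logged decisions, and each node reports a list of the transactions it holds in the prepared state.

```python
def resolve_prepared(commit_log, prepared_by_node):
    """Decide the fate of prepared transactions reported after recovery.

    A transaction commits only if a commit was logged for it; otherwise it
    was aborted or never finished preparing, and aborting it is safe.
    """
    decisions = {}
    for node, txn_ids in prepared_by_node.items():
        decisions[node] = {
            txn_id: "commit" if commit_log.get(txn_id) == "commit" else "abort"
            for txn_id in txn_ids
        }
    return decisions
```

Because the rule depends only on the log, every node that reports the same transaction receives the same answer, which is what preserves atomicity across participants.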
4.5.3. Transaction Participant Failures
When a replica set fails, information about in-progress transactions, including those involving remote shards, may be lost. For transactions that are still in the execution phase, their jobs will simply be re-issued. Similarly, for transactions that were in the prepare phase at the time of failure, the prepare requests will be re-issued as well.
Transactions that have already been prepared or partially finalized must be finalized consistently across all participants. To achieve this, the new primary first checks for any in-progress transactions and then queries each transaction’s manager to determine whether it should be committed or aborted. In the case of a commit, the new primary will always have access to the transaction’s write set, as it would have been replicated during the prepare phase. At this point, it is guaranteed that the prepare phase succeeded at all participants; otherwise, the transaction would not have advanced to the commit phase.
4.5.4. Coordinator Failures
In our current prototype, coordinators are not replicated and therefore cannot recover from failures. A production-ready version of LambdaStore would use a distributed metadata management service, such as ZooKeeper, to store object and node mappings. In that case, we could rely on the fault tolerance of the underlying metadata service to support recovery. We leave this integration to future work.
6. Discussion and Future Directions
In this section, we discuss several limitations in the current LambdaStore implementation and how we plan to address them in future work.
6.1. Scheduling Policies
LambdaStore’s design and implementation open up many scheduling opportunities, including where new objects should be placed, when to migrate an object or create new light replicas, and on which replica serverless jobs should run. Its compute–storage co-design enables the scheduler to make more informed decisions by leveraging additional information, such as each function’s read-write pattern and the size and location of the state it can access. A carefully designed scheduling policy has the potential to further enhance LambdaStore’s performance. We leave the exploration of scheduling policies to future work.
6.2. Access Control and Data Sharing
Currently, clients can invoke any public function of any application managed by the platform. A future version of LambdaStore should explore session management and access control to isolate applications from one another and protect user data from unauthorized clients. Existing storage systems often include built-in access control mechanisms, which could be integrated with session information from the compute layer to enable fine-grained, application-specific access control.
6.3. Long-Running Workflows
So far, we have described a system built for interactive cloud applications. Some design decisions, such as serializable workflows and optimistic concurrency control, can starve long-running workflows due to repeated conflicts. LambdaStore can mitigate this issue in two ways. First, many database systems use priority scheduling and lock escalation to prevent long-running transactions from being starved. LambdaStore can adopt similar techniques to ensure progress for long-running workflows. Second, long-running workflows often do not require strict serializability. LambdaStore can be extended to support configurable consistency guarantees for applications or even individual workflows. We leave the design of such extensions to future work.
6.4. Resource Isolation
While WebAssembly provides fault isolation by injecting safeguards at compile time, it does not offer resource isolation at runtime. Conventional wisdom suggests using container techniques such as Linux cgroups to offer strong resource isolation [11, 27]. LambdaStore can follow this approach. Alternatively, LambdaStore can implement resource isolation at the application level by using a custom scheduler that enforces resource limits on each job. More specifically, LambdaStore currently maps serverless jobs to the worker’s process space and uses Tokio for user-level scheduling. We can augment the Tokio scheduler to ensure that no serverless job consumes more resources than it has been allocated. We leave the design and implementation of such resource-aware scheduling to future work.
6.5. Support for Legacy Applications
A significant limitation of our current design is the lack of support for legacy applications that rely on a POSIX-like API. However, the WebAssembly community is actively developing a standardized system call interface similar to POSIX [51]. While the object-based model used in LambdaStore does not directly map to such an interface, LambdaStore could expose a local file system abstraction for each object by mapping individual files to entries in the object’s storage. This abstraction would allow a legacy application to run within the system’s LambdaObjects model. It is important to note that this approach would still require minor modifications to legacy applications and may not support all system calls.