1. Introduction
Modern industry is transitioning beyond technical efficiency and automation into the era of the Fifth Industrial Revolution (5IR), which seeks to achieve the harmonious coexistence of humanity and technology. Data serves as the core engine of this paradigm shift. In particular, as the utilization of sensitive information to provide sophisticated, personalized services increases exponentially, ensuring the synergy between data utility and privacy protection has become more critical than ever. Recently, privacy breaches occurring when individual sensitive data is treated as a tool for industrial advancement have fostered distrust toward technological progress [
1]. Therefore, to ensure the sustainable development of the data industry, guaranteeing Human-Centric Security and Privacy through technical means has emerged as a fundamental challenge.
With the rise of smart IoT environments and digital healthcare, highly sensitive health information that was previously difficult to collect is being accumulated in massive quantities, fueling the rapid growth of the personalized service market. While these changes enhance the quality of human life, they simultaneously create a security issue: the risk of data leakage increases during the distribution and utilization of personal privacy data [
2]. In short, technology designed for good can unintentionally threaten human rights.
While security at the IoT edge device level cannot be overlooked, the integration of data collected from numerous sensors into a cloud environment creates a massive repository that fully encapsulates an individual’s life. Consequently, data breaches at the backend storage layer result in far more extensive and devastating privacy violations compared to threats stemming from single-device vulnerabilities. From this perspective, our ‘human-centric’ approach acknowledges the importance of both domains; however, our primary research focus lies in protecting this aggregated data within the cloud against threats at the persistent storage level [
2].
Although legal and institutional frameworks are being established to prevent privacy violations, these measures primarily focus on post-incident responses, making it difficult to stop security problems before they happen. Despite the application of server-side security such as Access Control, many recent privacy data breach cases have been caused by privileged users with administrative rights. Thus, it is essential to establish a technical foundation that maintains data in an encrypted state throughout its entire lifecycle. In this scenario, even if a server administrator or an external attacker obtains the data, they cannot comprehend the original information without the decryption key. By mathematically ensuring this level of data confidentiality, cryptographic approaches directly enforce the core security objectives outlined in established risk assessment methods, such as the NIST framework [
3], thereby reducing human-centric privacy risks.
However, conventional encryption techniques are designed to completely conceal the statistical characteristics of plaintexts, thereby maximizing security. Consequently, ciphertexts appear as random strings to anyone without the decryption key, making it impossible to perform operations such as searching, sorting, or classification in the encrypted state. While some methods temporarily decrypt data during processing to solve this issue, they create a vulnerable point where plaintexts are exposed in memory, leaving the data still at risk from insider attacks.
Order-Preserving Encryption (OPE) has been proposed to address these challenges by preserving the order of plaintexts during encryption. That is, for any plaintexts $m_1$ and $m_2$, if $m_1 < m_2$, the relationship $\mathsf{Enc}(m_1) < \mathsf{Enc}(m_2)$ holds for their corresponding ciphertexts. Applying OPE allows for efficient operations like sorting and range queries directly on the encrypted database without decryption. However, OPE possesses an inherent vulnerability: the ciphertext itself exposes the order information of the plaintext. This provides attackers with data distribution information, enabling statistical inference attacks, which result in lower security compared to standard cryptographic algorithms.
To address these vulnerabilities, several encryption schemes have been developed to reduce distribution leakage. However, most existing approaches either fail to provide a formal security guarantee like IND-OCPA in a dynamic environment or incur significant performance degradation as the database size grows. Specifically, while some schemes attempt to flatten the distribution, they often lack an adaptive mechanism that responds to real-time data updates, creating a gap between theoretical security and practical cloud database requirements. This is the research gap SOPE aims to fill by providing an adaptive, stateful framework that ensures uniform ciphertext distribution through dynamic density adjustment.
While modern encrypted database literature highlights various leakage channels—such as access patterns and communication volume—as significant security concerns [
4,
5,
6,
7], research into mitigating these risks remains a highly active and critical field. In particular, OPE has been under intense security review because the exposed order of ciphertexts directly enables attackers to perform sophisticated attacks that reconstruct the original plaintext values. Recent studies have shown that even passive adversaries can exploit access patterns and result sizes to fully reconstruct plaintext values, often with near-optimal efficiency [
6,
7]. Furthermore, it has been demonstrated that such leakages can be combined with auxiliary public distributions to negate the privacy guarantees of encryption schemes in real-world datasets [
7].
As part of the ongoing efforts to address these vulnerabilities, researchers have explored various countermeasures, including techniques to adjust the amount of leakage or mask patterns through padding [
8]. Within this context, we propose a
Stateful Order-Preserving Encryption (SOPE) scheme. It is important to note that our objective is not to claim that SOPE provides absolute immunity against all forms of leakage-based analysis. Rather, we acknowledge the inherent security limitations of OPE and propose SOPE as an enhanced framework designed to maximize security within the achievable bounds of the OPE paradigm. Our approach focuses on neutralizing distribution-based inference attacks by transforming non-uniform plaintext distributions into uniform ciphertext distributions. By incorporating dynamic state information, SOPE aims to minimize the statistical signatures that adversaries exploit for reconstruction, thereby providing a significantly hardened data environment without sacrificing the query efficiency inherent to OPE.
The remainder of this paper is organized as follows:
Section 2 describes the technological evolution of OPE and provides a detailed analysis of the security limitations of existing schemes, including the specific mechanisms of recent leakage-abuse and reconstruction attacks;
Section 3 presents the design and architecture of the proposed SOPE;
Section 4 verifies its security and efficiency; and finally,
Section 5 concludes the paper.
3. Proposed Scheme
In this section, we present the detailed design of Stateful OPE (SOPE), which enhances security by utilizing the distribution of ciphertexts generated during the encryption process as state information. SOPE operates as a general security-enhancing framework that wraps around any conventional OPE scheme, adopting it as a cryptographic primitive. Because the design is independent of this underlying primitive, it preserves the search efficiency of existing OPE systems. Simultaneously, by dynamically controlling the encryption process based on client-side state information, SOPE significantly improves resistance against statistical inference attacks.
3.1. Problem Statement
To address the limitations of conventional OPE, this work aims to design a stateful OPE (SOPE) scheme that satisfies the following two primary goals:
Security Goal: The scheme must achieve IND-OCPA security by ensuring that the ciphertext distribution converges to a uniform distribution, thereby neutralizing distribution-based inference attacks even when the relative order of ciphertexts is exposed.
Efficiency Goal: The scheme must maintain the core advantage of OPE—efficient range query performance—while providing a tunable parameter to manage the trade-off between security (decoy insertion) and storage overhead.
3.2. Design Intuition
To address the exposure of plaintext distribution in conventional OPE, SOPE provides a technique to hide the overall data distribution by dynamically inserting decoy ciphertexts into the ciphertext space. This part explains the core intuition of the proposed scheme.
While an existing OPE algorithm $\mathsf{Enc}$ provides one-wayness, making the direct recovery of a plaintext $m$ difficult, an adversary can still perform inference attacks by extracting statistical features from the ciphertext set $\{c_1, \dots, c_n\}$. SOPE mixes a large volume of decoys with actual ciphertexts, forcing the observed distribution of the entire ciphertext set to be determined by the decoy distribution rather than the actual data distribution. By inserting a sufficiently larger number of decoys ($\alpha n$, for a decoy ratio $\alpha$) compared to actual ciphertexts ($n$), SOPE overrides the data density upon which statistical analysis relies.
Simply adding decoys uniformly or using a fixed distribution creates a risk: an adversary might exploit the statistical properties of the decoys to filter them out and extract the original distribution.
To prevent this, SOPE employs a stateful dynamic adjustment mechanism. The ciphertext space is partitioned into $d$ sub-intervals, and the client maintains a count of the ciphertexts (both actual and decoy) assigned to each interval as state information. Specifically, this state information is defined as an array of $d$ integers, where each integer records the cumulative count of ciphertexts located within its corresponding sub-interval. When a new plaintext is encrypted and placed in a specific sub-interval, the count for that interval increases. Subsequently, SOPE adds decoys by selecting sub-intervals with the minimum count of existing ciphertexts. Through this process, the overall density of the ciphertext space converges toward a target distribution (in this case, a uniform distribution), regardless of the original plaintext distribution.
Note that SOPE can be flexibly configured to converge toward a specific distribution other than a uniform one by pre-defining the partition ratios and density differentials. However, as the underlying mechanism remains the same, this paper focuses on the uniform distribution for simplicity.
Furthermore, to defend against scenarios where an adversary manipulates the input order of plaintexts to induce ephemeral distribution imbalances, SOPE randomizes the processing order of plaintexts through a permutation process before encryption. This prevents potential information leakage that could occur if ciphertexts are temporarily concentrated in specific intervals during the dynamic encryption process. Consequently, the final ciphertext set reaches the target distribution with high probability, ensuring that an adversary observing the entire set cannot distinguish the statistical characteristics of the original plaintexts.
Finally, if an authorized user cannot distinguish between actual ciphertexts and decoys, any operations performed on the data would yield meaningless results. Therefore, authorized users possessing the secret key must be able to easily identify decoys. SOPE is designed to append a 1-bit tag, dependent on the secret key, to each ciphertext. This ensures that only a legitimate user can efficiently distinguish actual ciphertexts from decoys.
3.3. Definition of Underlying OPE Primitive
In this section, we formally define the underlying Order-Preserving Encryption (OPE) primitive that serves as the foundation for our proposed SOPE framework. To enhance security against frequency analysis, we transition from a conventional deterministic function to a non-deterministic construction. This approach is rooted in the early OPE literature.
Notations and Definitions: Let $f$ be a strictly increasing function representing a deterministic OPE. Without loss of generality, we define the plaintext space as the set of $u$-bit integers, $\mathcal{M} = \{0, 1, \dots, 2^u - 1\}$, and the ciphertext space as the set of $v$-bit integers, $\mathcal{C} = \{0, 1, \dots, 2^v - 1\}$, with $v > u$. By the fundamental property of OPE, for any $m_1, m_2 \in \mathcal{M}$, the condition $m_1 < m_2$ implies $f(m_1) < f(m_2)$.
Construction of Non-deterministic OPE: A well-known limitation of deterministic OPE is that identical plaintexts map to the same ciphertext, leaving the scheme vulnerable to frequency analysis. To mitigate this, Agrawal et al. [11] suggested a method to resolve duplicates by mapping a plaintext into an interval rather than a single point. Specifically, if a plaintext $m - 1$ maps to $f(m - 1)$ and $m$ maps to $f(m)$, the encryption of $m$ is randomly sampled from the interval $(f(m - 1), f(m)]$, ensuring that the ciphertexts of $m$ are uniformly spread over it. Following this intuition, we define a non-deterministic encryption function $\mathsf{OPE.Enc}$ derived from the deterministic OPE function $f$. Given a secret key $k$ and a plaintext $m \in \mathcal{M}$, the encryption process is defined as follows.

For general cases ($0 < m < 2^u - 1$): We compute the boundary values using the base function, $f_k(m - 1)$ and $f_k(m)$. The ciphertext is then defined as a random integer $c$ sampled from the range
$$f_k(m - 1) < c \le f_k(m).$$
To ensure the entire ciphertext space is utilized, the boundary values are handled as follows: For the minimum value $m = 0$, $c$ is sampled from $0 \le c \le f_k(0)$; for the maximum value $m = 2^u - 1$, $c$ is sampled from $f_k(2^u - 2) < c \le 2^v - 1$.
The computational cost of this encryption process consists of at most two evaluations of the function $f$ and a single uniform sampling of an integer from a defined interval. The $\mathsf{OPE.Enc}$ procedure is summarized in Algorithm 1 and illustrated in Figure 1.
| Algorithm 1 OPE.Enc |
- 1: Input: A plaintext m, secret key k
- 2: Output: A ciphertext c
- 3: if m = 0 then
- 4:   lo ← 0, hi ← f_k(0)
- 5: else if m = 2^u − 1 then
- 6:   lo ← f_k(2^u − 2) + 1, hi ← 2^v − 1
- 7: else
- 8:   lo ← f_k(m − 1) + 1, hi ← f_k(m)
- 9: end if
- 10: Choose a random integer c such that lo ≤ c ≤ hi
- 11: return c
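For illustration, the interval-sampling encryption of Algorithm 1 can be sketched in Python. This is a minimal sketch under stated assumptions: the strictly increasing backbone `f` here is a toy keyed construction (`m*GAP` plus a keyed offset smaller than `GAP`), and the bit-lengths are illustrative choices, not the scheme's prescribed instantiation.

```python
import hashlib
import random

U = 8            # plaintext bit-length u (illustrative assumption)
GAP = 1 << 16    # per-plaintext interval width; ciphertext space is [0, 2^U * GAP)

def f(key: bytes, m: int) -> int:
    """Toy strictly increasing OPE backbone: m*GAP plus a keyed offset < GAP."""
    h = hashlib.sha256(key + m.to_bytes(8, "big")).digest()
    return m * GAP + int.from_bytes(h[:4], "big") % GAP

def ope_enc(key: bytes, m: int) -> int:
    """Algorithm 1: sample a ciphertext uniformly from the interval assigned to m."""
    max_m = (1 << U) - 1
    if m == 0:
        lo, hi = 0, f(key, 0)                          # boundary case m = 0
    elif m == max_m:
        lo, hi = f(key, max_m - 1) + 1, (1 << U) * GAP - 1  # boundary case m = 2^u - 1
    else:
        lo, hi = f(key, m - 1) + 1, f(key, m)          # general case
    return random.randint(lo, hi)
```

Because each plaintext owns a disjoint interval, the construction remains order preserving while repeated encryptions of the same plaintext generally yield different ciphertexts.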
Decryption Process of Non-deterministic OPE: The decryption algorithm does not require an explicit inverse of the base OPE. Instead, it identifies the original plaintext $m$ by leveraging the monotonic property of the underlying function $f$. Specifically, given a ciphertext $c$, the decryption process first checks the boundary conditions to handle the edge cases of the plaintext space $\mathcal{M}$:
If $c \le f_k(0)$, then $m = 0$.
If $c > f_k(2^u - 2)$, then $m = 2^u - 1$.
For all other cases, the decryption is performed through a binary search over the range $[1, 2^u - 2]$. The search space is initialized as $[1, 2^u - 2]$, and in each iteration a midpoint $m'$ is selected to compute $f_k(m' - 1)$ and $f_k(m')$. The process continues until it finds $m$ such that $f_k(m - 1) < c \le f_k(m)$. Since the size of the plaintext space is $2^u$, this binary search approach requires at most $2u$ evaluations of the function $f$. Every integer in the ciphertext space is guaranteed to be mapped to a unique plaintext through this procedure.
The $\mathsf{OPE.Dec}$ procedure is summarized in Algorithm 2 and illustrated in Figure 2.
| Algorithm 2 OPE.Dec |
- 1: Input: A ciphertext c, secret key k
- 2: Output: A plaintext m
- 3: if c ≤ f_k(0) then
- 4:   return 0
- 5: else if c > f_k(2^u − 2) then
- 6:   return 2^u − 1
- 7: else
- 8:   Find m such that f_k(m − 1) < c ≤ f_k(m) ▹ Using binary search on [1, 2^u − 2]
- 9:   return m
- 10: end if
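The binary-search decryption of Algorithm 2 can likewise be sketched. The backbone `f` below is the same toy keyed construction used for illustration (an assumption, not the paper's concrete choice); the search returns the unique $m$ with $f_k(m-1) < c \le f_k(m)$.

```python
import hashlib

U = 8            # plaintext bit-length u (illustrative assumption)
GAP = 1 << 16

def f(key: bytes, m: int) -> int:
    """Toy strictly increasing OPE backbone (illustration only)."""
    h = hashlib.sha256(key + m.to_bytes(8, "big")).digest()
    return m * GAP + int.from_bytes(h[:4], "big") % GAP

def ope_dec(key: bytes, c: int) -> int:
    """Algorithm 2: recover m with f(m-1) < c <= f(m) by binary search."""
    max_m = (1 << U) - 1
    if c <= f(key, 0):
        return 0                 # boundary case m = 0
    if c > f(key, max_m - 1):
        return max_m             # boundary case m = 2^u - 1
    lo, hi = 1, max_m - 1        # invariant: the answer lies in [lo, hi]
    while lo < hi:
        mid = (lo + hi) // 2
        if c <= f(key, mid):
            hi = mid             # c falls at or below f(mid): answer <= mid
        else:
            lo = mid + 1         # c lies above f(mid): answer > mid
    return lo
```

Each loop iteration evaluates `f` once here; together with the two boundary checks, the cost stays logarithmic in the plaintext-space size, as stated above.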
A key advantage of this non-deterministic construction is the universal validity of the ciphertext space. Since $\mathsf{OPE.Enc}$ samples from intervals that jointly cover $[0, 2^v - 1]$, every integer $c \in \mathcal{C}$ is a decodable point that maps to a unique plaintext $m$.
Structural Equivalence: Any arbitrary bit-string within the range $[0, 2^v - 1]$ is a legitimate ciphertext. This makes the distinction between a specific output of $f$ and a randomly sampled value within an interval practically irrelevant, as they are structurally identical.
Security Implication: This property effectively masks the deterministic backbone f. An attacker cannot distinguish whether a ciphertext originates from a fixed mapping or is a randomly selected value from a valid range, thereby mitigating frequency-based cryptanalysis.
In conclusion, we consider the non-deterministic OPE defined in this section as the underlying OPE for constructing our proposed SOPE framework. Hereafter, this primitive is formally denoted as $\Pi_{\mathsf{OPE}} = (\mathsf{OPE.Enc}, \mathsf{OPE.Dec})$. All subsequent descriptions of our system's operations and security analysis are based on this probabilistic $\Pi_{\mathsf{OPE}}$.
3.4. Detailed Description
This part describes the specific encryption and search procedures of the proposed SOPE scheme. The scheme adopts any non-deterministic order-preserving encryption algorithm as its underlying primitive, denoted as $\Pi_{\mathsf{OPE}}$. To control the security level and state precision, two integer security parameters are pre-configured: $\alpha$ (the decoy insertion ratio) and $d$ (the number of partitions).
Definition and Initialization of State Information: First, the ciphertext space $\mathcal{C}$ is partitioned into $d$ equal-sized sub-spaces, $I_1, I_2, \dots, I_d$. To track the total number of ciphertexts assigned to each sub-space, the user maintains a set of counters $s_1, s_2, \dots, s_d$. This set is defined as the State Information of the proposed scheme. All counters are initially set to zero.
Encryption: Given a plaintext $m$ to be encrypted, the user first generates an actual ciphertext $c \leftarrow \mathsf{OPE.Enc}(k, m)$ using the internal OPE function. The sub-space $I_j$ containing $c$ is identified, and the corresponding counter $s_j$ is incremented by 1. To ensure the server cannot distinguish between actual data and decoys, a 1-bit identification tag, derived from a one-way hash function dependent on the secret key, is appended to $c$, forming the final ciphertext $C$:
$$C = c \,\|\, b, \qquad b = H_k(c) \in \{0, 1\}.$$
While denoted as a concatenation ($\|$) conceptually, in our database implementation this operation is mathematically realized by shifting the integer $c$ and adding the tag bit $b$ (where $b \in \{0, 1\}$). That is, the stored ciphertext is $C = 2c + b$ (equivalently, $C = (c \ll 1) + b$).
Note that this encoding method ensures that the tag bit never interferes with the original order of ciphertexts. Formally, for any two ciphertexts $c_1, c_2$ where $c_1 < c_2$, their encoded versions $C_1 = 2c_1 + b_1$ and $C_2 = 2c_2 + b_2$ satisfy $C_1 < C_2$ regardless of the tag values $b_1, b_2 \in \{0, 1\}$. This can be verified by the following difference:
$$C_2 - C_1 = 2(c_2 - c_1) + (b_2 - b_1).$$
Since $c_1 < c_2$ and both are integers, $c_2 - c_1 \ge 1$, which implies $2(c_2 - c_1) \ge 2$. Given that the maximum possible value of $b_1 - b_2$ is 1 (when $b_1 = 1, b_2 = 0$), the minimum value of the difference is:
$$C_2 - C_1 \ge 2 - 1 = 1 > 0.$$
Therefore, $C_1 < C_2$ always holds. This mathematical guarantee allows the server to perform range queries using standard integer comparison on ciphertexts without any specialized comparison logic.
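The shift-and-add encoding argument above can be checked exhaustively for small values; the helper below is a hypothetical name introduced only for this check.

```python
def encode(c: int, b: int) -> int:
    """Append the 1-bit tag as the least significant bit: C = 2c + b."""
    return (c << 1) | b

def order_preserved(limit: int = 500) -> bool:
    """Exhaustively verify that C1 < C2 whenever c1 < c2, for all tag combinations.
    The worst case is b1 = 1, b2 = 0, where C2 - C1 = 2(c2 - c1) - 1 >= 1."""
    return all(
        encode(c1, b1) < encode(c2, b2)
        for c1 in range(limit)
        for c2 in range(c1 + 1, limit)
        for b1 in (0, 1)
        for b2 in (0, 1)
    )
```

A standard B-tree index over the stored integers therefore orders the encoded ciphertexts exactly as the underlying OPE ciphertexts.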
Immediately after generating the actual ciphertext, the user generates decoy ciphertexts according to the predefined ratio $\alpha$. The user selects the $\alpha$ sub-intervals with the smallest counter values and, for each, samples a random bit-string $x$ belonging to that interval. Since standard OPE maps a small plaintext space into a significantly larger ciphertext space, any random integer $x$ sampled from this highly sparse domain can be considered a valid OPE ciphertext. We provide a more detailed discussion of this property in the subsequent Remark on Non-deterministic Encryption. A decoy $D$ is formed by appending an inverted identification tag:
$$D = x \,\|\, \bar{b}, \qquad \bar{b} = 1 - H_k(x).$$
The corresponding counters are also updated. Finally, the user transmits the complete set, consisting of all actual ciphertexts and their corresponding decoys, to the server only after the encryption process for the entire plaintext set is finalized. Through this process, the density of the entire ciphertext space converges to a uniform distribution, regardless of the original plaintext distribution.
Search and Decryption: Range queries $[a, b]$ on the encrypted data exploit the inherent properties of OPE. To perform a search, the user first computes the OPE boundaries for the given range: $c_\ell = f_k(a - 1) + 1$, the smallest possible ciphertext of $a$ (with $c_\ell = 0$ when $a = 0$), and $c_u = f_k(b)$, the largest possible ciphertext of $b$. To account for the 1-bit identification tags in the encoded index, the user constructs the precise query boundaries $[Q_\ell, Q_u]$ as follows:
$$Q_\ell = 2c_\ell, \qquad Q_u = 2c_u + 1.$$
This construction ensures that for any plaintext $m \in [a, b]$, its corresponding encoded ciphertext $C = 2c + t$ is guaranteed to fall within $[Q_\ell, Q_u]$, regardless of the tag bit $t$. The server extracts and returns all elements within the range $[Q_\ell, Q_u]$ from the stored ciphertext set. Upon receiving the results, the user verifies the 1-bit tag for each ciphertext by recomputing it with their secret key. Only ciphertexts with matching tags are considered actual data and decrypted, while mismatched decoys are discarded. This filtering process ensures that authorized users obtain accurate plaintext results without interference from decoy data. The encryption, range query, and decryption procedures are summarized in Algorithms 3, 4 and 5, respectively. Additionally, the procedure for range query is illustrated in Figure 3.
| Algorithm 3 SOPE.Enc |
- 1: Input: Plaintext set M = {m_1, …, m_n}, secret key k, partition count d, decoy ratio α, sub-spaces I_1, …, I_d, state information s_1, …, s_d
- 2: Output: Ciphertext set S
- 3: if First execution of encryption then
- 4:   s_j ← 0 for all j ∈ {1, …, d} ▹ Initialize state information
- 5: end if
- 6: S ← ∅
- 7: M ← Permute(M) ▹ Optional: Order Randomization
- 8: for i ← 1 to n do
- 9:   c ← OPE.Enc(k, m_i)
- 10:   Find j such that c ∈ I_j
- 11:   s_j ← s_j + 1
- 12:   b ← H_k(c) ▹ 1-bit tag for real data
- 13:   S ← S ∪ {2c + b}
- 14:   for t ← 1 to α do
- 15:     Find j* such that s_{j*} = min_j s_j ▹ Find the sparsest partition
- 16:     Choose a random bit-string x ∈ I_{j*}
- 17:     s_{j*} ← s_{j*} + 1
- 18:     S ← S ∪ {2x + (1 − H_k(x))} ▹ Inverted tag for decoy
- 19:   end for
- 20: end for
- 21: return S
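A condensed Python sketch of the SOPE encryption procedure follows. All instantiations here are assumptions made only for the sketch: the backbone `f` is a toy keyed function, the tag $H_k$ is HMAC-SHA-256 truncated to one bit, and the parameter values (`D = 16` partitions, `ALPHA = 3`) are illustrative rather than prescribed by the scheme.

```python
import hashlib
import hmac
import random

U, GAP = 8, 1 << 16
C_SPACE = (1 << U) * GAP       # ciphertext space size 2^v
D, ALPHA = 16, 3               # partition count d and decoy ratio alpha (assumed)

def f(key: bytes, m: int) -> int:
    """Toy strictly increasing OPE backbone (illustration only)."""
    h = hashlib.sha256(key + m.to_bytes(8, "big")).digest()
    return m * GAP + int.from_bytes(h[:4], "big") % GAP

def ope_enc(key: bytes, m: int) -> int:
    """Non-deterministic primitive: sample within the interval assigned to m."""
    max_m = (1 << U) - 1
    if m == 0:
        lo, hi = 0, f(key, 0)
    elif m == max_m:
        lo, hi = f(key, max_m - 1) + 1, C_SPACE - 1
    else:
        lo, hi = f(key, m - 1) + 1, f(key, m)
    return random.randint(lo, hi)

def tag(key: bytes, c: int) -> int:
    """1-bit keyed tag H_k(c); HMAC-SHA-256 truncated to one bit (assumption)."""
    return hmac.new(key, c.to_bytes(8, "big"), hashlib.sha256).digest()[0] & 1

def sope_enc(key: bytes, plaintexts, state=None):
    """Sketch of Algorithm 3: encrypt, count per partition, decoy the sparsest ones."""
    state = state if state is not None else [0] * D
    width = C_SPACE // D
    S = []
    order = list(plaintexts)
    random.shuffle(order)                              # optional order randomization
    for m in order:
        c = ope_enc(key, m)
        state[c // width] += 1
        S.append((c << 1) | tag(key, c))               # real ciphertext: C = 2c + b
        for _ in range(ALPHA):
            j = min(range(D), key=state.__getitem__)   # sparsest partition
            x = random.randrange(j * width, (j + 1) * width)
            state[j] += 1
            S.append((x << 1) | (1 - tag(key, x)))     # decoy: inverted tag
    return sorted(S), state
```

Note that the returned `state` must persist on the client across batches, which is precisely what makes the scheme stateful.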
| Algorithm 4 SOPE Range Query Generation |
- 1: Input: A range [a, b], secret key k
- 2: Output: Range query [Q_ℓ, Q_u]
- 3: Q_ℓ ← 2 · (f_k(a − 1) + 1) ▹ Q_ℓ ← 0 if a = 0
- 4: Q_u ← 2 · f_k(b) + 1
- 5: return [Q_ℓ, Q_u]
Figure 4 illustrates the overall architecture and operational workflow of the proposed SOPE scheme. To preserve strict data privacy and minimize communication overhead, the core encryption processes are heavily localized. The client independently maintains the state information (the counters $s_1, \dots, s_d$) and executes the SOPE engine. Consequently, the untrusted server functions merely as blind storage and a query processor, operating without any knowledge of the client's internal state or the underlying data distribution.
| Algorithm 5 SOPE.Dec |
- 1: Input: A ciphertext C, secret key k
- 2: Output: Plaintext m or ⊥ (if C is a decoy)
- 3: Parse C as c ‖ t, where t is the last 1 bit of C
- 4: b ← H_k(c) ▹ Recompute the expected tag
- 5: if t = b then ▹ Check if the ciphertext is real
- 6:   m ← OPE.Dec(k, c) ▹ Internal OPE decryption
- 7:   return m
- 8: else ▹ The ciphertext is a decoy
- 9:   return ⊥
- 10: end if
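The decryption-side filtering of Algorithm 5 can be sketched in a few lines. As before, the backbone `f` and the HMAC-based 1-bit tag are illustrative assumptions; `None` stands in for the ⊥ symbol.

```python
import hashlib
import hmac

U, GAP = 8, 1 << 16

def f(key: bytes, m: int) -> int:
    """Toy strictly increasing OPE backbone (illustration only)."""
    h = hashlib.sha256(key + m.to_bytes(8, "big")).digest()
    return m * GAP + int.from_bytes(h[:4], "big") % GAP

def tag(key: bytes, c: int) -> int:
    """1-bit keyed tag H_k(c) via HMAC-SHA-256 (assumption)."""
    return hmac.new(key, c.to_bytes(8, "big"), hashlib.sha256).digest()[0] & 1

def ope_dec(key: bytes, c: int) -> int:
    """Internal OPE decryption by binary search (Algorithm 2)."""
    max_m = (1 << U) - 1
    if c <= f(key, 0):
        return 0
    if c > f(key, max_m - 1):
        return max_m
    lo, hi = 1, max_m - 1
    while lo < hi:
        mid = (lo + hi) // 2
        if c <= f(key, mid):
            hi = mid
        else:
            lo = mid + 1
    return lo

def sope_dec(key: bytes, C: int):
    """Algorithm 5: strip the tag bit, verify it, then run the internal decryption."""
    c, t = C >> 1, C & 1
    if t != tag(key, c):
        return None              # decoy: return ⊥ and discard
    return ope_dec(key, c)
```

Because the decoy tag is the inversion of the keyed tag, a decoy can never pass the check, so the filter is exact rather than probabilistic.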
3.5. Illustrative Example
To clearly demonstrate the practical mechanics and the security advantages of our proposed SOPE algorithm, we present a simplified illustrative example. Suppose the plaintext space is $[0, 100)$ and the ciphertext space is $[0, 1000)$. We partition both spaces into 10 intervals, setting the distribution management parameter to $d = 10$.
Let us consider two highly skewed plaintext vectors to observe the distribution leakage: $M_1 = (10, 20, 30)$, which is heavily concentrated in the lower intervals, and $M_2 = (70, 80, 90)$, which is concentrated in the higher intervals.
Suppose the underlying internal OPE function encrypts these plaintext vectors into $(150, 240, 410)$ for $M_1$ and $(650, 710, 820)$ for $M_2$. If we solely rely on this internal OPE primitive, the resulting ciphertexts inevitably reflect the original data distribution. Consequently, an adversary could easily distinguish between the two sets and infer the approximate plaintext ranges simply by observing the ciphertext distributions.
To mitigate this vulnerability, we apply our SOPE scheme with $\alpha = 3$ (i.e., inserting three decoy ciphertexts for every one real ciphertext). For the sake of clarity in this example, we denote the ciphertexts corresponding to actual plaintexts with numbers ending in ‘0’, and the decoy ciphertexts with numbers ending in ‘1’. This notational choice intuitively reflects the actual mechanism of our SOPE scheme, where the least significant bit (LSB) is utilized to distinguish between real and decoy ciphertexts.
Let us consider the step-by-step application of this process. For the first plaintext vector, $M_1 = (10, 20, 30)$:
When encrypting the first plaintext 10, the internal function outputs 150. To mask this, the SOPE algorithm identifies the three intervals with the lowest frequencies and injects three decoys (e.g., 51, 351, and 651). The intermediate ciphertext set becomes $\{51, 150, 351, 651\}$.
Next, 20 is encrypted to 240. The algorithm again selects the three least populated intervals at this stage and adds decoys 551, 951, and 871. The updated set grows to $\{51, 150, 240, 351, 551, 651, 871, 951\}$.
Finally, 30 is encrypted to 410, and three new decoys 291, 991, and 781 are added to the most vacant intervals. The finalized mapping for $M_1$ becomes $\{51, 150, 240, 291, 351, 410, 551, 651, 781, 871, 951, 991\}$.
Similarly, for the second plaintext vector, $M_2 = (70, 80, 90)$:
The encryption of 70 yields 650, and three decoys (51, 251, and 451) are injected, resulting in $\{51, 251, 451, 650\}$.
Encrypting 80 yields 710, with new decoys 151, 351, and 951. The set expands to $\{51, 151, 251, 351, 451, 650, 710, 951\}$.
Finally, encrypting 90 yields 820, with decoys 91, 991, and 551. The finalized mapping for $M_2$ is $\{51, 91, 151, 251, 351, 451, 551, 650, 710, 820, 951, 991\}$.
Through this progressive demonstration, it is evident how SOPE strategically populates the least frequent intervals. As a result, the finalized mappings for the two highly skewed plaintext vectors, $M_1$ and $M_2$, can be summarized as follows:
$$M_1 = (10, 20, 30) \;\mapsto\; S_1 = \{51, 150, 240, 291, 351, 410, 551, 651, 781, 871, 951, 991\},$$
$$M_2 = (70, 80, 90) \;\mapsto\; S_2 = \{51, 91, 151, 251, 351, 451, 551, 650, 710, 820, 951, 991\}.$$
The resulting ciphertext sets $S_1$ and $S_2$ span the entire ciphertext space, thoroughly concealing the highly skewed distribution of the original plaintexts and effectively demonstrating the security advantage of the SOPE algorithm.
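The flattening effect of this example can also be reproduced numerically. The sketch below replays the greedy "fill the sparsest interval" rule on the example's real ciphertexts (150, 240, 410 and 650, 710, 820) with $\alpha = 3$ and ten intervals over $[0, 1000)$; the concrete decoy positions are drawn at random, so they will generally differ from the values quoted above, but the resulting histogram is flat either way (every interval ends up holding one or two ciphertexts). The function name is hypothetical, introduced only for this demonstration.

```python
import random

def flatten(real_cts, num_intervals=10, space=1000, alpha=3):
    """Greedy decoy placement from the example: after each real ciphertext,
    insert alpha decoys into the currently least-populated intervals."""
    width = space // num_intervals
    counts = [0] * num_intervals
    out = []
    for c in real_cts:
        counts[c // width] += 1               # count the real ciphertext
        out.append(c)
        for _ in range(alpha):
            j = min(range(num_intervals), key=counts.__getitem__)  # sparsest interval
            x = random.randrange(j * width, (j + 1) * width)       # random decoy in it
            counts[j] += 1
            out.append(x)
    return sorted(out), counts
```

Running `flatten([150, 240, 410])` and `flatten([650, 710, 820])` yields per-interval counts of 1 or 2 in both cases, so the two skewed inputs become indistinguishable at the interval level.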
5. Conclusions
In this paper, we proposed SOPE, a novel framework designed to effectively mitigate data distribution exposure and subsequent inference attacks, key vulnerabilities faced by conventional Order-Preserving Encryption (OPE) in cloud database environments.
While previous OPE schemes attempted to enhance security by adjusting ciphertext space density or relaxing deterministic properties, they shared a fundamental limitation: the distribution of plaintexts remained traceable within the ciphertext set. To overcome this, the proposed SOPE scheme introduces a partition-based stateful density adjustment algorithm. This mechanism is designed to offset frequency gaps between partitions by inserting decoy ciphertexts during the encryption process.
Through both theoretical analysis and empirical evaluations, we demonstrated that the ciphertext set generated by SOPE effectively converges toward a uniform distribution, provided that the decoy ratio $\alpha$ is sufficiently large. Based on this empirical convergence, our findings suggest that the scheme can achieve the characteristics of IND-OCPA security, a standard security model for order-preserving primitives, in practical settings.
In conclusion, SOPE optimizes the trade-off between security and storage overhead, offering a practical and effective alternative that can be immediately deployed in commercial database infrastructures requiring high security. Future research will focus on establishing a theoretical framework to determine the optimal parameters for $\alpha$ and $d$, aiming to maximize security while minimizing storage overhead. Additionally, we plan to explore technical extensions to support complex multi-dimensional queries beyond simple range searches.