1. Introduction
Modern industry is transitioning beyond technical efficiency and automation into the era of the Fifth Industrial Revolution (5IR), which seeks to achieve the harmonious coexistence of humanity and technology. Data serves as the core engine of this paradigm shift. In particular, as the utilization of sensitive information to provide sophisticated, personalized services increases exponentially, ensuring the synergy between data utility and privacy protection has become more critical than ever. Recently, privacy breaches occurring when individual sensitive data is treated as a tool for industrial advancement have fostered distrust toward technological progress [
1]. Therefore, to ensure the sustainable development of the data industry, guaranteeing Human-Centric Security and Privacy through technical means has emerged as a fundamental challenge.
With the rise of smart IoT environments and digital healthcare, highly sensitive health information that was previously difficult to collect is being accumulated in massive quantities, fueling the rapid growth of the personalized service market. While these changes enhance the quality of human life, they simultaneously create a security issue: the risk of data leakage increases during the distribution and utilization of personal privacy data [
2]. In short, technology designed for good can unintentionally threaten human rights.
While security at the IoT edge device level cannot be overlooked, the integration of data collected from numerous sensors into a cloud environment creates a massive repository that fully encapsulates an individual’s life. Consequently, data breaches at the backend storage layer result in far more extensive and devastating privacy violations compared to threats stemming from single-device vulnerabilities. From this perspective, our ‘human-centric’ approach acknowledges the importance of both domains; however, our primary research focus lies in protecting this aggregated data within the cloud against threats at the persistent storage level [
2].
Although legal and institutional frameworks are being established to prevent privacy violations, these measures primarily focus on post-incident responses, making it difficult to stop security problems before they happen. Despite the application of server-side security such as Access Control, many recent privacy data breach cases have been caused by privileged users with administrative rights. Thus, it is essential to establish a technical foundation that maintains data in an encrypted state throughout its entire lifecycle. In this scenario, even if a server administrator or an external attacker obtains the data, they cannot comprehend the original information without the decryption key. By mathematically ensuring this level of data confidentiality, cryptographic approaches directly enforce the core security objectives outlined in established risk assessment methods, such as the NIST framework [
3], thereby reducing human-centric privacy risks.
However, conventional encryption techniques are designed to completely conceal the statistical characteristics of plaintexts, thereby maximizing security. Consequently, ciphertexts appear as random strings to anyone without the decryption key, making it impossible to perform operations such as searching, sorting, or classification in the encrypted state. While some methods temporarily decrypt data during processing to solve this issue, they create a vulnerable point where plaintexts are exposed in memory, leaving the data still at risk from insider attacks.
Order-Preserving Encryption (OPE) has been proposed to address these challenges by preserving the order of plaintexts during encryption. That is, for any plaintexts $m_1$ and $m_2$, if $m_1 < m_2$, the relationship $\mathsf{Enc}(m_1) < \mathsf{Enc}(m_2)$ holds for their corresponding ciphertexts. Applying OPE allows for efficient operations like sorting and range queries directly on the encrypted database without decryption. However, OPE possesses an inherent vulnerability: the ciphertext itself exposes the order information of the plaintext. This provides attackers with data distribution information, enabling statistical inference attacks, which result in lower security compared to standard cryptographic algorithms.
To address these vulnerabilities, several encryption schemes have been developed to reduce distribution leakage. However, most existing approaches either fail to provide a formal security guarantee like IND-OCPA in a dynamic environment or incur significant performance degradation as the database size grows. Specifically, while some schemes attempt to flatten the distribution, they often lack an adaptive mechanism that responds to real-time data updates, creating a gap between theoretical security and practical cloud database requirements. This is the research gap SOPE aims to fill by providing an adaptive, stateful framework that ensures uniform ciphertext distribution through dynamic density adjustment.
While modern encrypted database literature highlights various leakage channels—such as access patterns and communication volume—as significant security concerns [
4,
5,
6,
7], research into mitigating these risks remains a highly active and critical field. In particular, OPE has been under intense security review because the exposed order of ciphertexts directly enables attackers to perform sophisticated attacks that reconstruct the original plaintext values. Recent studies have shown that even passive adversaries can exploit access patterns and result sizes to fully reconstruct plaintext values, often with near-optimal efficiency [
6,
7]. Furthermore, it has been demonstrated that such leakages can be combined with auxiliary public distributions to negate the privacy guarantees of encryption schemes in real-world datasets [
7].
As part of the ongoing efforts to address these vulnerabilities, researchers have explored various countermeasures, including techniques to adjust the amount of leakage or mask patterns through padding [
8]. Within this context, we propose a
Stateful Order-Preserving Encryption (SOPE) scheme. It is important to note that our objective is not to claim that SOPE provides absolute immunity against all forms of leakage-based analysis. Rather, we acknowledge the inherent security limitations of OPE and propose SOPE as an enhanced framework designed to maximize security within the achievable bounds of the OPE paradigm. Our approach focuses on neutralizing distribution-based inference attacks by transforming non-uniform plaintext distributions into uniform ciphertext distributions. By incorporating dynamic state information, SOPE aims to minimize the statistical signatures that adversaries exploit for reconstruction, thereby providing a significantly hardened data environment without sacrificing the query efficiency inherent to OPE.
The remainder of this paper is organized as follows:
Section 2 describes the technological evolution of OPE and provides a detailed analysis of the security limitations of existing schemes, including the specific mechanisms of recent leakage-abuse and reconstruction attacks;
Section 3 presents the design and architecture of the proposed SOPE;
Section 4 verifies its security and efficiency; and finally,
Section 5 concludes the paper.
3. Proposed Scheme
In this section, we present the detailed design of Stateful OPE (SOPE), which enhances security by utilizing the distribution of ciphertexts generated during the encryption process as state information. SOPE operates as a general security-enhancing framework that wraps around any conventional OPE scheme, adopting it as a cryptographic primitive. Because the design is independent of this underlying primitive, it preserves the search efficiency of existing OPE systems. Simultaneously, by dynamically controlling the encryption process based on client-side state information, SOPE significantly improves resistance against statistical inference attacks.
3.1. Problem Statement
To address the limitations of conventional OPE, this work aims to design a stateful OPE (SOPE) scheme that satisfies the following two primary goals:
Security Goal: The scheme must achieve IND-OCPA security by ensuring that the ciphertext distribution converges to a uniform distribution, thereby neutralizing distribution-based inference attacks even when the relative order of ciphertexts is exposed.
Efficiency Goal: The scheme must maintain the core advantage of OPE—efficient range query performance—while providing a tunable parameter to manage the trade-off between security (decoy insertion) and storage overhead.
3.2. Design Intuition
To address the exposure of plaintext distribution in conventional OPE, SOPE provides a technique to hide the overall data distribution by dynamically inserting decoy ciphertexts into the ciphertext space. This part explains the core intuition of the proposed scheme.
While an existing OPE algorithm $\mathsf{Enc}$ provides one-wayness, making the direct recovery of a plaintext $m$ difficult, an adversary can still perform inference attacks by extracting statistical features from the ciphertext set $\{c_1, \dots, c_n\}$. SOPE mixes a large volume of decoys with actual ciphertexts, forcing the observed distribution of the entire ciphertext set to be determined by the decoy distribution rather than the actual data distribution. By inserting a sufficiently larger number of decoys ($\alpha n$, for a decoy ratio $\alpha$) compared to actual ciphertexts ($n$), SOPE overrides the data density upon which statistical analysis relies.
Simply adding decoys uniformly or using a fixed distribution creates a risk: an adversary might exploit the statistical properties of the decoys to filter them out and extract the original distribution.
To prevent this, SOPE employs a stateful dynamic adjustment mechanism. The ciphertext space is partitioned into $d$ sub-intervals, and the client maintains a count of the ciphertexts (both actual and decoy) assigned to each interval as state information. Specifically, this state information is defined as an array of $d$ integers, where each integer records the cumulative count of ciphertexts located within its corresponding sub-interval. When a new plaintext is encrypted and placed in a specific sub-interval, the count for that interval increases. Subsequently, SOPE adds decoys by selecting sub-intervals with the minimum count of existing ciphertexts. Through this process, the overall density of the ciphertext space converges toward a target distribution (in this case, a uniform distribution), regardless of the original plaintext distribution.
Note that SOPE can be flexibly configured to converge toward a specific distribution other than a uniform one by pre-defining the partition ratios and density differentials. However, as the underlying mechanism remains the same, this paper focuses on the uniform distribution for simplicity.
Furthermore, to defend against scenarios where an adversary manipulates the input order of plaintexts to induce ephemeral distribution imbalances, SOPE randomizes the processing order of plaintexts through a permutation process before encryption. This prevents potential information leakage that could occur if ciphertexts are temporarily concentrated in specific intervals during the dynamic encryption process. Consequently, the final ciphertext set reaches the target distribution with high probability, ensuring that an adversary observing the entire set cannot distinguish the statistical characteristics of the original plaintexts.
Finally, if an authorized user cannot distinguish between actual ciphertexts and decoys, any operations performed on the data would yield meaningless results. Therefore, authorized users possessing the secret key must be able to easily identify decoys. SOPE is designed to append a 1-bit tag, dependent on the secret key, to each ciphertext. This ensures that only a legitimate user can efficiently distinguish actual ciphertexts from decoys.
3.3. Definition of Underlying OPE Primitive
In this section, we formally define the underlying Order-Preserving Encryption (OPE) primitive that serves as the foundation for our proposed SOPE framework. To enhance security against frequency analysis, we transition from a conventional deterministic function to a non-deterministic construction. This approach is rooted in the early OPE literature.
Notations and Definitions: Let $f$ be a strictly increasing function representing a deterministic OPE. Without loss of generality, we define the plaintext space as the set of $u$-bit integers, $\mathcal{M} = \{0, 1, \dots, 2^u - 1\}$, and the ciphertext space as the set of $v$-bit integers, $\mathcal{C} = \{0, 1, \dots, 2^v - 1\}$, with $v > u$. By the fundamental property of OPE, for any $m_1, m_2 \in \mathcal{M}$, the condition $m_1 < m_2$ implies $f(m_1) < f(m_2)$.
Construction of Non-deterministic OPE: A well-known limitation of deterministic OPE is that identical plaintexts map to the same ciphertext, leaving the scheme vulnerable to frequency analysis. To mitigate this, Agrawal et al. [11] suggested a method to resolve duplicates by mapping a plaintext into an interval rather than a single point. Specifically, if a plaintext $m - 1$ maps to $f(m - 1)$ and $m$ maps to $f(m)$, the encryption of $m$ is randomly sampled from the interval $(f(m - 1), f(m)]$, ensuring that the ciphertexts of $m$ are uniformly spread over it. Following this intuition, we define a non-deterministic encryption function $\mathsf{OPE.Enc}$ derived from the deterministic OPE function $f$. Given a secret key $k$ and a plaintext $m \in \mathcal{M}$, the encryption process is defined as follows.

For general cases ($0 < m < 2^u - 1$): We compute the boundary values using the base function, $f_k(m - 1)$ and $f_k(m)$. The ciphertext is then defined as a random integer $c$ sampled from the range
$$f_k(m - 1) < c \le f_k(m).$$
To ensure the entire ciphertext space is utilized, the boundary values are handled as follows: For the minimum value $m = 0$, $c$ is sampled from $0 \le c \le f_k(0)$; for the maximum value $m = 2^u - 1$, $c$ is sampled from $f_k(2^u - 2) < c \le 2^v - 1$.
The computational cost of this encryption process consists of at most two evaluations of the function $f$ and a single uniform sampling of an integer from a defined interval. The $\mathsf{OPE.Enc}$ procedure is summarized in Algorithm 1 and illustrated in Figure 1.
| Algorithm 1 OPE.Enc |
- 1: Input: A plaintext m, secret key k
- 2: Output: A ciphertext c
- 3: if m = 0 then
- 4:   lo ← 0, hi ← f_k(0)
- 5: else if m = 2^u − 1 then
- 6:   lo ← f_k(2^u − 2) + 1, hi ← 2^v − 1
- 7: else
- 8:   lo ← f_k(m − 1) + 1, hi ← f_k(m)
- 9: end if
- 10: Choose a random integer c such that lo ≤ c ≤ hi
- 11: return c
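For illustration, the interval-sampling encryption of Algorithm 1 can be sketched in Python. This is a minimal sketch under stated assumptions: the strictly increasing backbone `f` here is a toy keyed construction (`m*GAP` plus a keyed offset smaller than `GAP`), and the bit-lengths are illustrative choices, not the scheme's prescribed instantiation.

```python
import hashlib
import random

U = 8            # plaintext bit-length u (illustrative assumption)
GAP = 1 << 16    # per-plaintext interval width; ciphertext space is [0, 2^U * GAP)

def f(key: bytes, m: int) -> int:
    """Toy strictly increasing OPE backbone: m*GAP plus a keyed offset < GAP."""
    h = hashlib.sha256(key + m.to_bytes(8, "big")).digest()
    return m * GAP + int.from_bytes(h[:4], "big") % GAP

def ope_enc(key: bytes, m: int) -> int:
    """Algorithm 1: sample a ciphertext uniformly from the interval assigned to m."""
    max_m = (1 << U) - 1
    if m == 0:
        lo, hi = 0, f(key, 0)                          # boundary case m = 0
    elif m == max_m:
        lo, hi = f(key, max_m - 1) + 1, (1 << U) * GAP - 1  # boundary case m = 2^u - 1
    else:
        lo, hi = f(key, m - 1) + 1, f(key, m)          # general case
    return random.randint(lo, hi)
```

Because each plaintext owns a disjoint interval, the construction remains order preserving while repeated encryptions of the same plaintext generally yield different ciphertexts.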
Decryption Process of Non-deterministic OPE: The decryption algorithm does not require an explicit inverse of the base OPE. Instead, it identifies the original plaintext $m$ by leveraging the monotonic property of the underlying function $f$. Specifically, given a ciphertext $c$, the decryption process first checks the boundary conditions to handle the edge cases of the plaintext space $\mathcal{M}$:
If $c \le f_k(0)$, then $m = 0$.
If $c > f_k(2^u - 2)$, then $m = 2^u - 1$.
For all other cases, the decryption is performed through a binary search over the range $[1, 2^u - 2]$. The search space is initialized as $[1, 2^u - 2]$, and in each iteration a midpoint $m'$ is selected to compute $f_k(m' - 1)$ and $f_k(m')$. The process continues until it finds $m$ such that $f_k(m - 1) < c \le f_k(m)$. Since the size of the plaintext space is $2^u$, this binary search approach requires at most $2u$ evaluations of the function $f$. Every integer in the ciphertext space is guaranteed to be mapped to a unique plaintext through this procedure.
The $\mathsf{OPE.Dec}$ procedure is summarized in Algorithm 2 and illustrated in Figure 2.
| Algorithm 2 OPE.Dec |
- 1: Input: A ciphertext c, secret key k
- 2: Output: A plaintext m
- 3: if c ≤ f_k(0) then
- 4:   return 0
- 5: else if c > f_k(2^u − 2) then
- 6:   return 2^u − 1
- 7: else
- 8:   Find m such that f_k(m − 1) < c ≤ f_k(m) ▹ Using binary search on [1, 2^u − 2]
- 9:   return m
- 10: end if
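The binary-search decryption of Algorithm 2 can likewise be sketched. The backbone `f` below is the same toy keyed construction used for illustration (an assumption, not the paper's concrete choice); the search returns the unique $m$ with $f_k(m-1) < c \le f_k(m)$.

```python
import hashlib

U = 8            # plaintext bit-length u (illustrative assumption)
GAP = 1 << 16

def f(key: bytes, m: int) -> int:
    """Toy strictly increasing OPE backbone (illustration only)."""
    h = hashlib.sha256(key + m.to_bytes(8, "big")).digest()
    return m * GAP + int.from_bytes(h[:4], "big") % GAP

def ope_dec(key: bytes, c: int) -> int:
    """Algorithm 2: recover m with f(m-1) < c <= f(m) by binary search."""
    max_m = (1 << U) - 1
    if c <= f(key, 0):
        return 0                 # boundary case m = 0
    if c > f(key, max_m - 1):
        return max_m             # boundary case m = 2^u - 1
    lo, hi = 1, max_m - 1        # invariant: the answer lies in [lo, hi]
    while lo < hi:
        mid = (lo + hi) // 2
        if c <= f(key, mid):
            hi = mid             # c falls at or below f(mid): answer <= mid
        else:
            lo = mid + 1         # c lies above f(mid): answer > mid
    return lo
```

Each loop iteration evaluates `f` once here; together with the two boundary checks, the cost stays logarithmic in the plaintext-space size, as stated above.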
A key advantage of this non-deterministic construction is the universal validity of the ciphertext space. Since $\mathsf{OPE.Enc}$ samples from intervals that jointly cover $[0, 2^v - 1]$, every integer $c \in \mathcal{C}$ is a decodable point that maps to a unique plaintext $m$.
Structural Equivalence: Any arbitrary bit-string within the range $[0, 2^v - 1]$ is a legitimate ciphertext. This makes the distinction between a specific output of $f$ and a randomly sampled value within an interval practically irrelevant, as they are structurally identical.
Security Implication: This property effectively masks the deterministic backbone f. An attacker cannot distinguish whether a ciphertext originates from a fixed mapping or is a randomly selected value from a valid range, thereby mitigating frequency-based cryptanalysis.
In conclusion, we consider the non-deterministic OPE defined in this section as the underlying OPE for constructing our proposed SOPE framework. Hereafter, this primitive is formally denoted as $\Pi_{\mathsf{OPE}} = (\mathsf{OPE.Enc}, \mathsf{OPE.Dec})$. All subsequent descriptions of our system's operations and security analysis are based on this probabilistic $\Pi_{\mathsf{OPE}}$.
3.4. Detailed Description
This part describes the specific encryption and search procedures of the proposed SOPE scheme. The scheme adopts any non-deterministic order-preserving encryption algorithm as its underlying primitive, denoted as $\Pi_{\mathsf{OPE}}$. To control the security level and state precision, two integer security parameters are pre-configured: $\alpha$ (the decoy insertion ratio) and $d$ (the number of partitions).
Definition and Initialization of State Information: First, the ciphertext space $\mathcal{C}$ is partitioned into $d$ equal-sized sub-spaces, $I_1, I_2, \dots, I_d$. To track the total number of ciphertexts assigned to each sub-space, the user maintains a set of counters $s_1, s_2, \dots, s_d$. This set is defined as the State Information of the proposed scheme. All counters are initially set to zero.
Encryption: Given a plaintext $m$ to be encrypted, the user first generates an actual ciphertext $c \leftarrow \mathsf{OPE.Enc}(k, m)$ using the internal OPE function. The sub-space $I_j$ containing $c$ is identified, and the corresponding counter $s_j$ is incremented by 1. To ensure the server cannot distinguish between actual data and decoys, a 1-bit identification tag, derived from a one-way hash function dependent on the secret key, is appended to $c$, forming the final ciphertext $C$:
$$C = c \,\|\, b, \qquad b = H_k(c) \in \{0, 1\}.$$
While denoted as a concatenation ($\|$) conceptually, in our database implementation this operation is mathematically realized by shifting the integer $c$ and adding the tag bit $b$ (where $b \in \{0, 1\}$). That is, the stored ciphertext is $C = 2c + b$ (equivalently, $C = (c \ll 1) + b$).
Note that this encoding method ensures that the tag bit never interferes with the original order of ciphertexts. Formally, for any two ciphertexts $c_1, c_2$ where $c_1 < c_2$, their encoded versions $C_1 = 2c_1 + b_1$ and $C_2 = 2c_2 + b_2$ satisfy $C_1 < C_2$ regardless of the tag values $b_1, b_2 \in \{0, 1\}$. This can be verified by the following difference:
$$C_2 - C_1 = 2(c_2 - c_1) + (b_2 - b_1).$$
Since $c_1 < c_2$ and both are integers, $c_2 - c_1 \ge 1$, which implies $2(c_2 - c_1) \ge 2$. Given that the maximum possible value of $b_1 - b_2$ is 1 (when $b_1 = 1, b_2 = 0$), the minimum value of the difference is:
$$C_2 - C_1 \ge 2 - 1 = 1 > 0.$$
Therefore, $C_1 < C_2$ always holds. This mathematical guarantee allows the server to perform range queries using standard integer comparison on ciphertexts without any specialized comparison logic.
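The shift-and-add encoding argument above can be checked exhaustively for small values; the helper below is a hypothetical name introduced only for this check.

```python
def encode(c: int, b: int) -> int:
    """Append the 1-bit tag as the least significant bit: C = 2c + b."""
    return (c << 1) | b

def order_preserved(limit: int = 500) -> bool:
    """Exhaustively verify that C1 < C2 whenever c1 < c2, for all tag combinations.
    The worst case is b1 = 1, b2 = 0, where C2 - C1 = 2(c2 - c1) - 1 >= 1."""
    return all(
        encode(c1, b1) < encode(c2, b2)
        for c1 in range(limit)
        for c2 in range(c1 + 1, limit)
        for b1 in (0, 1)
        for b2 in (0, 1)
    )
```

A standard B-tree index over the stored integers therefore orders the encoded ciphertexts exactly as the underlying OPE ciphertexts.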
Immediately after generating the actual ciphertext, the user generates decoy ciphertexts according to the predefined ratio $\alpha$. The user selects the $\alpha$ sub-intervals with the smallest counter values and, for each, samples a random bit-string $x$ belonging to that interval. Since standard OPE maps a small plaintext space into a significantly larger ciphertext space, any random integer $x$ sampled from this highly sparse domain can be considered a valid OPE ciphertext. We provide a more detailed discussion of this property in the subsequent Remark on Non-deterministic Encryption. A decoy $D$ is formed by appending an inverted identification tag:
$$D = x \,\|\, \bar{b}, \qquad \bar{b} = 1 - H_k(x).$$
The corresponding counters are also updated. Finally, the user transmits the complete set, consisting of all actual ciphertexts and their corresponding decoys, to the server only after the encryption process for the entire plaintext set is finalized. Through this process, the density of the entire ciphertext space converges to a uniform distribution, regardless of the original plaintext distribution.
Search and Decryption: Range queries $[a, b]$ on the encrypted data exploit the inherent properties of OPE. To perform a search, the user first computes the OPE boundaries for the given range: $c_\ell = f_k(a - 1) + 1$, the smallest possible ciphertext of $a$ (with $c_\ell = 0$ when $a = 0$), and $c_u = f_k(b)$, the largest possible ciphertext of $b$. To account for the 1-bit identification tags in the encoded index, the user constructs the precise query boundaries $[Q_\ell, Q_u]$ as follows:
$$Q_\ell = 2c_\ell, \qquad Q_u = 2c_u + 1.$$
This construction ensures that for any plaintext $m \in [a, b]$, its corresponding encoded ciphertext $C = 2c + t$ is guaranteed to fall within $[Q_\ell, Q_u]$, regardless of the tag bit $t$. The server extracts and returns all elements within the range $[Q_\ell, Q_u]$ from the stored ciphertext set. Upon receiving the results, the user verifies the 1-bit tag for each ciphertext by recomputing it with their secret key. Only ciphertexts with matching tags are considered actual data and decrypted, while mismatched decoys are discarded. This filtering process ensures that authorized users obtain accurate plaintext results without interference from decoy data. The encryption, range query, and decryption procedures are summarized in Algorithms 3, 4 and 5, respectively. Additionally, the procedure for range query is illustrated in Figure 3.
| Algorithm 3 SOPE.Enc |
- 1: Input: Plaintext set M = {m_1, …, m_n}, secret key k, partition count d, decoy ratio α, sub-spaces I_1, …, I_d, state information s_1, …, s_d
- 2: Output: Ciphertext set S
- 3: if First execution of encryption then
- 4:   s_j ← 0 for all j ∈ {1, …, d} ▹ Initialize state information
- 5: end if
- 6: S ← ∅
- 7: M ← Permute(M) ▹ Optional: Order Randomization
- 8: for i ← 1 to n do
- 9:   c ← OPE.Enc(k, m_i)
- 10:   Find j such that c ∈ I_j
- 11:   s_j ← s_j + 1
- 12:   b ← H_k(c) ▹ 1-bit tag for real data
- 13:   S ← S ∪ {2c + b}
- 14:   for t ← 1 to α do
- 15:     Find j* such that s_{j*} = min_j s_j ▹ Find the sparsest partition
- 16:     Choose a random bit-string x ∈ I_{j*}
- 17:     s_{j*} ← s_{j*} + 1
- 18:     S ← S ∪ {2x + (1 − H_k(x))} ▹ Inverted tag for decoy
- 19:   end for
- 20: end for
- 21: return S
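A condensed Python sketch of the SOPE encryption procedure follows. All instantiations here are assumptions made only for the sketch: the backbone `f` is a toy keyed function, the tag $H_k$ is HMAC-SHA-256 truncated to one bit, and the parameter values (`D = 16` partitions, `ALPHA = 3`) are illustrative rather than prescribed by the scheme.

```python
import hashlib
import hmac
import random

U, GAP = 8, 1 << 16
C_SPACE = (1 << U) * GAP       # ciphertext space size 2^v
D, ALPHA = 16, 3               # partition count d and decoy ratio alpha (assumed)

def f(key: bytes, m: int) -> int:
    """Toy strictly increasing OPE backbone (illustration only)."""
    h = hashlib.sha256(key + m.to_bytes(8, "big")).digest()
    return m * GAP + int.from_bytes(h[:4], "big") % GAP

def ope_enc(key: bytes, m: int) -> int:
    """Non-deterministic primitive: sample within the interval assigned to m."""
    max_m = (1 << U) - 1
    if m == 0:
        lo, hi = 0, f(key, 0)
    elif m == max_m:
        lo, hi = f(key, max_m - 1) + 1, C_SPACE - 1
    else:
        lo, hi = f(key, m - 1) + 1, f(key, m)
    return random.randint(lo, hi)

def tag(key: bytes, c: int) -> int:
    """1-bit keyed tag H_k(c); HMAC-SHA-256 truncated to one bit (assumption)."""
    return hmac.new(key, c.to_bytes(8, "big"), hashlib.sha256).digest()[0] & 1

def sope_enc(key: bytes, plaintexts, state=None):
    """Sketch of Algorithm 3: encrypt, count per partition, decoy the sparsest ones."""
    state = state if state is not None else [0] * D
    width = C_SPACE // D
    S = []
    order = list(plaintexts)
    random.shuffle(order)                              # optional order randomization
    for m in order:
        c = ope_enc(key, m)
        state[c // width] += 1
        S.append((c << 1) | tag(key, c))               # real ciphertext: C = 2c + b
        for _ in range(ALPHA):
            j = min(range(D), key=state.__getitem__)   # sparsest partition
            x = random.randrange(j * width, (j + 1) * width)
            state[j] += 1
            S.append((x << 1) | (1 - tag(key, x)))     # decoy: inverted tag
    return sorted(S), state
```

Note that the returned `state` must persist on the client across batches, which is precisely what makes the scheme stateful.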
| Algorithm 4 SOPE Range Query Generation |
- 1: Input: A range [a, b], secret key k
- 2: Output: Range query [Q_ℓ, Q_u]
- 3: Q_ℓ ← 2 · (f_k(a − 1) + 1) ▹ Q_ℓ ← 0 if a = 0
- 4: Q_u ← 2 · f_k(b) + 1
- 5: return [Q_ℓ, Q_u]
Figure 4 illustrates the overall architecture and operational workflow of the proposed SOPE scheme. To preserve strict data privacy and minimize communication overhead, the core encryption processes are heavily localized. The client independently maintains the state information (the counters $s_1, \dots, s_d$) and executes the SOPE engine. Consequently, the untrusted server functions merely as blind storage and a query processor, operating without any knowledge of the client's internal state or the underlying data distribution.
| Algorithm 5 SOPE.Dec |
- 1: Input: A ciphertext C, secret key k
- 2: Output: Plaintext m or ⊥ (if C is a decoy)
- 3: Parse C as c ‖ t, where t is the last 1 bit of C
- 4: b ← H_k(c) ▹ Recompute the expected tag
- 5: if t = b then ▹ Check if the ciphertext is real
- 6:   m ← OPE.Dec(k, c) ▹ Internal OPE decryption
- 7:   return m
- 8: else ▹ The ciphertext is a decoy
- 9:   return ⊥
- 10: end if
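The decryption-side filtering of Algorithm 5 can be sketched in a few lines. As before, the backbone `f` and the HMAC-based 1-bit tag are illustrative assumptions; `None` stands in for the ⊥ symbol.

```python
import hashlib
import hmac

U, GAP = 8, 1 << 16

def f(key: bytes, m: int) -> int:
    """Toy strictly increasing OPE backbone (illustration only)."""
    h = hashlib.sha256(key + m.to_bytes(8, "big")).digest()
    return m * GAP + int.from_bytes(h[:4], "big") % GAP

def tag(key: bytes, c: int) -> int:
    """1-bit keyed tag H_k(c) via HMAC-SHA-256 (assumption)."""
    return hmac.new(key, c.to_bytes(8, "big"), hashlib.sha256).digest()[0] & 1

def ope_dec(key: bytes, c: int) -> int:
    """Internal OPE decryption by binary search (Algorithm 2)."""
    max_m = (1 << U) - 1
    if c <= f(key, 0):
        return 0
    if c > f(key, max_m - 1):
        return max_m
    lo, hi = 1, max_m - 1
    while lo < hi:
        mid = (lo + hi) // 2
        if c <= f(key, mid):
            hi = mid
        else:
            lo = mid + 1
    return lo

def sope_dec(key: bytes, C: int):
    """Algorithm 5: strip the tag bit, verify it, then run the internal decryption."""
    c, t = C >> 1, C & 1
    if t != tag(key, c):
        return None              # decoy: return ⊥ and discard
    return ope_dec(key, c)
```

Because the decoy tag is the inversion of the keyed tag, a decoy can never pass the check, so the filter is exact rather than probabilistic.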
3.5. Illustrative Example
To clearly demonstrate the practical mechanics and the security advantages of our proposed SOPE algorithm, we present a simplified illustrative example. Suppose the plaintext space is $[0, 100)$ and the ciphertext space is $[0, 1000)$. We partition both spaces into 10 intervals, setting the distribution management parameter to $d = 10$.
Let us consider two highly skewed plaintext vectors to observe the distribution leakage: $M_1 = (10, 20, 30)$, which is heavily concentrated in the lower intervals, and $M_2 = (70, 80, 90)$, which is concentrated in the higher intervals.
Suppose the underlying internal OPE function encrypts these plaintext vectors into $(150, 240, 410)$ for $M_1$ and $(650, 710, 820)$ for $M_2$. If we solely rely on this internal OPE primitive, the resulting ciphertexts inevitably reflect the original data distribution. Consequently, an adversary could easily distinguish between the two sets and infer the approximate plaintext ranges simply by observing the ciphertext distributions.
To mitigate this vulnerability, we apply our SOPE scheme with $\alpha = 3$ (i.e., inserting three decoy ciphertexts for every one real ciphertext). For the sake of clarity in this example, we denote the ciphertexts corresponding to actual plaintexts with numbers ending in ‘0’, and the decoy ciphertexts with numbers ending in ‘1’. This notational choice intuitively reflects the actual mechanism of our SOPE scheme, where the least significant bit (LSB) is utilized to distinguish between real and decoy ciphertexts.
Let us consider the step-by-step application of this process. For the first plaintext vector, $M_1 = (10, 20, 30)$:
When encrypting the first plaintext 10, the internal function outputs 150. To mask this, the SOPE algorithm identifies the three intervals with the lowest frequencies and injects three decoys (e.g., 51, 351, and 651). The intermediate ciphertext set becomes $\{51, 150, 351, 651\}$.
Next, 20 is encrypted to 240. The algorithm again selects the three least populated intervals at this stage and adds decoys 551, 951, and 871. The updated set grows to $\{51, 150, 240, 351, 551, 651, 871, 951\}$.
Finally, 30 is encrypted to 410, and three new decoys 291, 991, and 781 are added to the most vacant intervals. The finalized mapping for $M_1$ becomes $\{51, 150, 240, 291, 351, 410, 551, 651, 781, 871, 951, 991\}$.
Similarly, for the second plaintext vector, $M_2 = (70, 80, 90)$:
The encryption of 70 yields 650, and three decoys (51, 251, and 451) are injected, resulting in $\{51, 251, 451, 650\}$.
Encrypting 80 yields 710, with new decoys 151, 351, and 951. The set expands to $\{51, 151, 251, 351, 451, 650, 710, 951\}$.
Finally, encrypting 90 yields 820, with decoys 91, 991, and 551. The finalized mapping for $M_2$ is $\{51, 91, 151, 251, 351, 451, 551, 650, 710, 820, 951, 991\}$.
Through this progressive demonstration, it is evident how SOPE strategically populates the least frequent intervals. As a result, the finalized mappings for the two highly skewed plaintext vectors, $M_1$ and $M_2$, can be summarized as follows:
$$M_1 = (10, 20, 30) \;\mapsto\; S_1 = \{51, 150, 240, 291, 351, 410, 551, 651, 781, 871, 951, 991\},$$
$$M_2 = (70, 80, 90) \;\mapsto\; S_2 = \{51, 91, 151, 251, 351, 451, 551, 650, 710, 820, 951, 991\}.$$
The resulting ciphertext sets $S_1$ and $S_2$ span the entire ciphertext space, thoroughly concealing the highly skewed distribution of the original plaintexts and effectively demonstrating the security advantage of the SOPE algorithm.
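The flattening effect of this example can also be reproduced numerically. The sketch below replays the greedy "fill the sparsest interval" rule on the example's real ciphertexts (150, 240, 410 and 650, 710, 820) with $\alpha = 3$ and ten intervals over $[0, 1000)$; the concrete decoy positions are drawn at random, so they will generally differ from the values quoted above, but the resulting histogram is flat either way (every interval ends up holding one or two ciphertexts). The function name is hypothetical, introduced only for this demonstration.

```python
import random

def flatten(real_cts, num_intervals=10, space=1000, alpha=3):
    """Greedy decoy placement from the example: after each real ciphertext,
    insert alpha decoys into the currently least-populated intervals."""
    width = space // num_intervals
    counts = [0] * num_intervals
    out = []
    for c in real_cts:
        counts[c // width] += 1               # count the real ciphertext
        out.append(c)
        for _ in range(alpha):
            j = min(range(num_intervals), key=counts.__getitem__)  # sparsest interval
            x = random.randrange(j * width, (j + 1) * width)       # random decoy in it
            counts[j] += 1
            out.append(x)
    return sorted(out), counts
```

Running `flatten([150, 240, 410])` and `flatten([650, 710, 820])` yields per-interval counts of 1 or 2 in both cases, so the two skewed inputs become indistinguishable at the interval level.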
5. Conclusions
In this paper, we proposed SOPE, a novel framework designed to effectively mitigate data distribution exposure and subsequent inference attacks, key vulnerabilities faced by conventional Order-Preserving Encryption (OPE) in cloud database environments.
While previous OPE schemes attempted to enhance security by adjusting ciphertext space density or relaxing deterministic properties, they shared a fundamental limitation: the distribution of plaintexts remained traceable within the ciphertext set. To overcome this, the proposed SOPE scheme introduces a partition-based stateful density adjustment algorithm. This mechanism is designed to offset frequency gaps between partitions by inserting decoy ciphertexts during the encryption process.
Through both theoretical analysis and empirical evaluations, we demonstrated that the ciphertext set generated by SOPE effectively converges toward a uniform distribution, provided that the decoy ratio $\alpha$ is sufficiently large. Based on this empirical convergence, our findings suggest that the scheme can achieve the characteristics of IND-OCPA security, a standard security model for order-preserving primitives, in practical settings.
In conclusion, SOPE optimizes the trade-off between security and storage overhead, offering a practical and effective alternative that can be immediately deployed in commercial database infrastructures requiring high security. Future research will focus on establishing a theoretical framework to determine the optimal parameters for $\alpha$ and $d$, aiming to maximize security while minimizing storage overhead. Additionally, we plan to explore technical extensions to support complex multi-dimensional queries beyond simple range searches.