Article

Optimized and Privacy-Preserving MAX/MIN Protocols for Large-Scale Data

Department of Computer Engineering, Jeonju University, Jeonju 55069, Republic of Korea
Appl. Sci. 2026, 16(5), 2580; https://doi.org/10.3390/app16052580
Submission received: 3 February 2026 / Revised: 26 February 2026 / Accepted: 2 March 2026 / Published: 8 March 2026
(This article belongs to the Special Issue Application of Big Data Technology Based on Machine Learning)

Abstract

In the era of big data, data is key to the accuracy of analytical models, and cloud computing services are often used to efficiently process large volumes of data. However, outsourcing sensitive data to a third-party cloud service provider results in a loss of direct control over the data, raising serious security concerns. This study proposes highly efficient and privacy-preserving protocols that compute the maximum/minimum value in large-scale data. To improve efficiency, the proposed protocols reuse the intermediate results generated in independent subprotocols. Existing privacy-preserving maximum/minimum protocols are based on approximation methods that sacrifice accuracy or reveal information during execution. They use costly comparison operations whose number is proportional to the size of the input data, and they are therefore not suitable for large-scale data applications. In contrast, the proposed protocols theoretically reduce the number of communication rounds by 25%, the communication size by 50%, and the computational cost by 42% compared to the existing protocols, while fully maintaining accuracy and privacy. To validate these improvements, we conducted experiments showing that the proposed protocols reduce the communication volume by half and the execution time by 22%. Because the proposed protocols support parallel execution, their performance can be substantially enhanced in cloud environments that provide large-scale parallel processing resources. Even data owners with restricted computational capabilities can use the protocols without exposing their information. Under the secure version, even cloud servers executing the protocol learn nothing about the input data or the computation results.

1. Introduction

The advent of big data has rendered data analytics an indispensable tool for extracting knowledge from voluminous and intricate datasets. In various areas such as agriculture, finance, education, and healthcare, big data analytics plays a crucial role in uncovering latent patterns and correlations, thereby enabling more informed decision-making [1]. For example, it facilitates personalized medicine in healthcare and was used to track the spread of disease during the COVID-19 pandemic. The timely analysis of such large-scale data requires high-performance and scalable computing solutions to process large datasets [2]. It is very expensive to build such large-scale infrastructure individually. In contrast, cloud computing allows users to access scalable computing resources on demand and pay only for what they use. This approach leads to a substantial reduction in initial expenses. The features of the “pay-as-you-go” model and low initial investment make cloud computing particularly well-suited to big data analytics [3]. Even small and medium-sized enterprises can now conduct large-scale data analytics without requiring costly data centers.
However, when data is outsourced to an external cloud service provider, the data owners lose direct control over the data. If the provider is malicious, the data may be at risk of exposure [4]. Surveys consistently highlight security and privacy as the primary concerns regarding cloud adoption. For example, the 2022 Cloud Computing TechReport noted that ‘confidentiality/security concerns’ were the main issue for 62% of cloud users [5]. In the case of sensitive personal information such as medical records or financial data, strong countermeasures must be in place to ensure privacy even if the cloud infrastructure is not trusted. To mitigate these risks, data owners commonly encrypt their data before uploading it to the cloud. However, conventional encryption schemes make it impossible to process the data without decryption. Furthermore, even if the data is encrypted, the cloud may still infer sensitive information by analyzing access patterns during computation.
There are two main techniques for privacy-preserving computation: multiparty computation (MPC) and homomorphic encryption (HE). HE is an encryption scheme that enables computations to be performed on encrypted data without decryption [6]. As a type of HE, partially homomorphic encryption (PHE) allows for an unlimited number of specific operations (e.g., addition), but it does not support arbitrary computation. MPC enables multiple parties to jointly compute a predefined function over their private inputs without disclosing these inputs to one another. Each party only receives the result and learns nothing about the other parties’ inputs. As MPC operates on secret shares, rather than encrypted data, our work adopts homomorphic encryption due to its suitability for our model. Specifically, we focus on additively homomorphic encryption, a type of PHE that supports the addition operation over ciphertexts. Since applying privacy-preserving techniques inevitably increases the computational overhead, it is important to develop protocols that improve efficiency. In particular, it is imperative to minimize the number of computations and communications that scale with large parameters (e.g., the number of data points in a large-scale dataset) to optimize overall protocol performance. Furthermore, in order to make full use of the parallel processing capabilities of clouds, protocols must be designed with parallelizability in mind.
Various computations have been proposed for privacy-preserving large-scale data analytics, such as maximum/minimum [7,8,9], comparison [10,11,12], and equality [13,14,15]. The privacy-preserving maximum/minimum (ppMAX/ppMIN) protocols are especially versatile and can be applied to many scenarios. For example, they are used in sealed-bid auctions [9], electronic voting [16], sensor networks, IoT monitoring [17], health care [18], and privacy-preserving machine learning [19]. In the privacy-preserving hierarchical clustering protocol of [20], the ppMAX/ppMIN protocols are executed as many times as there are input data points in each round, accounting for nearly half (45–50%) of the total execution time. Therefore, the development of efficient ppMAX/ppMIN protocols for large-scale datasets can significantly benefit a wide range of applications and improve the efficiency of many other privacy-preserving protocols.
However, existing ppMAX/ppMIN protocols use approximation techniques to improve efficiency, which can lead to inaccurate results, or, when the results are accurate, require a significant amount of execution time. Furthermore, they reveal information during execution. For example, the protocols in [18,21] are based on the CKKS approximate homomorphic encryption scheme developed by Cheon et al. [22], and the works of [8,21] use approximation approaches to compute the maximum/minimum values privately. These protocols may produce inaccurate results. Although the authors of [7,8,17,23] propose efficient ppMAX/ppMIN protocols for IoT environments, these protocols expose a substantial amount of information. The protocols in [16,24] are not based on approximation techniques; however, the authors of [16] showed results for small-scale auctions and voting with only 10–30 inputs, and a sealed-bid auction of [24] with 1230 bidders takes approximately 30 min. The efficiency of these works is extremely low, and therefore, it is essential to develop more efficient ppMAX/ppMIN protocols. This study aims to design ppMAX/ppMIN protocols that significantly improve efficiency. Table 1 presents a comparison of the execution times between the existing protocols and the proposed ones.

Contribution

This study enhances the efficiency of the prior protocols that compute the maximum or minimum value from an input dataset while preserving data privacy. The proposed protocols are called ipMAX/ipMIN (Improved Privacy-Preserving Maximum/Minimum) protocols. The proposed ipMAX/ipMIN protocols are classified as a secure version and an efficient version. In comparison to the prior protocols [20], the secure version theoretically reduces the number of communication rounds by 25%, the communication volume by 50%, and the computational cost by 42%. For a detailed theoretical analysis, refer to the efficiency evaluation in Section 3. To validate this analysis, we implemented the proposed protocols and conducted experiments. The results demonstrated that the communication volume is reduced by half and the execution time by 22% compared to the prior protocols. Detailed experimental results are presented in Section 4. These efficiency gains are achieved by eliminating communications and computation whose volume is proportional to the amount of input data. This reduction makes the protocols more suitable for large-scale data applications. Despite these improvements, there is no loss of accuracy or privacy in the results of the secure version. The key idea is the integration of independently executed subprotocols in the prior protocols [20]. By reusing intermediate results generated by these subprotocols (particularly those proportional to the size of the input data), the proposed protocols minimize redundant computations and transmissions, enabling a protocol design that is well suited to large-scale data applications.
The proposed protocols not only improve efficiency but also retain all the advantages of existing protocols. They support parallel execution across data points, since the computations for each data point are independent and can be processed simultaneously. In other words, since the round complexity, which is proportional to the execution time, is independent of the amount of input data, they are well suited to large-scale data environments. As the execution time decreases in proportion to the number of parallel computations (threads), the performance of the proposed protocols is expected to be significantly enhanced in cloud environments that support massive parallelism. Prior ppMAX/ppMIN protocols [18,25,26] rely on comparatively costly comparison operations whose number scales with the input size; as a result, their execution time increases significantly as the dataset grows. In contrast, the proposed protocols use equality operations, which are substantially cheaper than comparison operations, and their cost scales with the bit length of each data value, rather than with the number of input data points. Because the bit length is typically much smaller than the dataset size in large-scale settings, the proposed protocols are particularly suitable for large-scale data analytics. In addition, when several data points attain the maximum/minimum, the proposed protocols return all of them, thereby providing results that are both more complete and more precise.
Even if data owners have limited computational resources that are not capable of large-scale data computation, they can still participate in the proposed protocols and obtain results since all computations are performed exclusively by the cloud servers with no assistance from the data owners. That is, the role of the data owners is limited to submitting their input data to the cloud servers and receiving the final outputs; they are not involved in any additional communication or computation during the protocol execution. From a privacy standpoint, the secure version discloses no information about either the input data or the computation results. Because all values processed by the cloud are represented either as ciphertexts or as randomized data, and the executing cloud servers learn no meaningful information, the secure version preserves data privacy even when deployed in an external cloud environment. Major cloud service providers (e.g., Amazon, Google, and Microsoft) offer powerful computing resources capable of massive parallel processing, which enables the proposed protocols to efficiently compute the maximum/minimum value. Moreover, the protocols also protect data access patterns since all data are processed uniformly, regardless of the results. The efficient version provides higher efficiency than the secure version because it terminates immediately when the maximum/minimum is determined. In other words, the efficient version gains efficiency by exposing information. Therefore, the secure version and the efficient version should be chosen according to the application environment, considering the trade-off between security and efficiency. The main contributions of this work are summarized as follows:
  • We design efficient and privacy-preserving protocols that compute the exact maximum/minimum value over large-scale data under a dual non-colluding cloud server model.
  • We present an explicit privacy–efficiency trade-off through two variants: a secure version and an efficient version. The checking bit set in the efficient version enables multi-level choices between security and efficiency.
  • Through a protocol-level fusion redesign, we structurally eliminate redundant costs that scale with the number of input data points, making the protocols well suited for large-scale data.
  • While maintaining the privacy level and accuracy of the state-of-the-art scheme, the proposed protocol reduces communication rounds by 25%, the communication volume by 50%, the computation cost by 42%, and the execution time by 22%.
The remainder of this paper is organized as follows: Section 2 provides the background knowledge needed to understand this work, such as the system model, additively homomorphic encryption, efficiency evaluation measures, and the secure equality functionality used in the proposed protocols. Section 3 presents the proposed ipMAX/ipMIN protocols, along with their efficiency evaluation, and Section 4 demonstrates their efficiency with experiments. Section 5 reviews related works, and Section 6 concludes this work.

2. Preliminaries

This section introduces the background knowledge necessary to understand this paper. Section 2.1 describes the system model of the proposed protocols, and Section 2.2 explains their adversary model. Section 2.3 explains the additively homomorphic encryption scheme, and Section 2.4 explains the metrics used to evaluate the efficiency of the protocols. Section 2.5 provides a brief introduction to the secure equality functionality used in the proposed protocols and its existing implementations. Lastly, Section 2.6 briefly explains the prior protocol [20] that our work improves upon. Table 2 summarizes the notations used throughout this paper.

2.1. System Model

The proposed protocols operate in a dual cloud server setting under a non-collusion assumption. This setting involves two entities: a data host (DH) and a cryptographic service provider (CSP). The non-collusion assumption means that DH and CSP behave independently and do not share their intermediate results or internal states. In real-world environments, the DH and CSP can operate in physically or logically isolated cloud-based environments without collusion. The two servers can be hosted on separate cloud platforms (e.g., AWS and Azure), which enhances scalability and isolation. To start the proposed protocols, CSP generates a key pair consisting of an encryption key (public key, PK) and a decryption key (secret key, SK). Then, the CSP distributes the PK to both DH and data owners. The data owners then encrypt their data using the PK and send them to the DH. To execute the proposed protocols, DH must hold the encrypted input data and CSP must hold SK. The DH and CSP then execute the proposed ipMAX/ipMIN protocols through computation and communication, according to the specifications of these protocols. Upon the completion of the protocols, the DH obtains the encrypted index of the maximum/minimum value in the input data, while the CSP does not obtain any information. The encrypted index can easily be converted to the encrypted maximum/minimum value. The proposed protocols are particularly useful in cross-institutional scenarios, e.g., multiple hospitals identifying the maximum values in encrypted patient metrics, or companies securely outsourcing minimal cost computations without revealing raw data.

2.2. Adversary Model and Security Definition

The adversary model is classified as semi-honest adversary and malicious adversary models. In the semi-honest adversary model, a compromised party correctly follows the protocol specification but attempts to obtain information about the inputs and outputs by analyzing the transmitted data and the intermediate results. In the malicious adversary model, a compromised party behaves arbitrarily to achieve its goals, regardless of the protocol. The protocols designed under the semi-honest adversary model are significant, as they represent the first step in developing protocols with superior security. In runtime environments, robustness against semi-honest attackers is achieved by deploying the protocol on secure communication channels (e.g., TLS). It can be extended to provide robustness against a malicious adversary by using either zero-knowledge proofs [27] or consistency checks [28]. The proposed protocol (secure version) ensures that no information is revealed to either of the two cloud servers under semi-honest adversary model, as long as they do not collude. This model is reasonable and suitable for practical systems because major cloud service providers typically prioritize their reputation over any potential benefit from collusion, and because real-world constraints—such as legal, contractual, and compliance obligations—make collusion costly and risky in deployment.
To prove the security of the proposed protocol in a formal manner, we adopt the standard security definition for the semi-honest adversary model [10,29].
Definition 1.
Let $x$ denote the input of party $p$, let $\Pi_p(\pi)$ denote the execution image of party $p$ when running protocol $\pi$, and let $y$ denote the output of party $p$ computed by protocol $\pi$. A protocol $\pi$ is secure if $\Pi_p(\pi)$ can be simulated, given only $x$ and $y$, and if the distribution of the simulated image is computationally indistinguishable from the distribution of $\Pi_p(\pi)$. That is, party $p$ learns no information from protocol $\pi$ beyond what is implied by its input and output.
In this definition, the image includes the party’s input and output, as well as the received messages during the protocol execution. After presenting the proposed protocol in Section 3, we prove its security under this definition.

2.3. Additively Homomorphic Encryption

The proposed protocols are based on an encryption scheme that supports additive homomorphism. In the experimental evaluation presented in Section 4, this encryption scheme is instantiated with the Paillier cryptosystem. Paillier is a probabilistic public-key encryption scheme that provides semantic security, meaning that an adversary given only a ciphertext cannot infer any information about the underlying plaintext. Although Paillier is used in our implementation, the proposed protocols are compatible with any additively homomorphic encryption scheme. We denote the encryption and decryption functions as $E_{pk}(\cdot) = E(\cdot)$ and $D_{sk}(\cdot) = D(\cdot)$, respectively, where $pk$ and $sk$ are the public and secret keys. For simplicity, we omit the explicit key subscripts hereafter. The additive homomorphic property allows plaintext additions to be carried out directly on ciphertexts without requiring decryption. Formally, for any two values $a, b \in \mathbb{Z}_N$, the following equations hold:
$$D(E(a) \cdot E(b) \bmod N^2) = a + b \bmod N$$
$$D(E(a)^b \bmod N^2) = a \cdot b \bmod N$$
For brevity, the terms $\bmod\ N^2$ and $\bmod\ N$ are omitted from subsequent equations.
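The two identities above can be checked numerically. The following is a minimal sketch of textbook Paillier with the common generator choice $g = N + 1$, assuming tiny hard-coded demo primes that offer no security; the function names (`keygen`, `encrypt`, `decrypt`, `add`, `scalar_mul`) are illustrative and are not taken from the paper's implementation. It requires Python 3.8 or later for the modular inverse via `pow`.

```python
import math
import random

def keygen(p=1009, q=1013):
    """Toy Paillier key generation. p and q are tiny demo primes (insecure)."""
    N = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p-1, q-1)
    mu = pow(lam, -1, N)  # inverse of lam mod N; valid for generator g = N + 1
    return N, (lam, mu)

def encrypt(N, m):
    """E(m) = (N+1)^m * r^N mod N^2, with fresh randomness r coprime to N."""
    N2 = N * N
    r = random.randrange(1, N)
    while math.gcd(r, N) != 1:
        r = random.randrange(1, N)
    return pow(N + 1, m % N, N2) * pow(r, N, N2) % N2

def decrypt(N, sk, c):
    """D(c) = L(c^lam mod N^2) * mu mod N, where L(u) = (u - 1) / N."""
    lam, mu = sk
    u = pow(c, lam, N * N)
    return (u - 1) // N * mu % N

def add(N, c1, c2):
    """Homomorphic addition: multiplying ciphertexts adds plaintexts."""
    return c1 * c2 % (N * N)

def scalar_mul(N, c, b):
    """Homomorphic scalar multiplication: E(a)^b decrypts to a*b mod N."""
    return pow(c, b, N * N)

N, sk = keygen()
a, b = 1234, 5678
sum_ok = decrypt(N, sk, add(N, encrypt(N, a), encrypt(N, b))) == (a + b) % N
prod_ok = decrypt(N, sk, scalar_mul(N, encrypt(N, a), b)) == (a * b) % N
```

Because encryption is probabilistic, two encryptions of the same plaintext almost always differ, yet both decrypt to the same value.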

2.4. Efficiency Evaluation

Since the proposed protocols are executed through the interactions and computations of two cloud servers (DH and CSP) in the dual non-colluding cloud server model, their efficiency is determined by communication and computation costs. Accordingly, we evaluate the efficiency of a protocol using these two costs. In particular, our evaluation considers the number of communication rounds, the communication volume, and the computational costs incurred by DH and CSP. As the computational cost, we only consider encryption/decryption and exponentiation operations since other operations, such as homomorphic addition and permutation, have little influence on the overall performance. We assume that the execution times of the encryption and decryption are identical. The computation cost of the protocol is evaluated for three cases: the normal case, precomputation, and parallel execution. The normal case is the total number of operations required to execute a protocol. Precomputation is meaningful because it can reduce online execution time significantly by performing operations that are independent of input (e.g., encrypting random values) in advance. The parallel execution case means the number of sequential batches, with sufficiently many parallel threads assumed; i.e., independent encryption/decryption and exponentiation operations across data points are counted as one batch when executed concurrently. Following the protocol descriptions in Section 3, we analyze its efficiency theoretically by computing its communication and computation costs. In Section 4, we evaluate the efficiency of the proposed protocols based on the experimental results.

2.5. Secure Equality (SEQ) Functionality: $\mathcal{F}_{SEQ}$

This section introduces the secure equality functionality $\mathcal{F}_{SEQ}$ used in the proposed protocols, along with the existing protocol [30] that realizes it. In this paper, the invocation of the SEQ functionality is represented as an interactive protocol that DH and CSP run with a third party to compute the functionality $\mathcal{F}_{SEQ}$ ideally.
Intuitively, $\mathcal{F}_{SEQ}$ computes whether two values, $\gamma$ and $\rho$, are equal, where the input data $\gamma$ and $\rho$ are bitwise. Formally, the functionality $\mathcal{F}_{SEQ}$ receives two encrypted datasets, $\{E(\gamma_k)\}_{k \le \theta-1}$ and $\{E(\rho_k)\}_{k \le \theta-1}$, from DH, as well as a secret key, $SK$, from CSP. Then, it sends $E(q)$ to the DH and nothing to the CSP, where $E(q) = E(1)$ if $\gamma_k = \rho_k$ for all $k$, and otherwise, $E(q) = E(0)$. We represent the functionality $\mathcal{F}_{SEQ}$ as follows.
$$(E(q), \bot) \leftarrow \mathcal{F}_{SEQ}(\{E(\gamma_k), E(\rho_k)\}_{k \le \theta-1}, SK), \quad \text{where } \{\gamma_k, \rho_k\} \in \{0,1\}^2, \quad E(q) = \begin{cases} E(1), & \text{if } \gamma_k = \rho_k \text{ for } \forall k \\ E(0), & \text{otherwise} \end{cases}$$
The real protocol for privately computing the functionality $\mathcal{F}_{SEQ}$ was proposed in the existing work [30]. As with the proposed protocols, the existing protocol [30] was implemented in the dual non-colluding cloud server model and was formally proven secure in the semi-honest adversary model. The protocol requires transmitting the data of size $(\theta + 1)C$ in a single communication round and computing $(\theta + 1)$ encryptions/decryptions and $(2\theta + 1)$ exponentiations. When executed in parallel, the number of encryption/decryption and exponentiation operations can be reduced to 2 and 3, respectively. The detailed communication and computation costs are summarized in Table 3.
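As a plaintext illustration of what the functionality computes and of the cost figures quoted above, the following sketch may help. It operates on unencrypted bits and is not the protocol of [30]; the names `to_bits`, `seq_ideal`, and `seq_cost` are hypothetical, and the assumption that parallel execution leaves the round count and the number of transmitted ciphertexts unchanged is ours.

```python
def to_bits(value, width):
    """Little-endian list of the low `width` bits of a non-negative integer."""
    return [(value >> k) & 1 for k in range(width)]

def seq_ideal(gamma_bits, rho_bits):
    """Plaintext view of the SEQ output q: 1 if all bits match, otherwise 0."""
    if len(gamma_bits) != len(rho_bits):
        raise ValueError("bit vectors must have the same length")
    return int(all(g == r for g, r in zip(gamma_bits, rho_bits)))

def seq_cost(theta, parallel=False):
    """Cost figures as quoted in the text for theta-bit inputs.

    'ciphertexts_sent' counts units of the ciphertext size C.
    Assumption: parallelism changes only the sequential operation counts.
    """
    cost = {"rounds": 1, "ciphertexts_sent": theta + 1}
    if parallel:
        cost.update({"enc_dec": 2, "exp": 3})
    else:
        cost.update({"enc_dec": theta + 1, "exp": 2 * theta + 1})
    return cost

q_equal = seq_ideal(to_bits(13, 8), to_bits(13, 8))
q_differ = seq_ideal(to_bits(13, 8), to_bits(12, 8))
cost_20_bits = seq_cost(20)
```

For example, with θ = 20 the sequential counts are 21 encryptions/decryptions and 41 exponentiations, whereas the parallel counts stay at 2 and 3 regardless of θ.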

2.6. Prior Protocol

This section explains the existing ppMAX/ppMIN protocols [20], which we improve upon in the next section. In Algorithm 1, the input dataset is $\{E(d_i)_B\}_{i \in [n]}$, and the output dataset is $\{E(C_i)\}_{i \in [n]}$, where the data point $d_i$ corresponding to $C_i = 1$ is the maximum. In the algorithm, $n$ is the number of input data points, $l$ is the bit length of each data point, and $\theta$ is the bit length of the number $n$. The secure multiplication functionality, denoted as $\mathcal{F}_{SM}$, is a cryptographic operation that multiplies two original plaintext values.
Algorithm 1 ppMAX/ppMIN protocols [20]
Input: $\{E(d_i)_B\}_{i \in [n]}$, where $E(d_i)_B = \{E(d_{i,l-1}), \ldots, E(d_{i,1}), E(d_{i,0})\}$ and $d_{i,j} \in \{0,1\}$
Output: $\{E(C_i)\}_{i \in [n]}$, where $C_i = 1$ if the corresponding data $d_i$ is the maximum, and otherwise, $C_i = 0$.
1: $E(C_i) \leftarrow E(1)$  ($\forall i \in [n]$)
2: for $j = l-1$ to $0$ do
3:     $(E(P_i), \bot) \leftarrow \mathcal{F}_{SM}(\{E(C_i), E(d_{i,j})\}, SK)$  ($\forall i \in [n]$)
4:     $E(s) \leftarrow \prod_{i=1}^{n} E(P_i)$
5:     $\{E(\gamma_k)\}_{k \le \theta-1} \leftarrow E(s) \cdot E(\rho)$  ($\rho \in_R \mathbb{Z}_N$)
6:     $(E(q), \bot) \leftarrow \mathcal{F}_{SEQ}(\{E(\gamma_k), E(\rho_k)\}_{k \le \theta-1}, SK)$
7:     $(E(t_i), \bot) \leftarrow \mathcal{F}_{SM}(\{E(C_i), E(q)\}, SK)$  ($\forall i \in [n]$)
8:     $E(C_i) \leftarrow E(t_i) \cdot E(P_i)$  ($\forall i \in [n]$)
9: end for
10: return $\{E(C_i)\}_{i \in [n]}$
The protocols process one input bit at a time in an iterative manner, beginning with the most significant bit and proceeding toward the least significant bit. In each iteration of the loop, the following operations are carried out: first, the multiplication $P_i \leftarrow C_i \cdot d_{i,j}$ is computed for each data point (line 3), and the value $s$ is obtained by aggregating all of the $P_i$ values (line 4). Then, the value $\gamma = s + \rho$ is computed and bit-decomposed (line 5). The SEQ functionality checks whether all bits of $\gamma$ and $\rho$ are identical, and the result is denoted as $q$ (line 6), which indicates whether $s = 0$. Subsequently, the auxiliary value is updated as $C_i \leftarrow C_i \cdot q + P_i$ for each data point (lines 7–8). In order to protect data privacy, all data are encrypted using an additively homomorphic encryption scheme, and all computations are performed in a privacy-preserving manner. The SM and SEQ protocols are independently performed and securely implemented in prior works [30,31].
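The update rule of Algorithm 1 can be traced on plaintext values. The sketch below mirrors lines 1–10 without any encryption, so it illustrates only the logic and none of the privacy guarantees; the function name `ppmax_plain`, the mask bound, and the bit width used for the equality check are illustrative assumptions rather than the parameters of [20].

```python
import random

def ppmax_plain(data, bit_len, mask_bound=1 << 16, rng=None):
    """Plaintext trace of Algorithm 1 (maximum case); no encryption involved.

    data:       non-negative integers, each representable in bit_len bits
    mask_bound: illustrative range for the random mask rho (assumption)
    Returns the indicator list C, with C[i] == 1 iff data[i] is a maximum.
    """
    rng = rng or random.Random()
    n = len(data)
    width = (mask_bound + n).bit_length()  # enough bits to hold s + rho
    C = [1] * n                                                # line 1
    for j in range(bit_len - 1, -1, -1):                       # line 2
        P = [C[i] * ((data[i] >> j) & 1) for i in range(n)]    # line 3
        s = sum(P)                                             # line 4
        rho = rng.randrange(mask_bound)                        # line 5
        gamma = s + rho
        gamma_bits = [(gamma >> k) & 1 for k in range(width)]
        rho_bits = [(rho >> k) & 1 for k in range(width)]
        q = int(gamma_bits == rho_bits)                        # line 6: q = 1 iff s = 0
        C = [C[i] * q + P[i] for i in range(n)]                # lines 7-8
    return C                                                   # line 10

flags = ppmax_plain([5, 9, 9, 3], bit_len=4)
```

For the input `[5, 9, 9, 3]`, the first iteration keeps only the two values whose most significant bit is 1, and later iterations leave that candidate set unchanged, so both copies of 9 are flagged. When no remaining candidate has a 1 at the current bit, `s` is 0, `q` is 1, and every `C[i]` is carried over unchanged.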
However, the communication and computation costs of these protocols can be significantly reduced by leveraging intermediate computational results produced during the execution of the subprotocols. To achieve optimal efficiency on large-scale data, it is critical to minimize communication proportional to the amount of input data, which is typically the largest parameter in large-scale settings. In this work, we eliminate such communication by reusing intermediate computational results within the subprotocols. In the next section, we present a detailed description of the proposed ipMAX/ipMIN protocols, which improve the efficiency of the existing ppMAX/ppMIN protocols.

3. Improved Privacy-Preserving Maximum/Minimum (ipMAX/ipMIN) Protocols

This study improves the efficiency of prior protocols [20] that compute the maximum or minimum value from an input dataset while preserving the privacy of the data. We call our improved protocols the ipMAX/ipMIN (Improved Privacy-Preserving Maximum/Minimum) protocols. The proposed protocols are classified as a secure version and an efficient version. The secure version computes the exact maximum/minimum value without disclosing any information about either the inputs or the outputs. In contrast, the efficient version can achieve faster execution at the cost of limited leakage: it reveals the bit position at which the maximum/minimum value becomes determined, while all other information remains protected. To improve efficiency, we integrate the subprotocols invoked in the prior protocol and redesign the protocols using a cancellation-based construction. Specifically, the prior protocols [20] use two independent subprotocols: the secure multiplication and the secure equality protocols. Each subprotocol is defined and executed in isolation, meaning that any intermediate computational results generated within the subprotocol remain encapsulated and unused by other protocols. However, in the proposed protocols, we integrate these independent subprotocols, reuse intermediate computation results, and redesign the protocols with a cancellation structure that cancels out random masking terms. As a result, the proposed protocols structurally remove redundant communication and computation overhead that grows linearly with the dataset size $n$, thereby alleviating a key bottleneck in large-scale environments. More concretely, when compared with the protocol of [20], the secure version requires 25% fewer communication rounds and 50% less total communication. It also reduces the total number of encryption/decryption computations by 42%. The efficient version can reduce these costs further.
The proposed protocols retain all the advantages of the existing ppMAX/ppMIN protocols, in addition to the aforementioned efficiency improvements. The proposed protocols allow all input data points to be processed in parallel, since the computations for individual data points are mutually independent. In other words, the round complexity—which directly affects execution time—is independent of the input dataset size. This makes the protocols highly scalable and efficient for large-scale data processing. The performance of the protocols is enhanced in proportion to the number of parallel operations. Therefore, when deployed in a cloud computing environment with massive parallel computing capabilities, it is expected that their performance significantly increases with the degree of parallelism. Earlier ppMAX/ppMIN schemes [18,25,26] rely on comparison operations whose cost scales with the number of input data points. By contrast, our protocols use more efficient equality checks, and their cost depends on the bit length of the input data point, rather than on the number of inputs. In large-scale settings, where the bit length of data is typically much smaller than the total number of inputs, this design provides a clear computational advantage. Moreover, the proposed protocols return all of the maximum/minimum values when duplicates exist, yielding more complete results than prior approaches.
From a privacy perspective, the secure version does not leak information about either the input data or the final output, because all values processed by DH and CSP are represented as encrypted or randomized values. In addition, the protocols are resistant to data-access-pattern attacks, since every data point is processed uniformly, regardless of intermediate outcomes. These properties make the proposed protocols suitable for secure execution in outsourced cloud environments. Major cloud providers, including Amazon, Google, and Microsoft, offer infrastructures with strong parallel-processing capabilities, on which the proposed maximum/minimum protocols can be executed efficiently. Furthermore, data owners (input parties) do not participate in any computation or communication beyond the initial transmission of input data and the final receipt of computational results. In other words, all computations are performed exclusively by the cloud servers without any assistance from the data owners. Therefore, even data owners with resource-constrained devices (e.g., mobile phones or IoT devices) can participate in the proposed protocols and obtain the final result without performing heavy computations themselves.
The functionality of the proposed ipMAX/ipMIN, denoted as $\mathcal{F}_{ipMAX/ipMIN}$, is defined as follows:
  • Functionality $\mathcal{F}_{ipMAX/ipMIN}$: It receives the encrypted input datasets $\{E(d_i)_B\}_{i \in [n]}$ from DH and a decryption key SK from CSP. It then returns the encrypted output datasets $\{E(C_i)\}_{i \in [n]}$ to the DH, where $C_i = 1$ if the corresponding data point, $d_i$, is the maximum/minimum, and otherwise, $C_i = 0$. The functionality $\mathcal{F}_{ipMAX/ipMIN}$ is represented as follows:
$$(\{E(C_i)\}_{i \in [n]}, \bot) \leftarrow \mathcal{F}_{ipMAX/ipMIN}(\{E(d_i)_B\}_{i \in [n]}, SK)$$
We present the secure version in Section 3.1 and the efficient version in Section 3.2. Figure 1 shows the flowcharts of the secure and efficient versions.

3.1. Secure Version of the ipMAX/ipMIN Protocols

3.1.1. Secure Version of the ipMAX Protocol

The secure variant of the ipMAX protocol determines the maximum value of the input dataset without leaking any information about either the inputs or the intermediate/final computational outcomes. In this protocol, each input data point, $d_i$, is paired with an auxiliary value, $C_i \in \{0, 1\}$. Each $d_i$ is assumed to be represented using $l$ bits. The maximum is determined by repeatedly updating the auxiliary data $C_i$, examining one bit of input data at a time from the most significant bit to the least significant bit. Accordingly, the procedure runs for $l$ iterations in total.
  • Auxiliary data $C_i$: The value $C_i$ indicates whether the corresponding input $d_i$ is a maximum candidate, i.e., a data point still considered a possible maximum. More precisely, $C_i = 1$ means that $d_i$ is still a candidate, whereas $C_i = 0$ means that it has already been eliminated from the set of maximum candidates. Initially, all data points belong to the candidate set. As the protocol proceeds, some candidates are discarded on the basis of bitwise computations. By the end of the procedure, the remaining data point(s) with $C_i = 1$ are determined as the maximum value(s). The maximum value is determined from the candidate set, and once a data point $d_i$ is removed from the candidate set (that is, once $C_i$ changes from 1 to 0), it cannot re-enter in later iterations; the transition $C_i = 1 \to 0$ is permitted, whereas $C_i = 0 \to 1$ is not.
  • Intuitive Idea: The ipMAX protocol finds the maximum by updating the auxiliary data $C_i$ bit by bit, using the current bit $d_{i,j}$ in each iteration, starting from bit position $(l-1)$ and moving down to bit position 0. At a given iteration, candidate data points whose current bit is 1 (i.e., those satisfying $C_i \cdot d_{i,j} = 1$) are regarded as the predicted maximum candidates. This follows from the observation that, when two binary values, such as $a = 1101\,1000$ and $b = 1101\,0110$, are compared from the most significant bit downward, the higher-order bits (bits 7 through 4) are identical (1101), and thus, no ordering can yet be determined. However, at bit 3, $a_3 = 1$ and $b_3 = 0$, which clearly indicates that $a > b$, regardless of the remaining lower bits (bits 2 through 0). In the last step of each iteration, the ipMAX protocol removes the unpredicted data, which are determined to be smaller, from the candidate set, and thus the candidate data are consistently greater than the non-candidate data. Each iteration of the proposed protocol consists of three steps, whose ideas are as follows:
  • Step 1: The protocol privately computes $\gamma = s + \rho$, where $s$ denotes the number of predicted maximum candidates ($s = \sum_{i} C_i \cdot d_{i,j}$), and $\rho$ is a random value used to blind the number $s$. This step corresponds to lines 3–5 of Algorithm 1.
  • Step 2: The protocol uses the SEQ functionality to privately determine whether $s + \rho$ is equal to $\rho$, which is equivalent to checking whether $s = 0$. If $s = 0$ (i.e., $s + \rho = \rho$), the protocol sets $q = 1$, and otherwise (i.e., $s + \rho \neq \rho$), it sets $q = 0$. This step corresponds to line 6 of Algorithm 1.
  • Step 3: Based on the value of $s$, the protocol privately updates the auxiliary data $C_i$, according to $C_i \leftarrow C_i \cdot q + C_i \cdot d_{i,j}$. If $s \neq 0$ (equivalently, $q = 0$, meaning that at least one predicted maximum candidate exists), all non-predicted data are removed from the candidate set, and only the predicted data remain, i.e., $C_i \leftarrow C_i \cdot d_{i,j}$. If $s = 0$ (equivalently, $q = 1$, meaning that no predicted maximum candidate exists), then all $C_i \cdot d_{i,j} = 0$, so the candidate set remains unchanged, i.e., $C_i \leftarrow C_i$, and the protocol proceeds to the next iteration. This step corresponds to lines 7–8 of Algorithm 1.
By repeating this update process from the most significant bit to the least significant bit, the ipMAX protocol gradually eliminates smaller values—namely, the unpredicted maximum data—from the candidate set. Consequently, the data point(s) that remain in the candidate set at the end of the protocol correspond to the maximum value.
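As a minimal illustration of the three steps above, the update rule can be sketched on plaintext values. This sketch assumes non-negative $l$-bit integers and performs no encryption or blinding; the function name `ip_max_plain` is hypothetical and is not taken from the proposed implementation.

```python
def ip_max_plain(data, l):
    """Return the candidate flags C_i (1 = maximum) for l-bit non-negative integers."""
    C = [1] * len(data)                                # every point starts as a candidate
    for j in range(l - 1, -1, -1):                     # bit l-1 (MSB) down to bit 0 (LSB)
        bits = [(d >> j) & 1 for d in data]            # current bit d_{i,j}
        s = sum(c * b for c, b in zip(C, bits))        # Step 1: number of predicted maxima
        q = 1 if s == 0 else 0                         # Step 2: q = 1 iff s == 0
        C = [c * q + c * b for c, b in zip(C, bits)]   # Step 3: C_i <- C_i*q + C_i*d_{i,j}
    return C


# The two values a and b from the intuitive idea, plus one smaller value and a duplicate of a.
flags = ip_max_plain([0b11011000, 0b11010110, 0b01111111, 0b11011000], 8)
print(flags)
```

Because $q = 1$ only when every product $C_i \cdot d_{i,j}$ is zero, the updated flag never exceeds 1, and duplicated maxima all keep $C_i = 1$.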
Algorithm 2 presents the secure version of the ipMAX protocol, and Table 4 provides an illustrative example for clarity. The protocol takes the encrypted dataset $\{E(d_i)_B\}_{i \in [n]}$ as input and outputs an auxiliary dataset $\{E(C_i)\}_{i \in [n]}$, where the maximum value is the $d_i$ associated with $C_i = 1$. In other words, the data point(s) that remain in the candidate set at the end of the protocol (namely, those with $C_i = 1$) are identified as the final maximum value(s). Here, $n$ is the number of input data points, $l$ is the bit length of each data point, and $\theta$ is the bit length of the number $n$.
Algorithm 2 Secure version of the ipMAX protocol.
Input: $\{E(d_i)_B\}_{i \in [n]}$, where $E(d_i)_B = \{E(d_{i,l-1}), \ldots, E(d_{i,1}), E(d_{i,0})\}$, $d_i = \sum_{j=0}^{l-1} d_{i,j} \cdot 2^j$ and $d_{i,j} \in \{0, 1\}$
Output: $\{E(C_i)\}_{i \in [n]}$, where $C_i = 1$ if the corresponding data $d_i$ is the maximum, and otherwise, $C_i = 0$.
DH:
1: $E(C_i) \leftarrow E(1)$ $(i \in [n])$
2: for $j = l-1$ to $0$ do
3:     $E(\alpha_i) \leftarrow E(C_i) \cdot E(r_i)$ $(r_i \in_R \mathbb{Z}_N,\ i \in [n])$
4:     $E(\beta_i) \leftarrow E(d_{i,j}) \cdot E(t_i)$ $(t_i \in_R \mathbb{Z}_N,\ i \in [n])$
5:     $E(x_i) \leftarrow E(C_i)^{N - t_i} \cdot E(d_{i,j})^{N - r_i} \cdot E(N - r_i t_i)$ $(i \in [n])$
6:     $E(\lambda) \leftarrow E(\rho) \cdot \prod_{i=1}^{n} E(x_i)$ $(\rho \in_R \mathbb{Z}_N)$
7:     DH $\to$ CSP: $\{E(\alpha_i)\}_{i \in [n]}$, $\{E(\beta_i)\}_{i \in [n]}$, $E(\lambda)$
CSP:
8:     Decrypt $E(\alpha_i)$, $E(\beta_i)$, $E(\lambda)$ $(i \in [n])$
9:     $w_i \leftarrow \alpha_i \cdot \beta_i$ $(i \in [n])$
10:     $\gamma \leftarrow \lambda + \sum_{i=1}^{n} w_i$
11:     Encrypt $\gamma_k$ $(k \le \theta - 1)$
12:     CSP $\to$ DH: $\{E(\gamma_k)\}_{k \le \theta - 1}$
DH & CSP:
13:     $(E(q), \perp) \leftarrow F_{SEQ}(\{E(\gamma_k), E(\rho_k)\}_{k \le \theta - 1}, SK)$
DH:
14:     $E(\delta) \leftarrow E(q) \cdot E(\tau)$ $(\tau \in_R \mathbb{Z}_N)$
15:     DH $\to$ CSP: $E(\delta)$
CSP:
16:     Decrypt $E(\delta)$
17:     $y_i \leftarrow \alpha_i \cdot \delta + w_i$ $(i \in [n])$
18:     Encrypt $y_i$ $(i \in [n])$
19:     CSP $\to$ DH: $\{E(y_i)\}_{i \in [n]}$
DH:
20:     $E(z_i) \leftarrow E(C_i)^{N - \tau} \cdot E(q)^{N - r_i} \cdot E(N - r_i \tau)$ $(i \in [n])$
21:     $E(C_i) \leftarrow E(x_i) \cdot E(y_i) \cdot E(z_i)$ $(i \in [n])$
22: end for
23: return $\{E(C_i)\}_{i \in [n]}$
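The ciphertext operations in Algorithm 2 rely on the additive homomorphism of the Paillier cryptosystem: multiplying ciphertexts adds plaintexts, and raising a ciphertext to a power multiplies its plaintext by that power. The following toy sketch illustrates lines 3–5 and 9 for a single data point. It assumes tiny hard-coded primes (insecure, for illustration only), and all function names are hypothetical; it is not the implementation evaluated in this article.

```python
import math
import random


def keygen(p=1789, q=1861):
    """Toy Paillier key pair from two small primes (insecure, illustration only)."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)   # lcm(p-1, q-1)
    mu = pow(lam, -1, n)                                 # valid for generator g = n + 1
    return n, (lam, mu)


def encrypt(n, m):
    n2 = n * n
    r = random.randrange(1, n)
    while math.gcd(r, n) != 1:
        r = random.randrange(1, n)
    return (pow(n + 1, m % n, n2) * pow(r, n, n2)) % n2


def decrypt(n, sk, c):
    lam, mu = sk
    u = pow(c, lam, n * n)
    return ((u - 1) // n * mu) % n


def blinded_product(C, d):
    """Recover C*d from blinded values, mirroring lines 3-5 and 9 of Algorithm 2."""
    n, sk = keygen()
    n2 = n * n
    r, t = random.randrange(n), random.randrange(n)
    E_C, E_d = encrypt(n, C), encrypt(n, d)
    E_alpha = (E_C * encrypt(n, r)) % n2                 # line 3: E(C + r)
    E_beta = (E_d * encrypt(n, t)) % n2                  # line 4: E(d + t)
    E_x = (pow(E_C, n - t, n2) * pow(E_d, n - r, n2)
           * encrypt(n, n - (r * t) % n)) % n2           # line 5: E(-tC - rd - rt)
    alpha, beta = decrypt(n, sk, E_alpha), decrypt(n, sk, E_beta)
    w = (alpha * beta) % n                               # line 9, computed by the CSP
    # x is decrypted here only to check the algebra; in Algorithm 2 it is
    # folded into E(lambda) and never decrypted on its own.
    x = decrypt(n, sk, E_x)
    return (w + x) % n


print(blinded_product(1, 1), blinded_product(1, 0))
```

The blinding terms cancel because $(C_i + r_i)(d_{i,j} + t_i) - (t_i C_i + r_i d_{i,j} + r_i t_i) = C_i d_{i,j}$.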
For brevity, we describe the data in an intuitive, non-encrypted form. At the beginning of the protocol, DH initializes all data points as the maximum candidates (line 1). As explained above, DH and CSP then jointly find the maximum value by updating the auxiliary data $C_i$ for one bit of each input data point in every iteration, proceeding from the $(l-1)$-th bit down to the 0th bit (lines 2–22). We assume that, in the $(l-j)$-th iteration, DH and CSP update the auxiliary data $C_i$ using the $j$-th bit of $d_i$, for each $i \in [n]$ and $j = l-1, \ldots, 0$.
  • (Step 1: lines 3–12) privately computing the number of predicted maximum data: DH first generates $\alpha_i$ and $\beta_i$, corresponding to $C_i$ and $d_{i,j}$, as follows (lines 3–4).
$$E(\alpha_i) \leftarrow E(C_i + r_i)$$
$$E(\beta_i) \leftarrow E(d_{i,j} + t_i) \quad (r_i, t_i \in_R \mathbb{Z}_N,\ i \in [n])$$
It computes $x_i$ and then aggregates all $x_i$ along with a random value $\rho$ to compute $\lambda$ as follows (lines 5–6).
$$E(x_i) \leftarrow E(-t_i C_i - r_i d_{i,j} - r_i t_i) \quad (i \in [n])$$
$$E(\lambda) \leftarrow E\left(\rho + \sum_{i=1}^{n} x_i\right) \quad (\rho \in_R \mathbb{Z}_N)$$
It sends all $\alpha_i$, $\beta_i$ ($i \in [n]$), and $\lambda$ to the CSP (line 7). Upon receiving the ciphertexts, the CSP decrypts them (line 8), computes $w_i$ (line 9), and aggregates all $w_i$ values with $\lambda$ to compute $\gamma$ (line 10) as follows. Here, $\gamma = s + \rho$, where $s$ is the number of predicted maximum data and is blinded by a random value $\rho$. The correctness of $\gamma$ will be shown later.
$$w_i \leftarrow \alpha_i \cdot \beta_i \quad (i \in [n])$$
$$\gamma \leftarrow \lambda + \sum_{i=1}^{n} w_i$$
It encrypts $\gamma$ bitwise over $\theta$ bits, where $\theta$ is the bit length of $n$, and sends the bitwise encrypted data to the DH (lines 11–12). The DH reuses the random value $\rho$ in Step 2 and the computed data $x_i$ in Step 3, while the CSP reuses $\alpha_i$ and $w_i$ in Step 3.
  • (Step 2: line 13) privately comparing the number of predicted maximum data to zero: The DH and CSP privately compute $q$, which indicates whether $\gamma$ ($= s + \rho$) is equal to $\rho$, which is equivalent to checking whether $s = 0$. To do this, the protocol uses the SEQ functionality introduced in Section 2.5. That is, DH and CSP interact with a third party that is assumed to implement the SEQ functionality ideally.
$$(E(q), \perp) \leftarrow F_{SEQ}(\{E(\gamma_k), E(\rho_k)\}_{k \le \theta - 1}, SK), \quad \text{where } E(q) = \begin{cases} E(1), & \text{if } \gamma_k = \rho_k \text{ for all } k \\ E(0), & \text{otherwise} \end{cases}$$
  • (Step 3: lines 14–21) privately updating the auxiliary data $C_i$: The DH computes $\delta$ to blind $q$ as follows and sends it to the CSP (lines 14–15).
$$E(\delta) \leftarrow E(q + \tau) \quad (\tau \in_R \mathbb{Z}_N)$$
The CSP decrypts the received data $\delta$ (line 16) and computes $y_i$ using $\alpha_i$ and $w_i$ computed in Step 1, as follows (line 17). The CSP encrypts all $y_i$ and sends them back to the DH (lines 18–19).
$$y_i \leftarrow \alpha_i \cdot \delta + w_i \quad (i \in [n])$$
The DH computes $z_i$ (line 20) and the auxiliary data $C_i$ using the data $x_i$ from Step 1 and the data $y_i$ received from the CSP as follows (line 21). The correctness of $C_i$ will be shown later.
$$E(z_i) \leftarrow E(-\tau C_i - r_i q - r_i \tau) \quad (i \in [n])$$
$$E(C_i) \leftarrow E(x_i + y_i + z_i) \quad (i \in [n])$$
  • Correctness: We now verify that, in Step 1, $\gamma = s + \rho$, where $s = \sum_i C_i \cdot d_{i,j}$ denotes the number of predicted maximum data. Indeed, $C_i \cdot d_{i,j} = 1$ when the corresponding data point is a predicted maximum data, and $C_i \cdot d_{i,j} = 0$ otherwise.
$$\begin{aligned}
\gamma &= \lambda + \sum_i w_i && (\text{line 10}) \\
&= \rho + \sum_i x_i + \sum_i \alpha_i \beta_i && (\text{lines 6, 9}) \\
&= \rho + \sum_i \{ (C_i + r_i)(d_{i,j} + t_i) - (t_i C_i + r_i d_{i,j} + r_i t_i) \} && (\text{lines 3–5}) \\
&= \rho + \sum_i \{ (C_i d_{i,j} + t_i C_i + r_i d_{i,j} + r_i t_i) - (t_i C_i + r_i d_{i,j} + r_i t_i) \} \\
&= \rho + \sum_i (C_i d_{i,j}) \\
&= s + \rho
\end{aligned}$$
We also show that the auxiliary data $C_i = x_i + y_i + z_i$ of line 21 corresponds to $C_i = C_i \cdot q + C_i \cdot d_{i,j}$ described in Step 3 of the intuitive idea.
$$\begin{aligned}
C_i &= x_i + y_i + z_i && (\text{line 21}) \\
&= (\alpha_i \delta + w_i) - (\tau C_i + r_i q + r_i \tau) - (t_i C_i + r_i d_{i,j} + r_i t_i) && (\text{lines 5, 17, 20}) \\
&= \{ (C_i + r_i)(q + \tau) + (C_i + r_i)(d_{i,j} + t_i) \} - (\tau C_i + r_i q + r_i \tau) - (t_i C_i + r_i d_{i,j} + r_i t_i) && (\text{lines 3, 4, 9, 14}) \\
&= \{ (C_i q + \tau C_i + r_i q + r_i \tau) - (\tau C_i + r_i q + r_i \tau) \} + \{ (C_i d_{i,j} + t_i C_i + r_i d_{i,j} + r_i t_i) - (t_i C_i + r_i d_{i,j} + r_i t_i) \} \\
&= C_i q + C_i d_{i,j}
\end{aligned}$$
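The two identities above can also be checked numerically. The sketch below replays one iteration of Algorithm 2 on plaintext values modulo a stand-in modulus, without any encryption. It assumes a prime modulus $2^{61} - 1$ in place of the Paillier modulus $N$ and replaces the bitwise SEQ test by a direct comparison of $\gamma$ and $\rho$; the function name `blinded_iteration` is hypothetical.

```python
import random

N = 2**61 - 1  # stand-in for the Paillier modulus (hypothetical choice)


def blinded_iteration(C, bits, N=N):
    """One iteration on plaintexts, mirroring lines 3-21 of Algorithm 2."""
    n = len(C)
    # DH, Step 1 (lines 3-6): blind C_i and d_{i,j}, prepare the correction terms x_i.
    r = [random.randrange(N) for _ in range(n)]
    t = [random.randrange(N) for _ in range(n)]
    rho = random.randrange(N)
    alpha = [(C[i] + r[i]) % N for i in range(n)]
    beta = [(bits[i] + t[i]) % N for i in range(n)]
    x = [(-t[i] * C[i] - r[i] * bits[i] - r[i] * t[i]) % N for i in range(n)]
    lam = (rho + sum(x)) % N
    # CSP, Step 1 (lines 9-10): operates only on blinded values.
    w = [(alpha[i] * beta[i]) % N for i in range(n)]
    gamma = (lam + sum(w)) % N
    # Step 2 (line 13): ideal equality test (the protocol compares theta low-order bits).
    q = 1 if gamma == rho else 0
    # DH (line 14) blinds q; CSP (line 17) computes y_i.
    tau = random.randrange(N)
    delta = (q + tau) % N
    y = [(alpha[i] * delta + w[i]) % N for i in range(n)]
    # DH (lines 20-21): z_i removes the remaining blinding terms.
    z = [(-tau * C[i] - r[i] * q - r[i] * tau) % N for i in range(n)]
    new_C = [(x[i] + y[i] + z[i]) % N for i in range(n)]
    return gamma, rho, new_C


gamma, rho, new_C = blinded_iteration([1, 1, 0, 1], [1, 0, 1, 1])
print((gamma - rho) % N, new_C)
```

In this run, $(\gamma - \rho) \bmod N$ equals $s$, and `new_C` equals $C_i \cdot q + C_i \cdot d_{i,j}$ for every $i$, independently of the random values drawn.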
Efficiency Evaluation: The ipMAX protocol requires $3l$ communication rounds, and its communication-volume complexity is $O(3nlC)$, where $n$ denotes the number of input data points, $l$ is the bit length of each data value, and $C$ represents the size of a ciphertext. Note that the round cost, which directly impacts the execution time, is independent of the dataset size $n$. Compared to the prior protocol [20], the proposed protocol reduces the round cost by 25% (from $4l$ to $3l$), and the complexity of communication volume by approximately 50% (from $O(6nlC)$ to $O(3nlC)$). In addition, the complexity of the encryption/decryption operations is reduced by approximately 42%. While the prior protocol requires $O(12nl)$ encryption/decryption operations, the proposed protocol reduces this to $O(7nl)$. Table 5 shows the communication cost (communication rounds and volume) and the computation cost (number of encryption/decryption and exponentiation operations) executed by DH and CSP at each step.
We analyze the communication and computation costs of the ipMAX protocol with respect to the number $n$ of input data points, which is typically the largest parameter in large-scale data applications. Each step of the protocol requires one communication round. In Step 1, the DH sends $2n$ ciphertexts, while in Step 3, the CSP returns $n$ ciphertexts. In terms of computation, the DH performs $3n$ encryption operations in Step 1 to encrypt random values, but these operations can be eliminated by precomputation. Similarly, the DH performs an additional $n$ encryption operations for random values in Step 3, which can also be removed by precomputation. Moreover, when the protocol is executed in parallel, the encryption/decryption and exponentiation operations can be performed concurrently. The SEQ protocol in Step 2 is executed only once per iteration, regardless of the number of input data points. These steps are repeated $l$ times, where $l$ is the bit length of the data. The detailed communication and computation costs are provided in Table 5.
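The stated reductions follow directly from the leading cost terms. The short sketch below evaluates them for given parameters; it assumes that the constants inside the $O(\cdot)$ expressions are the exact leading coefficients, and the function name `leading_costs` and the example ciphertext size of 2048 bits (a Paillier ciphertext under a 1024-bit modulus) are illustrative choices rather than values taken from Table 5.

```python
def leading_costs(n, l, c_bits):
    """Leading-term costs of the proposed and prior protocols, and the relative reduction."""
    proposed = {"rounds": 3 * l, "volume_bits": 3 * n * l * c_bits, "enc_dec_ops": 7 * n * l}
    prior = {"rounds": 4 * l, "volume_bits": 6 * n * l * c_bits, "enc_dec_ops": 12 * n * l}
    reduction = {k: 1 - proposed[k] / prior[k] for k in proposed}
    return proposed, prior, reduction


proposed, prior, reduction = leading_costs(n=7050, l=16, c_bits=2048)
for name, value in reduction.items():
    print(f"{name}: {value:.1%} lower than the prior protocol")
```

The reductions depend only on the ratios 3/4, 3/6, and 7/12, so they are the same for any $n$, $l$, and ciphertext size.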
Table 6 and Table 7, respectively, present the sizes of the input/output and the minimum memory requirements of DH and CSP for the number of input data points and the degree of parallelism. Since each input data point $d_i$ is encrypted as bit-decomposed data $E(d_i)_B$, a single input data point requires $l \cdot C$ bits. After the protocol terminates, the result for one input data point is contained in a single ciphertext $E(C_i)$. For an encrypted one-bit input ciphertext $E(d_{i,j})$, in Step 1 the DH generates $E(C_i)$, $E(\alpha_i)$, $E(\beta_i)$, $E(x_i)$, and $E(\lambda)$; hence DH requires $5C$ bits of memory. The CSP requires $3C$ bits because it receives $E(\alpha_i)$, $E(\beta_i)$, and $E(\lambda)$ from DH. Steps 2 and 3 do not require more memory than Step 1 (this statement assumes that the SEQ protocol in [30] is used in Step 2; in general, the memory required in Step 2 depends on the specific SEQ protocol used). For $n$ input data points, the overall data size scales approximately linearly with $n$. In particular, since $E(\lambda)$ in Step 1 is computed only once for all input data points, DH requires $(4n + 1)C$ bits, and CSP requires $(2n + 1)C$ bits because it receives a single $E(\lambda)$ from DH. Detailed data size and memory requirements are provided in Table 6 and Table 7. We now evaluate the per-server memory usage when the computation is parallelized across $m$ parallel servers, assuming that $m$ divides $n$. In this setting, the $n$ input data points are partitioned across the $m$ servers, so that each server processes $n/m$ data points. In addition, extra communication is required to compute the global aggregated $\gamma$ across all $m$ servers (line 10) and to distribute the resulting $\delta$ back to each server (line 16), which incurs an additional transmission overhead of approximately $\frac{1}{2} m C$ for each operation.
  • Security: We now formally prove the security of the secure version of the ipMAX protocol. Let $\pi_{ipMAX}^{SEC}$ denote the secure version of the ipMAX protocol described in Algorithm 2. We then prove the following theorem.
Theorem 1.
$\pi_{ipMAX}^{SEC}$ privately computes $F_{ipMAX}$ in the presence of a semi-honest adversary.
We first show that the CSP learns no information during the protocol execution because it observes only ciphertexts and plaintexts blinded by random values. We then show that the DH learns no information because it observes only ciphertexts.
Proof of Theorem 1.
According to Algorithm 2, the execution image of the CSP is as follows:
$$\Pi_{CSP}(\pi_{ipMAX}^{SEC}) = \left\{ \{E(\alpha_i), C_i + r_i\}_{i \in [n]}, \{E(\beta_i), d_{i,j} + t_i\}_{i \in [n]}, \{E(\lambda), \rho + \textstyle\sum_i x_i\}, \{E(\delta), q + \tau\} \right\}$$
Here, $E(\alpha_i)$, $E(\beta_i)$, $E(\lambda)$, and $E(\delta)$ are ciphertexts received from the DH for $i \in [n]$, and $C_i + r_i$, $d_{i,j} + t_i$, $\rho + \sum_i x_i$, and $q + \tau$ are the corresponding decrypted data, respectively. Since $r_i$ is a random value chosen from $\mathbb{Z}_N$, the value $C_i + r_i$ is randomized. Similarly, since $t_i$, $\rho$, and $\tau$ are random values in $\mathbb{Z}_N$, the values $d_{i,j} + t_i$, $\rho + \sum_i x_i$, and $q + \tau$ are also randomized. The simulated image of the CSP can be constructed as follows:
$$\Pi_{CSP}^{S}(\pi_{ipMAX}^{SEC}) = \left\{ \{\alpha_i, r_i\}_{i \in [n]}, \{\beta_i, t_i\}_{i \in [n]}, \{\lambda, \rho\}, \{\delta, \tau\} \right\}$$
In the simulated image, $\alpha_i$, $\beta_i$, $\lambda$, and $\delta$ for $i \in [n]$ are values chosen uniformly at random from $\mathbb{Z}_{N^2}$. Since $E(\cdot)$ is a semantically secure encryption scheme and the encrypted value is in $\mathbb{Z}_{N^2}$, the ciphertexts $E(\alpha_i)$, $E(\beta_i)$, $E(\lambda)$, and $E(\delta)$ are computationally indistinguishable from the simulated random values $\alpha_i$, $\beta_i$, $\lambda$, and $\delta$, respectively. Likewise, $r_i$, $t_i$, $\rho$, and $\tau$ are chosen uniformly at random from $\mathbb{Z}_N$. Hence, these simulated random values are computationally indistinguishable from $C_i + r_i$, $d_{i,j} + t_i$, $\rho + \sum_i x_i$, and $q + \tau$. Combining these analyses, we can conclude that the CSP obtains no information about $C_i$, $d_{i,j}$, $\sum_i x_i$, and $q$ in the secure version of ipMAX. On the other hand, the execution image of the DH is as follows:
$$\Pi_{DH}(\pi_{ipMAX}^{SEC}) = \left\{ \{E(\gamma_k)\}_{k \le \theta - 1}, E(q), \{E(y_i)\}_{i \in [n]} \right\}$$
Here, $E(\gamma_k)$ and $E(y_i)$ are ciphertexts received from the CSP for $k \le \theta - 1$ and $i \in [n]$, and $E(q)$ is a ciphertext received from the third party that executes $F_{SEQ}$. A simulated image of the DH can be constructed as follows:
$$\Pi_{DH}^{S}(\pi_{ipMAX}^{SEC}) = \left\{ \{\gamma_k\}_{k \le \theta - 1}, q, \{y_i\}_{i \in [n]} \right\}$$
For the simulated image, $\gamma_k$, $q$, and $y_i$ are independently sampled uniformly from $\mathbb{Z}_{N^2}$ for $k \le \theta - 1$ and $i \in [n]$. By the semantic security of $E(\cdot)$, and since the ciphertext is in $\mathbb{Z}_{N^2}$, the ciphertexts $E(\gamma_k)$, $E(q)$, and $E(y_i)$ cannot be distinguished computationally from the random samples $\gamma_k$, $q$, and $y_i$, respectively. It follows that, under the secure version of ipMAX, DH gains no information regarding $\gamma_k$, $q$, and $y_i$. Accordingly, the above simulation-based argument shows that the proposed ipMAX protocol is secure in the semi-honest model. From an intuitive perspective, under the dual non-colluding cloud server setting, DH sees only encrypted values, whereas CSP observes ciphertexts, together with plaintext values masked by randomness. Accordingly, as long as DH and CSP remain non-colluding, the secure version of ipMAX reveals no information about either the input data or the computation results in the semi-honest setting.    □
The ipMAX protocol also provides protection against attacks based on data access patterns. Informally, it either applies the same computations independently to each data point, regardless of the intermediate outcomes, or operates on a single aggregated value obtained from all data points. Concretely, the values $x_i$, $y_i$, and $z_i$ are computed by applying the same equations in every case, regardless of the results of the intermediate computations (lines 3–5 and 17–21). The remaining operations (lines 6–16) are applied only to a single value obtained by aggregating all data points. Accordingly, because neither DH nor CSP can infer any information about the input data or the computation outcomes, the protocol is resilient to attacks that exploit data access patterns.

3.1.2. Secure Version of the ipMIN Protocol

The secure variant of the ipMIN protocol determines the minimum value in the input dataset while preserving the privacy of both the input data and the computation results. Following the same approach as in the prior work [19], the ipMIN algorithm is derived by running the ipMAX algorithm on the 1’s complement of the input data. That is, the ipMAX protocol can be used to compute the minimum by replacing each data value $d_i$ with its 1’s complement $\bar{d}_i$. Accordingly, the ipMIN protocol can be formally expressed as follows:
$$F_{ipMIN}(\{E(d_i)_B\}_{i \in [n]}, SK) = F_{ipMAX}(\{E(\bar{d}_i)_B\}_{i \in [n]}, SK)$$
In summary, the ipMIN protocol is realized by executing the ipMAX protocol on the 1’s complement of the input dataset, whereas the original ipMAX protocol is executed on the unmodified input data. As a result, the efficiency and security properties of ipMIN are identical to those of ipMAX, since the two protocols share the same structure and sequence of operations. For clarity, an illustrative example of the ipMIN protocol is provided in Table 8.
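The reduction from ipMIN to ipMAX can be sketched on plaintext values as follows. This is an illustration only, assuming non-negative $l$-bit integers and no encryption; the function names are hypothetical.

```python
def ip_max_plain(data, l):
    """Plaintext candidate elimination: flag 1 marks the maximum value(s)."""
    C = [1] * len(data)
    for j in range(l - 1, -1, -1):
        bits = [(d >> j) & 1 for d in data]
        s = sum(c * b for c, b in zip(C, bits))
        q = 1 if s == 0 else 0
        C = [c * q + c * b for c, b in zip(C, bits)]
    return C


def ip_min_plain(data, l):
    """Minimum via maximum of the l-bit 1's complement of each value."""
    mask = (1 << l) - 1                       # l ones, e.g. 0b1111 for l = 4
    return ip_max_plain([d ^ mask for d in data], l)


print(ip_min_plain([5, 3, 9, 3], 4))
```

Complementing every bit reverses the ordering of $l$-bit values, so the largest complemented value corresponds to the smallest original value, and duplicated minima are all flagged.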

3.2. Efficient Version of the ipMAX/ipMIN Protocols

The efficient version of the ipMAX/ipMIN protocols improves upon the efficiency of the secure version. Consistent with [19], the proposed protocols provide an efficient version by adding a checking step. As described above, the proposed protocols find the maximum/minimum value through iterative updates of the auxiliary data $C_i$, processing one bit per iteration from the most significant bit to the least significant bit. The secure version always performs all $l$ iterations, regardless of the input dataset, to ensure that no information about the input or output is leaked. The efficient version, in contrast, checks at each iteration whether the maximum/minimum value has been determined and, if so, terminates immediately, thereby reducing the number of iterations and improving efficiency.
The efficient version is designed to terminate as soon as only one data point remains in the candidate set. As described in the previous section, the protocols initially include all input data points in the candidate set and, at each iteration, eliminate the unpredicted maximum/minimum candidates. Consequently, once the candidate set is reduced to a single remaining data point, the final maximum/minimum value has been determined, and the protocol can stop early. Because the auxiliary data $C_i$ specifies whether the corresponding data point remains in the candidate set ($C_i = 1$) or has been removed ($C_i = 0$), the number of remaining candidates can be obtained by summing all $C_i$ values.
It should be noted, however, that the efficient version does not always provide a substantial performance advantage. In particular, when multiple identical maximum/minimum values exist in the input dataset, or when the distinguishing bit appears only at the least significant position, the runtime of the efficient version converges to that of the secure version. Thus, the efficient version yields a benefit only when the maximum/minimum value is determined before the l-th round, allowing the protocol to terminate early; if the value is determined only in the final round, its efficiency is similar to that of the secure version.
The efficient version also introduces limited information leakage through its termination bit, which reveals the bit position at which the maximum/minimum value is determined. This leakage amounts to an upper bound on the difference between the maximum/minimum value and the second-largest/-smallest value, rather than directly revealing the magnitude of the maximum/minimum itself (although, if ipMAX terminates at the $(l-1)$-th bit, an adversary can infer that the maximum value is at least $2^{l-1}$). Importantly, this does not reveal which data point is the maximum/minimum, nor the second-largest/-smallest value. More specifically, if ipMAX/ipMIN performs the checking step at every bit position and terminates at the $t$-th bit ($0 \le t < l$), then an adversary can infer that the gap between the maximum/minimum value and the second-largest/-smallest value is at most $2^{t+1} - 1$.
This leakage can be reduced by enlarging the bound on that difference. To improve privacy, the checking step need not be performed at every bit position; instead, it may be restricted to a selected subset of bits. More frequent checking improves efficiency but increases leakage, whereas less frequent checking offers stronger privacy at the cost of reduced efficiency. Hence, the proposed protocols provide a multi-level trade-off between security and efficiency, and the check bit set should be chosen according to the privacy and performance requirements of the target application. In addition, to further reduce result leakage, the check bit set should avoid including the first few most significant bits and the last few least significant bits. Further discussion can be found in [19].
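The bound on the leaked gap can be illustrated on plaintext values. The sketch below assumes that the checking step is performed at every bit position and that the maximum is unique (otherwise the protocol does not terminate early); under these assumptions the termination bit is the highest bit in which the maximum and the second-largest value differ. The function names are hypothetical.

```python
import random


def termination_bit(data):
    """Bit position at which a unique maximum is isolated when every bit is checked."""
    top, second = sorted(set(data), reverse=True)[:2]
    return (top ^ second).bit_length() - 1     # highest bit where the two values differ


def gap_upper_bound(t):
    """Largest gap consistent with termination at bit t."""
    return 2 ** (t + 1) - 1


data = [200, 150, 90]
t = termination_bit(data)
print(t, gap_upper_bound(t), max(data) - sorted(data)[-2])

# Randomized check of the bound on 16-bit values with distinct entries.
for _ in range(1000):
    sample = random.sample(range(2 ** 16), 20)
    top, second = sorted(sample, reverse=True)[:2]
    assert top - second <= gap_upper_bound(termination_bit(sample))
```

Restricting the checking step to fewer bit positions makes the observed termination bit coarser, which corresponds to a looser bound on the gap.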

3.2.1. Efficient Version of the ipMAX Protocol

Algorithm 3 describes the efficient variant of the ipMAX protocol, and Table 9 provides an illustrative example to facilitate understanding. In addition to the encrypted input dataset $\{E(d_i)_B\}_{i \in [n]}$, this variant takes a public parameter called the checking bit set $V$. For each $v \in V$, the protocol performs a check step at bit position $v$ (at the $(l-v)$-th iteration) to determine whether exactly one candidate data point remains. The output is identical to that of the secure version: an auxiliary dataset $\{E(C_i)\}_{i \in [n]}$, where the data point $d_i$ associated with $C_i = 1$ is identified as the final maximum value. Recall that $n$ is the number of input data points, $l$ is the bit length of the data, and $\theta$ is the bit length of the number $n$.
We only explain the checking step (lines 22–36) that is different from that of the secure version (Algorithm 2).
  • (Checking step: lines 22–36) privately comparing the number of candidate data points to one: First, the DH checks whether the current bit (iteration) corresponds to a checking bit in the set $V$ (line 22). If it is in the checking bit set, the DH proceeds with the following computations. The DH computes the value $\epsilon$ and sends it to the CSP (lines 23–25).
Algorithm 3 Efficient version of the ipMAX Protocol.
Input:
  • $\{E(d_i)_B\}_{i \in [n]}$, where $E(d_i)_B = \{E(d_{i,l-1}), \ldots, E(d_{i,1}), E(d_{i,0})\}$, $d_i = \sum_{j=0}^{l-1} d_{i,j} \cdot 2^j$ and $d_{i,j} \in \{0, 1\}$
  • Checking bit set $V = \{v \mid$ DH and CSP check whether the number of candidate data points is one in the $v$-th bit (i.e., $(l-v)$-th iteration), $0 \le v < l\}$
Output:
$\{E(C_i)\}_{i \in [n]}$, where $C_i = 1$ if the corresponding data $d_i$ is the maximum, and otherwise, $C_i = 0$.
DH:
1: $E(C_i) \leftarrow E(1)$ $(i \in [n])$
2: for $j = l-1$ to $0$ do
3:     $E(\alpha_i) \leftarrow E(C_i) \cdot E(r_i)$ $(r_i \in_R \mathbb{Z}_N,\ i \in [n])$
4:     $E(\beta_i) \leftarrow E(d_{i,j}) \cdot E(t_i)$ $(t_i \in_R \mathbb{Z}_N,\ i \in [n])$
5:     $E(x_i) \leftarrow E(C_i)^{N - t_i} \cdot E(d_{i,j})^{N - r_i} \cdot E(N - r_i t_i)$ $(i \in [n])$
6:     $E(\lambda) \leftarrow E(\rho) \cdot \prod_{i=1}^{n} E(x_i)$ $(\rho \in_R \mathbb{Z}_N)$
7:     DH $\to$ CSP: $\{E(\alpha_i)\}_{i \in [n]}$, $\{E(\beta_i)\}_{i \in [n]}$, $E(\lambda)$
CSP:
8:     Decrypt $E(\alpha_i)$, $E(\beta_i)$, $E(\lambda)$ $(i \in [n])$
9:     $w_i \leftarrow \alpha_i \cdot \beta_i$ $(i \in [n])$
10:     $\gamma \leftarrow \lambda + \sum_{i=1}^{n} w_i$
11:     Encrypt $\gamma_k$ $(k \le \theta - 1)$
12:     CSP $\to$ DH: $\{E(\gamma_k)\}_{k \le \theta - 1}$
DH & CSP:
13:     $(E(q), \perp) \leftarrow F_{SEQ}(\{E(\gamma_k), E(\rho_k)\}_{k \le \theta - 1}, SK)$
DH:
14:     $E(\delta) \leftarrow E(q) \cdot E(\tau)$ $(\tau \in_R \mathbb{Z}_N)$
15:     DH $\to$ CSP: $E(\delta)$
CSP:
16:     Decrypt $E(\delta)$
17:     $y_i \leftarrow \alpha_i \cdot \delta + w_i$ $(i \in [n])$
18:     Encrypt $y_i$ $(i \in [n])$
19:     CSP $\to$ DH: $\{E(y_i)\}_{i \in [n]}$
DH:
20:     $E(z_i) \leftarrow E(C_i)^{N - \tau} \cdot E(q)^{N - r_i} \cdot E(N - r_i \tau)$ $(i \in [n])$
21:     $E(C_i) \leftarrow E(x_i) \cdot E(y_i) \cdot E(z_i)$ $(i \in [n])$
22:     if $j \in V$ then
23:         $E(p) \leftarrow E(N - 1) \cdot \prod_{i=1}^{n} E(C_i)$
24:         $E(\epsilon) \leftarrow E(p)^{\eta}$ $(\eta \neq 0,\ \eta \in_R \mathbb{Z}_N)$
25:         DH $\to$ CSP: $E(\epsilon)$
CSP:
26:         Decrypt $E(\epsilon)$
27:         if $\epsilon == 0$ then
28:             $\phi = 1$
29:         else
30:             $\phi = 0$
31:         end if
32:         CSP $\to$ DH: $\phi$
DH:
33:         if $\phi == 1$ then
34:             return $\{E(C_i)\}_{i \in [n]}$
35:         end if
36:     end if
37: end for
38: return $\{E(C_i)\}_{i \in [n]}$
$$E(p) \leftarrow E\left(-1 + \sum_{i=1}^{n} C_i\right)$$
$$E(\epsilon) \leftarrow E(\eta \cdot p) \quad (\eta \neq 0,\ \eta \in_R \mathbb{Z}_N)$$
Once the CSP receives $\epsilon$, it decrypts the ciphertext (line 26) and checks whether the resulting value is zero. If the decrypted value $\epsilon = 0$, the CSP sets $\phi = 1$; otherwise, if $\epsilon \neq 0$ (i.e., a random value), the CSP sets $\phi = 0$. It then returns this bit to DH in plaintext (i.e., without encryption) (lines 27–32). When only one candidate data point remains, the maximum value has already been determined, which implies $p = 0$ and, therefore, yields $\phi = 1$. In this case, DH returns the auxiliary dataset $\{E(C_i)\}_{i \in [n]}$ as the output and halts the protocol (lines 33–35). By contrast, if $\phi = 0$, then $p \neq 0$, meaning that the maximum value has not yet been determined, and the protocol proceeds to the next iteration. We next analyze the efficiency of the efficient version. The security proof is omitted here, since it follows by essentially the same argument as that used for the secure version.
  • Efficiency Evaluation: Because the efficient version can stop before reaching the $l$-th iteration when the maximum value is found early, it executes all $l$ iterations only in the worst case. By contrast, the secure version always runs for exactly $l$ iterations, regardless of the intermediate outcomes, where $l$ denotes the bit length of the data. Consequently, the efficient version requires at most $4l$ communication rounds, and its communication-volume complexity is bounded by $O(3nlC)$, including the communication cost of the SEQ protocol. Note that the round cost, which directly affects the execution time, is independent of the number $n$ of input data points, as in the secure version. Table 10 shows the communication costs (communication rounds and volume) and the computation costs (number of encryption/decryption and exponentiation operations) performed by the DH and CSP at each step of the protocol.
The checking step requires one round of communication: the DH sends the value $\epsilon$ to the CSP, which responds with the 1-bit value $\phi$. From a computational point of view, the DH performs one exponentiation operation and the CSP performs one decryption operation. Since the multiplication operations in line 23 are performed locally and have a negligible impact on performance, as mentioned in Section 2.4, they are excluded from the efficiency evaluation.
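The checking step of Algorithm 3 can be sketched on plaintext values as follows. This illustration assumes a prime stand-in modulus in place of the Paillier modulus $N$, performs no encryption, and uses the hypothetical function name `ip_max_efficient_plain`; it returns the candidate flags together with the bit position at which the protocol stopped early (or `None` if all $l$ iterations ran).

```python
import random

N = 2**61 - 1  # prime stand-in for the Paillier modulus (hypothetical choice)


def ip_max_efficient_plain(data, l, V):
    """Plaintext sketch of Algorithm 3 with checking bit set V."""
    C = [1] * len(data)
    for j in range(l - 1, -1, -1):
        bits = [(d >> j) & 1 for d in data]
        s = sum(c * b for c, b in zip(C, bits))
        q = 1 if s == 0 else 0
        C = [c * q + c * b for c, b in zip(C, bits)]     # lines 3-21 on plaintexts
        if j in V:                                       # line 22
            p = (sum(C) - 1) % N                         # line 23: p = -1 + sum of C_i
            eta = random.randrange(1, N)                 # nonzero blinding factor
            eps = (eta * p) % N                          # line 24: eps = eta * p
            phi = 1 if eps == 0 else 0                   # lines 27-31, decided by the CSP
            if phi == 1:                                 # lines 33-35
                return C, j
    return C, None


print(ip_max_efficient_plain([8, 7, 3], 4, {0, 1, 2, 3}))
print(ip_max_efficient_plain([5, 5, 1], 3, {0, 1, 2}))
```

In the first call the maximum is isolated at the most significant bit, so the loop stops after one iteration. In the second call the maximum is duplicated, so $p$ never reaches zero and all iterations are executed, matching the behavior described for the efficient version.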

3.2.2. Efficient Version of the ipMIN Protocol

The efficient version of the ipMIN protocol is derived from the same underlying idea as the efficient version of ipMAX. In particular, it terminates once the minimum-candidate set has been reduced to a single data point, and its checking step is identical to that of the efficient variant of the ipMAX protocol described above. As in the secure version, the efficient version of ipMIN is obtained by running the efficient version of ipMAX on the 1’s complement of the input data, whereas running it on the original dataset yields the maximum value. Because the efficient version of ipMIN shares the same efficiency and security characteristics as the efficient version of ipMAX, a separate detailed discussion is unnecessary.

4. Experimental Results

This section presents the experimental environment and dataset, followed by the experimental results for various parameters. To demonstrate the efficiency of the proposed protocols, we implemented the ipMAX protocol in C++, using the Paillier cryptosystem as the underlying additively homomorphic encryption scheme, together with its library [32]. The machine used for the DH was equipped with an Intel Core i9-14900K CPU @ 1.4 GHz (24 cores) and 16 GB of RAM, while that for the CSP was equipped with an Intel Core i5-14400 @ 2.50 GHz (10 cores) and 8 GB of RAM. Both the DH and CSP systems ran Ubuntu 18.04 LTS. The two machines were physically separated and communicated over a 1 Gbps network connection.
The experimental dataset was “Facebook Live Sellers in Thailand” from the UCI machine learning repository, which contains 7050 data points (n = 7050) with 16 bits (l = 16) [33]. We measured the communication volume and the execution time for the secure and efficient versions. We then compared these results to those of the prior protocols [20]. We first ran the protocols using a 1024-bit key and 16 threads for parallel execution. Table 11 shows the experimental results along with a comparative analysis.
As shown in Table 11, the secure version reduces communication volume and execution time by 50% and 22%, respectively, compared to the prior protocol. In the experimental dataset, the maximum value is determined at the 7th iteration (i.e., the 9th bit of 16 bits), whereas the secure version consistently performs all 16 iterations. Due to this early termination, the efficient version reduces communication volume and execution time by 78% and 64%, respectively. Note that the efficiency gain results from revealing the bit position at which the maximum/minimum value is determined. To carry out a comprehensive experimental evaluation and performance analysis, we varied several key parameters, including the number of threads, the input size, the key size, and the bit length of the data. A separate evaluation of ipMIN is omitted because it has identical runtime and communication characteristics to ipMAX.
Figure 2a illustrates the reduction in execution time as the number of parallel threads increases. As mentioned in the previous section, the proposed protocols support parallel computation over individual data points; therefore, at low thread counts, doubling the number of threads roughly halves the execution time. For example, in the secure version (solid red line), the execution times are 1026, 550, and 304 s for two, four, and eight threads, respectively. A similar pattern is observed in the efficient version (dashed green line). The prior protocol [20] (dotted blue line) takes an average of 30% longer than the secure version. As the number of threads increases further, the improvement diminishes, and the performance eventually saturates. This pattern is commonly observed in practical deployments. The following model captures the practical limitation of execution-time scaling with increased parallelism:
$$T = N_R \cdot T_{RT} + \frac{V_C}{B_W} + \frac{T_C}{N_P} + T_O$$
In the above equation, $T$ is the total execution time, $N_R$ is the number of communication/synchronization rounds, $T_{RT}$ is the round-trip time between DH and CSP, $V_C$ is the communication volume (total amount of data transmitted during execution), $B_W$ is the effective bandwidth, $T_C$ is the total computation time without parallelization, $N_P$ is the number of parallel threads, and $T_O$ is the time spent on parallel/coordination overhead.
When the number of parallel threads, $N_P$, is small, the overall execution time $T$ is dominated by the term $T_C / N_P$. In this setting, the round-trip time remains small, the reduction in the effective bandwidth is limited, and the additional overhead is relatively small; thus, these factors have little impact on the total runtime. Consequently, increasing $N_P$ reduces the execution time rapidly. However, beyond a certain level of parallelism, the decrease in $T_C / N_P$ becomes small, while the other terms become more significant: the effective round-trip time can increase, the effective bandwidth can decrease, and additional system overheads arise, increasing $T_O$. As a result, once $N_P$ exceeds a threshold, the runtime benefit from reducing $T_C / N_P$ is outweighed by the runtime penalties from $T_{RT}$, $B_W$, and $T_O$, leading to saturation, in which increasing parallelism no longer reduces the overall execution time.
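The shape of the execution-time model above can be explored numerically. The following Python sketch evaluates the four terms of the model for increasing thread counts. All parameter values are hypothetical and chosen only for illustration (the round count of 48 assumes three rounds per iteration, as in Table 5, over 16 iterations); they are not measurements from this study. For simplicity, the sketch holds the round-trip time, bandwidth, and overhead constant, so it shows only the diminishing return of the computation term, not the growth of the other terms.

```python
def execution_time(n_threads, n_rounds, rtt_s, volume_bytes,
                   bandwidth_bps, t_comp_s, t_overhead_s):
    """Evaluate T = N_R * T_RT + V_C / B_W + T_C / N_P + T_O (seconds)."""
    if n_threads < 1:
        raise ValueError("n_threads must be at least 1")
    # Communication terms: latency per round plus transfer time.
    communication = n_rounds * rtt_s + (8 * volume_bytes) / bandwidth_bps
    # Computation term: the only part that shrinks with more threads.
    computation = t_comp_s / n_threads
    return communication + computation + t_overhead_s


# Hypothetical parameters, for illustration only.
PARAMS = dict(
    n_rounds=48,               # assumed: 3 rounds per iteration x 16 iterations
    rtt_s=0.001,               # assumed round-trip time
    volume_bytes=83 * 2**20,   # assumed total communication volume
    bandwidth_bps=1e9,         # 1 Gbps link, as in the experimental setup
    t_comp_s=2000.0,           # assumed single-thread computation time
    t_overhead_s=5.0,          # assumed constant coordination overhead
)

if __name__ == "__main__":
    for n_p in (1, 2, 4, 8, 16, 32, 64, 128):
        print(f"N_P={n_p:4d}  T={execution_time(n_p, **PARAMS):9.2f} s")
```

With these assumed values, each doubling of the thread count saves half as much time as the previous doubling, and the total time approaches the floor set by the communication and overhead terms.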
Figure 2b shows that the communication volume remains constant, regardless of the number of threads. In the figure, the solid color bars indicate the communication volume of the DH, and the hatched bars represent that of the CSP. Even if the number of threads increases, the protocol allocates the same computational tasks to the threads, and therefore, the overall amount of communication remains unchanged. Compared to the prior protocol, the communication volumes are reduced by 50% and 78% for the secure and efficient versions, respectively. For each protocol/version, the communication volume of the CSP accounts for approximately 50% of that of the DH.
Figure 2c,d show that both the execution time and communication volume increase linearly and consistently as the amount of input data grows. This is because an increase in input data leads to a proportional increase in the amount of data to be processed. For the secure version, an additional 1000 data points results in an increase of approximately 23 s in execution time and an additional 7.81 MB and 3.91 MB of data transmission by the DH and CSP, respectively. For the efficient version, an additional 1000 data points results in an increase of around 11 s in execution time and an additional 3.42 MB and 1.71 MB of data transmission by the DH and CSP, respectively.
Figure 2e,f show that both the execution time and communication volume increase as the key size increases. This is because the ciphertext size of the Paillier cryptosystem is twice the modulus size (i.e., the key size). For the proposed protocol, the execution time approximately doubles when the key size increases from 512 to 1024 bits and more than triples when it increases from 1024 to 2048 bits. Compared to the prior protocol, the secure and efficient versions achieve reductions of around 50% and 77%, respectively, when using a 2048-bit key. In terms of communication volume, both the secure and efficient versions scale linearly with the key size: the volume doubles each time the key size doubles, because the ciphertext size, which is twice the key size, determines the communication volume.
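To make the relationship between key size and ciphertext size concrete, the following Python sketch implements a toy textbook Paillier scheme with the common choice g = n + 1. It is not the library used in the experiments [32], and the tiny primes in the demo are for illustration only and offer no security. It shows that ciphertexts live modulo n², so they are about twice as long as the modulus, and that multiplying two ciphertexts adds the underlying plaintexts.

```python
import math
import secrets


def keygen(p, q):
    """Toy key generation from two distinct primes, using g = n + 1."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p-1, q-1)
    mu = pow(lam, -1, n)  # modular inverse (Python 3.8+); valid for g = n + 1
    return n, lam, mu


def encrypt(n, m):
    """Encrypt m in [0, n) as (1 + m*n) * r^n mod n^2."""
    n2 = n * n
    while True:
        r = secrets.randbelow(n - 1) + 1  # random r in [1, n-1]
        if math.gcd(r, n) == 1:
            break
    return ((1 + m * n) * pow(r, n, n2)) % n2


def decrypt(n, lam, mu, c):
    """Recover m = L(c^lam mod n^2) * mu mod n, where L(u) = (u - 1) / n."""
    u = pow(c, lam, n * n)
    return ((u - 1) // n * mu) % n


def add(n, c1, c2):
    """Homomorphic addition: multiply ciphertexts modulo n^2."""
    return (c1 * c2) % (n * n)


if __name__ == "__main__":
    n, lam, mu = keygen(1009, 1013)  # toy primes, insecure
    c = add(n, encrypt(n, 20), encrypt(n, 22))
    print("modulus bits:", n.bit_length())
    print("ciphertext modulus bits:", (n * n).bit_length())
    print("decrypted sum:", decrypt(n, lam, mu, c))
```

Because every transmitted value in the protocols is a ciphertext, this factor of two carries directly into the communication volume.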
Figure 2g,h show the execution time and communication volume as a function of the bit length of the input data. Datasets with different bit lengths were generated by truncating or extending the least significant bits of the original data using bit-shift operations. As shown in the figures, the execution time and communication volume of the secure version increase linearly with the data length since the secure version performs one iteration per bit. For every 8-bit increase, an additional 80 s is required, and the DH and CSP transmit an additional 27.6 MB and 13.8 MB, respectively. However, the efficient version maintains a constant execution time and communication volume, regardless of the data length. This is because, in our experimental dataset, the maximum value is determined in the seventh iteration, causing the protocol to terminate early. In other words, even when the data length increases from 8 to 32 bits, the efficient version always terminates at the seventh iteration, resulting in no additional cost.
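The reported communication increments can be cross-checked against the per-iteration totals in Table 5(a). The Python sketch below is a back-of-the-envelope calculation under three assumptions that are not stated explicitly in the text: the ciphertext size C is twice the key size, θ equals the bit length of n, and 1 MB means 2^20 bytes. The function names are illustrative only.

```python
def per_iteration_volume_bits(n, key_bits):
    """Per-iteration volume (DH, CSP) in bits, using the totals of Table 5(a)."""
    c = 2 * key_bits          # assumed Paillier ciphertext size in bits
    theta = n.bit_length()    # assumed: theta is the bit length of n
    dh = (2 * n + theta + 2) * c
    csp = (n + theta + 1) * c
    return dh, csp


def total_volume_mib(n, key_bits, iterations):
    """Total volume (DH, CSP) in MiB over the given number of iterations."""
    dh, csp = per_iteration_volume_bits(n, key_bits)
    bits_per_mib = 8 * 2**20
    return dh * iterations / bits_per_mib, csp * iterations / bits_per_mib


if __name__ == "__main__":
    # Extra volume for 1000 more data points, 1024-bit key.
    for label, iters in (("secure (16 iterations)", 16),
                         ("efficient (7 iterations)", 7)):
        hi = total_volume_mib(8050, 1024, iters)
        lo = total_volume_mib(7050, 1024, iters)
        print(label, round(hi[0] - lo[0], 2), round(hi[1] - lo[1], 2))
    # Extra volume for 8 more bits (8 more iterations) at n = 7050.
    dh, csp = total_volume_mib(7050, 1024, 8)
    print("per 8 bits", round(dh, 1), round(csp, 1))
```

Under these assumptions, the formulas give about 7.81 MB and 3.91 MB per additional 1000 data points for the secure version, 3.42 MB and 1.71 MB for the efficient version with seven iterations, and 27.6 MB and 13.8 MB per additional 8 bits, in line with the measured values.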
As demonstrated in Figure 2a, the overall performance of the proposed protocols is proportional to the number of parallel threads since the computations on each data point are independent and can be executed in parallel. Therefore, it is expected that the protocol will significantly improve runtime performance when deployed in a cloud computing environment that supports a large number of parallel executions.

5. Related Works

Most existing ppMAX/ppMIN protocols rely on costly comparison operations, which cause the execution time to increase rapidly as the amount of input data grows. Therefore, these protocols are unsuitable for large-scale data analytics. To overcome this inefficiency, some protocols provide approximate results or allow partial data leakage. In addition, other approaches, such as those based on quantum cryptography, are difficult to implement in practice and provide only simulation-based results without concrete experimental results.
The authors of [18] focused on privacy-preserving reinforcement learning for patient treatment. They introduced cryptographic protocols that enable secure computations, such as comparison, maximum selection, exponentiation, and division, on encrypted data. Their protocol computes on encrypted health data without revealing sensitive information and is based on the CKKS approximate homomorphic encryption scheme [22] by Cheon et al. However, the results may contain small numerical errors since the protocol is based on an approximate scheme. Moreover, their maximum protocol performs costly comparison operations for each data point, making it inefficient and unsuitable for large-scale data applications. The authors of [21] tackled the problem of finding the maximum value and its index in data encrypted under the CKKS scheme [22]. Due to the difficulty of emulating comparison operations with polynomials using fully homomorphic encryption, the authors introduced a protocol that finds all data whose top k most significant bits match those of the true maximum. However, the protocol is limited in that it outputs only an approximate maximum value, and the accuracy of the result depends on the parameter k. Furthermore, it performs expensive comparison operations for each data point, making it inefficient and unsuitable for large-scale data applications.
The authors of [8] proposed a lightweight protocol for the Internet of Things (IoT) that enables maximum/minimum range queries (i.e., finding the maximum or minimum value within a dataset) without revealing sensitive information. Rather than using heavy cryptography, the protocol uses a probabilistic approach based on simple hashing and bitwise XOR operations. However, the protocol reveals the maximum value of the transmitted data to intermediate nodes (aggregators). Due to the probabilistic nature of the method, the results may contain errors. The authors of [9] proposed a sealed-bid auction that finds the maximum bid without exposing the bids of each bidder in an IoT environment. The protocol supports multiple authorities to overcome the limitations of a single authority (e.g., an excessive operational burden). It runs on simple operations since it is based on the protocol proposed in [8]. However, because it builds on the protocol of [8], it inherits the same limitations. In [34], the authors revisited the existing protocols [35,36] for securely computing the minimum and k-th minimum in privacy-sensitive mobile sensing applications and proposed improvements to enhance efficiency and accuracy. Their compressing min computation protocol (CMCP) reduces the total runtime by eliminating or parallelizing certain steps. However, since the protocol is probabilistic, it may produce approximate results with potential errors, similar to prior works [8,9,18,21]. In addition, the evaluation is limited to simulations, and the authors did not provide concrete experimental results to demonstrate real-world performance.
The authors of [37] explored a quantum cryptographic solution to the multi-party maximum problem. First, they defined a basic secure multi-party computation of the logical OR operation. Then, they implemented the computation using single photons (quantum states) to achieve information-theoretic security. However, the approach is theoretical and requires a quantum communication infrastructure, which is impractical for large networks. A simulation was conducted on a small scale, and scaling up to include more parties or larger bit lengths may present challenges related to quantum noise and error correction. The authors of [38] proposed a quantum cryptography-based approach to address privacy-preserving range maximum/minimum queries in edge-based Internet of Things (IoT) environments. They proposed a novel quantum protocol for range queries using an orthogonal social interaction distinguishability (OSID) quantum method. This scheme allows an edge IoT server to process encrypted quantum states from devices and determine the maximum sensor within a certain range without learning any individual sensor data. However, as with the method in [37], this protocol is highly dependent on quantum hardware, which poses practical limitations to real-world deployment. Thus, further advancements in quantum technologies are required before this approach is considered feasible.
The work in [19] introduced highly efficient privacy-preserving protocols, based on multiparty computation, for identifying the maximum/minimum value in large-scale datasets. The authors formally demonstrated that those protocols preserve the privacy of the input data, the computation outcomes, and the associated data-access patterns. Their protocols also include secure and efficient variants, as do the protocols proposed here. Nevertheless, compared with our protocols, their construction incurs approximately twice the communication and computation cost. Table 12 compares and summarizes the aforementioned related works.

6. Conclusions

In big data analytics, data is a key factor that determines the accuracy and efficiency of models. This directly impacts the quality of the analysis results and learning performance. Outsourcing data to external parties can lead to the leakage of personal information or the loss of corporate intellectual property, which can result in severe privacy violations and economic or legal consequences. Therefore, it is important to study large-scale data analytics protocols that ensure privacy protection and maintain computational efficiency.
In this paper, we have proposed improved privacy-preserving protocols (ipMAX/ipMIN protocols) for securely finding the maximum or minimum value over encrypted large-scale data in outsourced cloud computing environments. The proposed protocols come in two versions: a secure version that prioritizes security and an efficient version that prioritizes efficiency. Compared to the prior protocol [20], the secure version reduces the number of communication rounds by 25%, the communication volume by 50%, and the computational cost by 42%, while preserving privacy and accuracy. The efficient version yields further gains: in our experiments, it reduced the communication volume and execution time by 78% and 64%, respectively. These efficiency improvements are achieved by integrating independent subprotocols, reusing intermediate computational results, and eliminating computations that scale with the number of input data points. While existing protocols perform costly comparison operations in proportion to the size of the input data, the proposed protocols perform more efficient equality operations in proportion to the bit length of the data. This property substantially increases the efficiency of the proposed protocols, since in large-scale data the bit length is far smaller than the number of data points. Our experiments confirmed the theoretical analysis: for the secure version, the communication volume was reduced by half and the execution time by 22% compared to the prior protocol. Moreover, since the proposed protocols support fully parallel execution, the performance is expected to improve significantly in cloud environments with massive parallel computing capabilities. The protocols enable even data owners with insufficient computing power for large-scale data analytics to participate without exposing their information. Under the secure version, the cloud servers are unable to infer any information about either the input data or the final output. This makes the protocols well suited for efficient deployment on public cloud platforms operated by major IT providers.
By contrast, the efficient version allows limited information leakage (specifically, the bit position at which the maximum/minimum value is determined) and can achieve higher efficiency by trading a small amount of privacy for improved performance.
Although privacy-preserving computations such as maximum/minimum, bit-decomposition, and comparison have been proposed using multi-party computation (MPC) as the main privacy-preserving technique, they are too inefficient for use in real-world environments. Because MPC resembles homomorphic encryption in how privacy-preserving protocols are designed, the idea presented in this paper, namely the integration of independent subprotocols, is expected to contribute significantly to enhancing the efficiency of MPC-based computation protocols. In addition, future work includes extending the proposed ipMAX/ipMIN protocols to support more complex statistical queries, such as median and top-k selection, which could further improve overall performance. However, a key technical challenge will be to adapt the proposed protocols to the input/output requirements of these tasks without degrading efficiency.

Funding

This research was supported in part by the Research Grant of Jeonju University in 2024 and in part by the Basic Science Research Program through the National Research Foundation of Korea (NRF), funded by the Ministry of Education under Grant NRF-2022R1A6A3A01087466.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The original contributions presented in this study are included in the article. Further inquiries can be directed to the author.

Conflicts of Interest

The author declares no conflicts of interest.

References

  1. Memon, M.A.; Soomro, S.; Jumani, A.K.; Kartio, M.A. Big data analytics and its applications. arXiv 2017, arXiv:1710.04135.
  2. Dash, S.; Shakyawar, S.K.; Sharma, M.; Kaushik, S. Big data in healthcare: Management, analysis and future prospects. J. Big Data 2019, 6, 54.
  3. Berisha, B.; Mëziu, E.; Shabani, I. Big data analytics in Cloud computing: An overview. J. Cloud Comput. 2022, 11, 24.
  4. Ulybyshev, D.; Bhargava, B.; Villarreal-Vasquez, M.; Alsalem, A.O.; Steiner, D.; Li, L.; Kobes, J.; Halpin, H.; Ranchal, R. Privacy-preserving data dissemination in untrusted cloud. In 2017 IEEE 10th International Conference on Cloud Computing (CLOUD); IEEE: Piscataway, NJ, USA, 2017; pp. 770–773.
  5. Kennedy, D.M. 2022 Cloud Computing TechReport; American Bar Association: Chicago, IL, USA, 2022.
  6. Acar, A.; Aksu, H.; Uluagac, A.S.; Conti, M. A survey on homomorphic encryption schemes: Theory and implementation. ACM Comput. Surv. 2018, 51, 79.
  7. Dai, H.; Ji, Y.; Xiao, F.; Yang, G.; Yi, X.; Chen, L. Privacy-preserving MAX/MIN query processing for WSN-as-a-service. In 2019 IFIP Networking Conference (IFIP Networking); IEEE: Piscataway, NJ, USA, 2019; pp. 1–9.
  8. Sciancalepore, S.; Di Pietro, R. PPRQ: Privacy-preserving MAX/MIN range queries in IoT networks. IEEE Internet Things J. 2020, 8, 5075–5092.
  9. Meng, Q.; Liang, Z.; Shen, Z.; Liu, Y.; Liu, Y.; Hu, J. RAMA: Robust auction scheme with multiple authorities in IoT. In 2022 21st International Symposium on Communications and Information Technologies (ISCIT); IEEE: Piscataway, NJ, USA, 2022; pp. 227–232.
  10. Samanthula, B.K.; Elmehdwi, Y.; Jiang, W. K-nearest neighbor classification over semantically secure encrypted relational data. IEEE Trans. Knowl. Data Eng. 2014, 27, 1261–1273.
  11. Park, J.; Lee, D.H. Parallelly running and privacy-preserving k-nearest neighbor classification in outsourced cloud computing environments. Electronics 2022, 11, 4132.
  12. Nateghizad, M.; Erkin, Z.; Lagendijk, R.L. An efficient privacy-preserving comparison protocol in smart metering systems. EURASIP J. Inf. Secur. 2016, 2016, 11.
  13. Damgård, I.; Fitzi, M.; Kiltz, E.; Nielsen, J.B.; Toft, T. Unconditionally secure constant-rounds multi-party computation for equality, comparison, bits and exponentiation. In Theory of Cryptography Conference; Springer: Berlin/Heidelberg, Germany, 2006; pp. 285–304.
  14. Nishide, T.; Ohta, K. Multiparty computation for interval, equality, and comparison without bit-decomposition protocol. In International Workshop on Public Key Cryptography; Springer: Berlin/Heidelberg, Germany, 2007; pp. 343–360.
  15. Karakoç, F.; Nateghizad, M.; Erkin, Z. SET-OT: A secure equality testing protocol based on oblivious transfer. In 14th International Conference on Availability, Reliability and Security; Association for Computing Machinery: New York, NY, USA, 2019; pp. 1–9.
  16. Ben-David, A.; Nisan, N.; Pinkas, B. FairplayMP: A system for secure multi-party computation. In 15th ACM Conference on Computer and Communications Security; Association for Computing Machinery: New York, NY, USA, 2008; pp. 257–266.
  17. Dai, H.; Wang, M.; Yi, X.; Yang, G.; Bao, J. Secure max/min queries in two-tiered wireless sensor networks. IEEE Access 2017, 5, 14478–14489.
  18. Sun, X.; Sun, Z.; Wang, T.; Feng, J.; Wei, J.; Hu, G. A Privacy-Preserving Reinforcement Learning Approach for Dynamic Treatment Regimes on Health Data. Wirel. Commun. Mob. Comput. 2021, 2021, 8952219.
  19. Park, J. Extremely efficient and privacy-preserving max/min protocol based on multiparty computation in big data. IEEE Trans. Consum. Electron. 2024, 70, 3042–3055.
  20. Park, J.; Lee, D.H. Parallelly Running and Privacy-Preserving Agglomerative Hierarchical Clustering in Outsourced Cloud Computing Environments. IEEE Trans. Big Data 2024, 11, 174–189.
  21. Lee, H.; Choi, J.; Lee, Y. Approximating Max Function in Fully Homomorphic Encryption. Electronics 2023, 12, 1724.
  22. Cheon, J.H.; Kim, A.; Kim, M.; Song, Y. Homomorphic encryption for arithmetic of approximate numbers. In Proceedings of the Advances in Cryptology–ASIACRYPT 2017: 23rd International Conference on the Theory and Applications of Cryptology and Information Security, Hong Kong, China, 3–7 December 2017; Proceedings, Part I 23; Springer: Cham, Switzerland, 2017; pp. 409–437.
  23. Dai, H.; Wei, T.; Huang, Y.; Xu, J.; Yang, G. Random secure comparator selection based privacy-preserving MAX/MIN query processing in two-tiered sensor networks. J. Sens. 2016, 2016, 6301404.
  24. Bogetoft, P.; Christensen, D.L.; Damgård, I.; Geisler, M.; Jakobsen, T.; Krøigaard, M.; Nielsen, J.D.; Nielsen, J.B.; Nielsen, K.; Pagter, J.; et al. Secure multiparty computation goes live. In Proceedings of the Financial Cryptography and Data Security: 13th International Conference, FC 2009, Accra Beach, Barbados, 23–26 February 2009; Springer: Berlin/Heidelberg, Germany, 2009; pp. 325–343.
  25. Dupin, A.; Robert, J.M.; Bidan, C. Location-proof system based on secure multi-party computations. In Provable Security: 12th International Conference, ProvSec 2018, Jeju, Republic of Korea, 25–28 October 2018, Proceedings; Springer: Cham, Switzerland, 2018; pp. 22–39.
  26. Ohata, S.; Nuida, K. Communication-efficient (client-aided) secure two-party protocols and its application. In Proceedings of the Financial Cryptography and Data Security: 24th International Conference, FC 2020, Kota Kinabalu, Malaysia, 10–14 February 2020; Springer: Cham, Switzerland, 2020; pp. 369–385.
  27. Goldwasser, S.; Micali, S.; Rackoff, C. The knowledge complexity of interactive proof-systems. In Providing Sound Foundations for Cryptography: On the Work of Shafi Goldwasser and Silvio Micali; Association for Computing Machinery: New York, NY, USA, 2019; pp. 203–225.
  28. Chor, B.; Goldwasser, S.; Micali, S.; Awerbuch, B. Verifiable secret sharing and achieving simultaneity in the presence of faults. In 26th Annual Symposium on Foundations of Computer Science (sfcs 1985); IEEE: Piscataway, NJ, USA, 1985; pp. 383–395.
  29. Goldreich, O. Foundations of Cryptography: Volume 2, Basic Applications; Cambridge University Press: Cambridge, UK, 2001; Volume 2.
  30. Nateghizad, M.; Veugen, T.; Erkin, Z.; Lagendijk, R.L. Secure equality testing protocols in the two-party setting. In 13th International Conference on Availability, Reliability and Security; Association for Computing Machinery: New York, NY, USA, 2018; pp. 1–10.
  31. Elmehdwi, Y.; Samanthula, B.K.; Jiang, W. Secure k-nearest neighbor query over encrypted data in outsourced environments. In 2014 IEEE 30th International Conference on Data Engineering; IEEE: Piscataway, NJ, USA, 2014; pp. 664–675.
  32. Bethencourt, J. Paillier Library. Available online: https://acsc.cs.utexas.edu/libpaillier/ (accessed on 5 July 2025).
  33. Dehouche, N. Facebook Live Sellers in Thailand. UCI Machine Learning Repository. 2018. Available online: https://archive.ics.uci.edu/dataset/488/facebook+live+sellers+in+thailand (accessed on 15 February 2026).
  34. Gao, J.; Zhang, Y.; Zhong, S. Revisiting privacy-preserving min and k-th min protocols for mobile sensing. IEEE Trans. Dependable Secur. Comput. 2023, 21, 3211–3226.
  35. Zhang, Y.; Chen, Q.; Zhong, S. Efficient and privacy-preserving min and k th min computations in mobile sensing systems. IEEE Trans. Dependable Secur. Comput. 2015, 14, 9–21.
  36. Yu, J.; Wang, K.; Zeng, D.; Zhu, C.; Guo, S. Privacy-preserving data aggregation computing in cyber-physical social systems. ACM Trans. Cyber-Phys. Syst. 2018, 3, 8.
  37. Shi, R.h.; Li, Y.f. Privacy-preserving quantum protocol for finding the maximum value. EPJ Quantum Technol. 2022, 9, 13.
  38. Shi, R.H.; Fang, X.Q. Quantum scheme for privacy-preserving range max/min query in edge-based internet of things. IEEE Trans. Netw. Serv. Manag. 2024, 21, 6827–6838.
Figure 1. Flowcharts of the secure and efficient versions.
Figure 2. Comparison of the execution time and the communication volume with the prior protocol [20] for the number of parallel threads, the number of input data points, the key length, and the data length. (The secure version provides a similar level of security to the prior protocol [20], whereas the efficient version provides a lower security for efficiency).
Table 1. Performance comparison of the existing and proposed protocols.

| Protocol | Number of Input Data Points (n) | Execution Time (s) |
| --- | --- | --- |
| [16] | 4 | 5.38 |
| [24] | 1230 | 1800 |
| [20] | 7050 | 212 |
| Ours (secure version) | 7050 | 166 |
Table 2. Notations.

| Notation | Description |
| --- | --- |
| $[n]$ | The set $\{1, 2, \ldots, n\}$ for $n \in \mathbb{Z}^+$. |
| $\langle n \rangle$ | The set $\{n, \ldots, 1, 0\} = [n] \cup \{0\}$ for $n \in \mathbb{Z}^+$. |
| $\bar{d}$ | The 1’s complement of $d$ with $0 \le d < 2^l$, which is computed by toggling each bit of the value $d$. For example, the 1’s complement of the binary number $d = 0110\,0101$ is $\bar{d} = 1001\,1010$. |
| $\overline{d_j}$ | The 1’s complement of a single bit $d_j \in \{0, 1\}$, which is computed as $\overline{d_j} = 1 - d_j$. |
| $\{d_i\}_{i \in I}$ | The dataset $\{d_{i_1}, d_{i_2}, \ldots, d_{i_n}\}$ for a set $I = \{i_1, i_2, \ldots, i_n\}$, where $0 \le d_i < 2^l$. |
| $d_B$ | As a special case of a dataset, the bit-decomposed form of a value $d$ for $0 \le d < 2^l$. It is denoted by $d_B = \{d_{l-1}, \ldots, d_1, d_0\} = \{d_j\}_{j \in \langle l-1 \rangle}$ where $d = \sum_{j=0}^{l-1} d_j \cdot 2^j$ and $d_j \in \{0, 1\}$. |
| $E(d)_B$ | The encrypted bit-decomposed data of a value $d$ for $0 \le d < 2^l$. It is denoted by $E(d)_B = \{E(d_{l-1}), \ldots, E(d_1), E(d_0)\} = \{E(d_j)\}_{j \in \langle l-1 \rangle}$ where $d = \sum_{j=0}^{l-1} d_j \cdot 2^j$ and $d_j \in \{0, 1\}$. |
| $E(\bar{d})_B$ | The encrypted 1’s complement of the bit-decomposed data for a value $d$. It is denoted by $E(\bar{d})_B = \{E(\overline{d_{l-1}}), \ldots, E(\overline{d_1}), E(\overline{d_0})\} = \{E(\overline{d_j})\}_{j \in \langle l-1 \rangle}$ where $d = \sum_{j=0}^{l-1} (1 - \overline{d_j}) \cdot 2^j$ and $\overline{d_j} \in \{0, 1\}$. |
| $E(C_i) \leftarrow E(1)$ $(i \in [n])$ | The operations for a dataset. They are performed in parallel on each data point, but for succinctness, we simplify them. For example, the notation in the Notation column means the following operations: for each $i \in [n]$ do $E(C_i) \leftarrow E(1)$ end for. The operation notation for a dataset can be appropriately modified according to the operation. |
| $r \in_R S$ | For a set, $S$, the notation denotes that a value, $r$, is chosen from the set $S$ uniformly at random. |
| $DH \rightarrow CSP : E(m)$ | This notation denotes that DH sends a message $E(m)$ to CSP. |
| $x \cdot y$ | The ordinary multiplication for two values, $x$ and $y$. |
| $E(x) \oplus E(y)$ | The homomorphic addition of two encrypted data, $E(x)$ and $E(y)$, as introduced in Section 2.3. |
Table 3. Costs of SEQ protocol in [30].

(a) Communication costs

| | Rounds (DH & CSP) | Volume (DH) | Volume (CSP) |
| --- | --- | --- | --- |
| Costs | 1 | $\theta \cdot C$ | $1 \cdot C$ |

(b) Computation costs

| Case | DH Enc/Dec | DH Exp | CSP Enc/Dec | CSP Exp |
| --- | --- | --- | --- | --- |
| Normal case | 0 | $2\theta + 1$ | $\theta + 1$ | 0 |
| Precomputation | 0 | $2\theta + 1$ | $\theta + 1$ | 0 |
| Parallel execution | 0 | 3 | 2 | 0 |

$C$ is the size (in bits) of a ciphertext. Enc/Dec is the number of encryption/decryption operations. Exp is the number of exponentiation operations. Parallel execution means the number of sequential batches, with the assumption of sufficiently many parallel threads.
Table 4. Example of the secure version of the ipMAX Protocol (Algorithm 2).

| $j$ | $\{E(d_i)_B\}_{i \in [5]}$ | $s$ | $\rho$ | $\{E(\gamma_k)\}_{k \in \langle 2 \rangle}$ | $E(q)$ | $\{E(C_i)\}_{i \in [5]}$ |
| --- | --- | --- | --- | --- | --- | --- |
| · | · | · | · | · | · | $\{E(1), E(1), E(1), E(1), E(1)\}$ |
| 7 | $\{E(0), E(0), E(0), E(0), E(0)\}$ | 0 | 3 | $\{E(0), E(1), E(1)\}$ | $E(1)$ | $\{E(1), E(1), E(1), E(1), E(1)\}$ |
| 6 | $\{E(1), E(1), E(1), E(0), E(0)\}$ | 3 | 7 | $\{E(0), E(1), E(0)\}$ | $E(0)$ | $\{E(1), E(1), E(1), E(0), E(0)\}$ |
| 5 | $\{E(0), E(0), E(0), E(1), E(1)\}$ | 0 | 4 | $\{E(1), E(0), E(0)\}$ | $E(1)$ | $\{E(1), E(1), E(1), E(0), E(0)\}$ |
| 4 | $\{E(1), E(1), E(0), E(1), E(0)\}$ | 2 | 6 | $\{E(0), E(0), E(0)\}$ | $E(0)$ | $\{E(1), E(1), E(0), E(0), E(0)\}$ |
| 3 | $\{E(0), E(0), E(1), E(0), E(1)\}$ | 0 | 1 | $\{E(0), E(0), E(1)\}$ | $E(1)$ | $\{E(1), E(1), E(0), E(0), E(0)\}$ |
| 2 | $\{E(1), E(0), E(1), E(1), E(0)\}$ | 1 | 5 | $\{E(1), E(1), E(0)\}$ | $E(0)$ | $\{E(1), E(0), E(0), E(0), E(0)\}$ |
| 1 | $\{E(0), E(1), E(1), E(1), E(0)\}$ | 0 | 2 | $\{E(0), E(1), E(0)\}$ | $E(1)$ | $\{E(1), E(0), E(0), E(0), E(0)\}$ |
| 0 | $\{E(1), E(0), E(1), E(0), E(1)\}$ | 1 | 3 | $\{E(1), E(0), E(0)\}$ | $E(0)$ | $\{E(1), E(0), E(0), E(0), E(0)\}$ |

In line 10, $\gamma = s + \rho$, where $s = \sum_{i=1}^{n} C_i d_{i,j}$ is the number of predicted maximum data, and $\rho$ is a random value. The bit length of input data is $l = 8$. The bit length of the number of the input data points is $\theta = 3$.
- Input: $\{E(d_i)_B\}_{i \in [5]} = \{E(85)_B, E(82)_B, E(79)_B, E(54)_B, E(41)_B\}$.
- Output: $\{E(C_i)\}_{i \in [5]} = \{E(1), E(0), E(0), E(0), E(0)\}$ (indicating that the maximum value is 85).
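The trace in Table 4 can be reproduced on plaintext values. The Python sketch below is a plaintext analogue only: it omits encryption, the random mask ρ, and the secure equality test, and its update rule is inferred from the columns of Table 4 rather than taken from Algorithm 2. The function name is hypothetical.

```python
def plaintext_max_flags(values, bit_length):
    """Scan bits from most to least significant and keep candidate flags C_i.

    Returns the final flags and a trace of (j, s, q, flags) per iteration.
    """
    flags = [1] * len(values)          # every value starts as a candidate
    trace = []
    for j in range(bit_length - 1, -1, -1):
        bits = [(v >> j) & 1 for v in values]
        # s counts the remaining candidates whose j-th bit is 1.
        s = sum(c * b for c, b in zip(flags, bits))
        # q = 1 means no candidate has this bit set, so nobody is eliminated.
        q = 1 if s == 0 else 0
        flags = [c * (b | q) for c, b in zip(flags, bits)]
        trace.append((j, s, q, list(flags)))
    return flags, trace


if __name__ == "__main__":
    data = [85, 82, 79, 54, 41]        # inputs of Table 4, l = 8
    flags, trace = plaintext_max_flags(data, 8)
    for j, s, q, c in trace:
        print(f"j={j}  s={s}  q={q}  C={c}")
    print("maximum:", data[flags.index(1)])

    # Minimum via the 1's complement of each 8-bit input, as in Table 8.
    min_data = [106, 85, 50, 38, 35]
    min_flags, _ = plaintext_max_flags([255 - v for v in min_data], 8)
    print("minimum:", min_data[min_flags.index(1)])
```

Running the same scan on the complemented inputs of Table 8 selects the minimum, which mirrors how the minimum protocol reuses the maximum computation.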
Table 5. The communication and computation costs for a single iteration (i.e., one bit) of the secure version. These operations are repeated for a total of l iterations.

(a) Communication costs

| Step | Rounds | Volume (DH) | Volume (CSP) |
| --- | --- | --- | --- |
| Step 1 | 1 | $(2n + 1) \cdot C$ | $\theta \cdot C$ |
| Step 2 | 1 | $\theta \cdot C$ | $1 \cdot C$ |
| Step 3 | 1 | $1 \cdot C$ | $n \cdot C$ |
| Total | 3 | $(2n + \theta + 2) \cdot C$ | $(n + \theta + 1) \cdot C$ |

(b) Computation costs

| Case | Step | DH Enc/Dec | DH Exp | CSP Enc/Dec | CSP Exp |
| --- | --- | --- | --- | --- | --- |
| Normal case | Step 1 | $3n + 1$ | $2n$ | $2n + \theta + 1$ | 0 |
| | Step 2 | 0 | $2\theta + 1$ | $\theta + 1$ | 0 |
| | Step 3 | $n + 1$ | $2n$ | $n + 1$ | 0 |
| | Total | $4n + 2$ | $4n + 2\theta + 1$ | $3n + 2\theta + 3$ | 0 |
| Precomputation | Step 1 | 0 | $2n$ | $2n + \theta + 1$ | 0 |
| | Step 2 | 0 | $2\theta + 1$ | $\theta + 1$ | 0 |
| | Step 3 | 0 | $2n$ | $n + 1$ | 0 |
| | Total | 0 | $4n + 2\theta + 1$ | $3n + 2\theta + 3$ | 0 |
| Parallel execution | Step 1 | 1 | 1 | 2 | 0 |
| | Step 2 | 0 | 3 | 2 | 0 |
| | Step 3 | 1 | 1 | 2 | 0 |
| | Total | 2 | 5 | 6 | 0 |

$n$ is the number of input data points. $l$ is the bit length of the data. $\theta$ is the bit length of the number $n$. Parallel execution means the number of sequential batches, with sufficiently many parallel threads assumed.
Table 6. Size of input/output for the number of the input data points and parallel operations (assuming m divides n).

| | Input Data (Bits) | Output Data (Bits) |
| --- | --- | --- |
| One input data point | $lC$ | $C$ |
| n input data points | $nlC$ | $nC$ |
| m parallel processing of the n input data points | $\frac{n}{m} lC$ | $\frac{n}{m} C$ |
Table 7. Minimum memory size of DH and CSP for the number of the input data points and parallel operations (assuming m divides n).
Table 7. Minimum memory size of DH and CSP for the number of the input data points and parallel operations (assuming m divides n).
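| | DH (Bits) | CSP (Bits) |
|---|---|---|
| One input data point | $(4 + 1) \cdot C$ | $(2 + 1) \cdot C$ |
| $n$ input data points | $(4n + 1) \cdot C$ | $(2n + 1) \cdot C$ |
| $m$ parallel processing of the $n$ input data points | $(4\frac{n}{m} + 1) \cdot C$ | $(2\frac{n}{m} + 1) \cdot C$ |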
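To get a feel for the sizes in Tables 6 and 7, the formulas for $m$-way parallel processing can be evaluated directly. This is only a sketch: $C$ is read here as the size of one ciphertext in bits (an interpretation, since $C$ is not defined in this part of the text), and the values $n = 1000$, $m = 10$, and $C = 2048$ are hypothetical, chosen purely for illustration.

```python
def batch_sizes(n, m, l, C):
    """Per-batch I/O and minimum memory (bits) when n points are split into m batches."""
    if n % m != 0:
        raise ValueError("the tables assume that m divides n")
    k = n // m                                # data points handled per batch
    return {
        "input_bits": k * l * C,              # Table 6, input column
        "output_bits": k * C,                 # Table 6, output column
        "dh_memory_bits": (4 * k + 1) * C,    # Table 7, DH column
        "csp_memory_bits": (2 * k + 1) * C,   # Table 7, CSP column
    }

# Hypothetical parameters: 1000 points, 10 batches, 8-bit data, 2048-bit ciphertexts.
sizes = batch_sizes(n=1000, m=10, l=8, C=2048)
print(sizes["input_bits"], sizes["dh_memory_bits"])  # 1638400 821248
```

Setting $m = 1$ recovers the rows for $n$ input data points.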
Table 8. Example of secure version of the ipMIN Protocol (Algorithm 2).
| $j$ | $\{E(d_i)_B\}_{i \in [5]}$ | $s$ | $\rho$ | $\{E(\gamma_k)\}_{k \le 2}$ | $E(q)$ | $\{E(C_i)\}_{i \in [5]}$ |
|---|---|---|---|---|---|---|
| · | · | · | · | · | · | $\{E(1), E(1), E(1), E(1), E(1)\}$ |
| 7 | $\{E(1), E(1), E(1), E(1), E(1)\}$ | 5 | 2 | $\{E(1), E(1), E(1)\}$ | $E(0)$ | $\{E(1), E(1), E(1), E(1), E(1)\}$ |
| 6 | $\{E(0), E(0), E(1), E(1), E(1)\}$ | 3 | 6 | $\{E(0), E(0), E(1)\}$ | $E(0)$ | $\{E(0), E(0), E(1), E(1), E(1)\}$ |
| 5 | $\{E(0), E(1), E(0), E(0), E(0)\}$ | 0 | 3 | $\{E(0), E(1), E(1)\}$ | $E(1)$ | $\{E(0), E(0), E(1), E(1), E(1)\}$ |
| 4 | $\{E(1), E(0), E(0), E(1), E(1)\}$ | 2 | 4 | $\{E(1), E(1), E(0)\}$ | $E(0)$ | $\{E(0), E(0), E(0), E(1), E(1)\}$ |
| 3 | $\{E(0), E(1), E(1), E(1), E(1)\}$ | 2 | 1 | $\{E(0), E(1), E(1)\}$ | $E(0)$ | $\{E(0), E(0), E(0), E(1), E(1)\}$ |
| 2 | $\{E(1), E(0), E(1), E(0), E(1)\}$ | 1 | 8 | $\{E(0), E(0), E(1)\}$ | $E(0)$ | $\{E(0), E(0), E(0), E(0), E(1)\}$ |
| 1 | $\{E(0), E(1), E(0), E(0), E(0)\}$ | 0 | 5 | $\{E(1), E(0), E(1)\}$ | $E(1)$ | $\{E(0), E(0), E(0), E(0), E(1)\}$ |
| 0 | $\{E(1), E(0), E(1), E(1), E(0)\}$ | 0 | 7 | $\{E(1), E(1), E(1)\}$ | $E(1)$ | $\{E(0), E(0), E(0), E(0), E(1)\}$ |

The bit length of input data is $l = 8$. The bit length of the number of the input data points is $\theta = 3$.
- Input: $\{E(\overline{d_i})_B\}_{i \in [5]} = \{E(\overline{106})_B, E(\overline{85})_B, E(\overline{50})_B, E(\overline{38})_B, E(\overline{35})_B\}$.
- Output: $\{E(C_i)\}_{i \in [5]} = \{E(0), E(0), E(0), E(0), E(1)\}$ (indicating that the minimum value is 35).
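The $s$ and $C_i$ columns of this example can be reproduced in plaintext by complementing each $l$-bit input and then applying a maximum-style bit scan. The sketch below assumes that $\overline{d_i}$ denotes the $l$-bit one's complement of $d_i$ and that candidates are filtered only when $s \neq 0$; both assumptions are inferred from the table rather than stated here, and encryption and blinding are left out. The name `min_indicator` is hypothetical.

```python
def min_indicator(values, l):
    """Plaintext analogue of the MIN example: complement, then scan bits MSB-first."""
    mask = (1 << l) - 1
    complemented = [v ^ mask for v in values]   # l-bit one's complement of each input
    c = [1] * len(values)                       # candidate flags C_i
    s_trace = []                                # value of s at j = l-1, ..., 0
    for j in range(l - 1, -1, -1):
        bits = [(v >> j) & 1 for v in complemented]
        s = sum(ci * b for ci, b in zip(c, bits))
        s_trace.append(s)
        if s != 0:                              # filter only when some candidate has bit j set
            c = [ci * b for ci, b in zip(c, bits)]
    return c, s_trace

flags, s_values = min_indicator([106, 85, 50, 38, 35], l=8)
print(s_values)  # [5, 3, 0, 2, 2, 1, 0, 0]
print(flags)     # [0, 0, 0, 0, 1]
```

The printed $s$ values agree with the $s$ column for $j = 7, \ldots, 0$, and the final flags select the fifth input, 35.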
Table 9. Example of an efficient version of the ipMAX Protocol (Algorithm 3).
| $j$ | $\{E(d_i)_B\}_{i \in [5]}$ | $s$ | $\rho$ | $\{E(\gamma_k)\}_{k \le 2}$ | $E(q)$ | $\{E(C_i)\}_{i \in [5]}$ | $\epsilon$ |
|---|---|---|---|---|---|---|---|
| · | · | · | · | · | · | $\{E(1), E(1), E(1), E(1), E(1)\}$ | · |
| 7 | $\{E(0), E(0), E(0), E(0), E(0)\}$ | 0 | 3 | $\{E(0), E(1), E(1)\}$ | $E(1)$ | $\{E(1), E(1), E(1), E(1), E(1)\}$ | $r$ |
| 6 | $\{E(1), E(1), E(1), E(0), E(0)\}$ | 3 | 7 | $\{E(0), E(1), E(0)\}$ | $E(0)$ | $\{E(1), E(1), E(1), E(0), E(0)\}$ | $r$ |
| 5 | $\{E(0), E(0), E(0), E(1), E(1)\}$ | 0 | 4 | $\{E(1), E(0), E(0)\}$ | $E(1)$ | $\{E(1), E(1), E(1), E(0), E(0)\}$ | $r$ |
| 4 | $\{E(1), E(1), E(0), E(1), E(0)\}$ | 2 | 6 | $\{E(0), E(0), E(0)\}$ | $E(0)$ | $\{E(1), E(1), E(0), E(0), E(0)\}$ | $r$ |
| 3 | $\{E(0), E(0), E(1), E(0), E(1)\}$ | 0 | 1 | $\{E(0), E(0), E(1)\}$ | $E(1)$ | $\{E(1), E(1), E(0), E(0), E(0)\}$ | $r$ |
| 2 | $\{E(1), E(0), E(1), E(1), E(0)\}$ | 1 | 5 | $\{E(1), E(1), E(0)\}$ | $E(0)$ | $\{E(1), E(0), E(0), E(0), E(0)\}$ | 0 |

The bit length of input data is $l = 8$. The bit length of the number of the input data points is $\theta = 3$. $r$ is a random value. For the given input data, the efficient version terminates at the second bit; therefore, it reveals the information that the maximum value is determined at the second bit. On the other hand, the secure version terminates at the 0th bit for the same input data in Table 4. The secure version always runs all for-loops, regardless of the input data, and terminates at the 0th bit. Thus, it reveals no information about the input data.
- Input: $\{E(d_i)_B\}_{i \in [5]} = \{E(85)_B, E(82)_B, E(79)_B, E(54)_B, E(41)_B\}$. $V = \{l - 1, \ldots, 2, 1, 0\}$ (indicating that the DH and CSP perform the checking step in all iterations).
- Output: $\{E(C_i)\}_{i \in [5]} = \{E(1), E(0), E(0), E(0), E(0)\}$ (indicating that the maximum value is 85).
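The early termination described in this note can be imitated in plaintext. In the sketch below, the checking step is replaced by a simple test for exactly one remaining candidate; this is an assumption, chosen because it is consistent with $\epsilon = 0$ appearing only in the last row of the table, whereas the actual checking step works on ciphertexts with a random value $r$. The name `efficient_max` is hypothetical, and the check is run in every iteration, as with $V = \{l - 1, \ldots, 0\}$.

```python
def efficient_max(values, l):
    """Plaintext analogue of the efficient version: stop once one candidate is left."""
    c = [1] * len(values)                     # candidate flags C_i
    for j in range(l - 1, -1, -1):
        bits = [(v >> j) & 1 for v in values]
        s = sum(ci * b for ci, b in zip(c, bits))
        if s != 0:
            c = [ci * b for ci, b in zip(c, bits)]
        # Stand-in for the checking step: terminate when a single flag remains.
        if sum(c) == 1:
            return c, j                       # j is the bit at which the scan stopped
    return c, 0                               # ties: the scan runs through bit 0

flags, stop_bit = efficient_max([85, 82, 79, 54, 41], l=8)
print(flags, stop_bit)  # [1, 0, 0, 0, 0] 2
```

The returned stop position is exactly the information that the efficient version reveals and the secure version hides.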
Table 10. The communication and computation costs for a single iteration (i.e., one bit) of the efficient version. These operations are repeated for at most l iterations, depending on an input dataset.
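(a) Communication costs

| Step | Rounds | Volume (DH) | Volume (CSP) |
|---|---|---|---|
| Steps 1–3 | 3 | $(2n + \theta + 2) \cdot C$ | $(n + \theta + 1) \cdot C$ |
| Checking step | 1 | $1 \cdot C$ | 1 |
| Total | 4 | $(2n + \theta + 3) \cdot C$ | $(n + \theta + 1) \cdot C + 1$ |

(b) Computation costs

| Case | Step | DH Enc/Dec | DH Exp | CSP Enc/Dec | CSP Exp |
|---|---|---|---|---|---|
| Normal case | Steps 1–3 | $4n + 2$ | $4n + 2\theta + 1$ | $3n + 2\theta + 3$ | 0 |
| Normal case | Checking step | 0 | 1 | 1 | 0 |
| Normal case | Total | $4n + 2$ | $4n + 2\theta + 2$ | $3n + 2\theta + 4$ | 0 |
| Precomputation | Steps 1–3 | 0 | $4n + 2\theta + 1$ | $3n + 2\theta + 3$ | 0 |
| Precomputation | Checking step | 0 | 1 | 1 | 0 |
| Precomputation | Total | 0 | $4n + 2\theta + 2$ | $3n + 2\theta + 4$ | 0 |
| Parallel execution | Steps 1–3 | 2 | 5 | 6 | 0 |
| Parallel execution | Checking step | 0 | 1 | 1 | 0 |
| Parallel execution | Total | 2 | 6 | 7 | 0 |

Parallel execution means the number of sequential batches, with sufficiently many parallel threads assumed.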
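Since the efficient version pays for an extra checking step in each iteration but may stop early, its total cost depends on how many iterations actually run. The sketch below compares rounds and DH-side volume (in multiples of $C$) for the two versions using the per-iteration totals of Tables 5 and 10. It assumes the checking step is performed in every iteration; the function name is hypothetical, and the iteration count of 6 corresponds to the example in Table 9, which stops at bit 2 of an 8-bit input.

```python
def communication_totals(n, theta, iterations, efficient):
    """Rounds and DH volume (in multiples of C) after the given number of iterations."""
    rounds_per_iteration = 4 if efficient else 3
    dh_volume_per_iteration = 2 * n + theta + (3 if efficient else 2)
    return {
        "rounds": rounds_per_iteration * iterations,
        "dh_volume": dh_volume_per_iteration * iterations,
    }

# n = 5 and theta = 3 as in the worked examples; the secure version runs all 8 bits.
secure = communication_totals(5, 3, iterations=8, efficient=False)
early = communication_totals(5, 3, iterations=6, efficient=True)
worst = communication_totals(5, 3, iterations=8, efficient=True)
print(secure)  # {'rounds': 24, 'dh_volume': 120}
print(early)   # {'rounds': 24, 'dh_volume': 96}
print(worst)   # {'rounds': 32, 'dh_volume': 128}
```

For these parameters, stopping after six iterations saves DH-side volume but not rounds, and an input that forces all eight iterations costs more than the secure version under this accounting.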
Table 11. Comparison of communication volume and execution time with the prior protocol.
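| Protocol | Communication Volume: DH (MB) | Communication Volume: CSP (MB) | Communication Volume: Ratio (%) | Execution Time: Time (s) | Execution Time: Ratio (%) |
|---|---|---|---|---|---|
| Prior protocol [20] | 110.21 | 55.11 | 100% | 212 | 100% |
| Secure version | 55.2 | 27.59 | 50% | 166 | 78% |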
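The ratio columns follow from the measured values by simple division, which the short check below reproduces. The dictionary keys are hypothetical labels; the numbers are taken directly from the table.

```python
prior = {"dh_mb": 110.21, "csp_mb": 55.11, "time_s": 212}
secure = {"dh_mb": 55.2, "csp_mb": 27.59, "time_s": 166}

# Secure version relative to the prior protocol, rounded to whole percent.
ratios = {key: round(100 * secure[key] / prior[key]) for key in prior}
print(ratios)  # {'dh_mb': 50, 'csp_mb': 50, 'time_s': 78}
```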
Table 12. Summary of existing ppMAX/ppMIN protocols.
| Ref. | Privacy | Accuracy of Result | Practical Feasibility | Key Techniques | Limitations |
|---|---|---|---|---|---|
| [8] | Low | Approximate | Real Experiment | Hash, Probabilistic | Leaks Max value to aggregator, Potential false positives |
| [9] | Low | Approximate | Real Experiment | Hash, Probabilistic | Leaks Max value to aggregator, Potential false positives |
| [34] | Low-Moderate | Approximate | Simulation | Probabilistic, Optimization | Probabilistic errors, Lacks real-world evaluation |
| [18] | Moderate | Approximate | Simulation | Homomorphic encryption | Numerical errors, High computation cost for comparisons |
| [21] | Moderate | Approximate | Simulation | Homomorphic encryption | Accuracy depends on parameter $k$, High computation cost for comparisons |
| [37] | High | Exact | Simulation | Quantum | Requires unavailable quantum infrastructure |
| [38] | High | Exact | Simulation | Quantum, Bloom filter | Dependent on specialized quantum hardware |
| [19] | High | Exact | Real Experiment | Multiparty computation | High communication overhead |
| [Ours] | High | Exact | Real Experiment | Homomorphic encryption | The cost of the efficient version can converge to that of the secure version |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Park, J. Optimized and Privacy-Preserving MAX/MIN Protocols for Large-Scale Data. Appl. Sci. 2026, 16, 2580. https://doi.org/10.3390/app16052580
