Building Equi-Width Histograms on Homomorphically Encrypted Data

Lazea, Dragoș; Hangan, Anca; Cioara, Tudor

doi:10.3390/fi17060256

Open AccessArticle

Building Equi-Width Histograms on Homomorphically Encrypted Data

by

Dragoș Lazea

,

Anca Hangan

^*

and

Tudor Cioara

Computer Science Department, Technical University of Cluj-Napoca, Memorandumului 28, 400114 Cluj-Napoca, Romania

^*

Author to whom correspondence should be addressed.

Future Internet 2025, 17(6), 256; https://doi.org/10.3390/fi17060256

Submission received: 6 May 2025 / Revised: 5 June 2025 / Accepted: 6 June 2025 / Published: 10 June 2025

(This article belongs to the Special Issue IoT Security: Threat Detection, Analysis and Defense)

Download

Browse Figures

Versions Notes

Abstract

Histograms are widely used for summarizing data distributions, detecting anomalies, and improving machine learning models’ accuracy. However, traditional histogram-based methods require access to raw data, raising privacy concerns, particularly in sensitive IoT applications. Encryption-based techniques offer potential solutions; however, they secure the data in transit or storage, requiring decryption during analysis, which exposes raw data to potential privacy risks. In this paper, we propose a method for constructing privacy-preserving histograms directly on homomorphically encrypted IoT data, leveraging the Fast Fully Homomorphic Encryption over the Torus (TFHE) scheme implemented in the Concrete framework. To overcome the challenges posed by homomorphic encryption, we redesign the traditional histogram construction algorithm, optimizing it for secure computation by addressing constraints related to nested loops and conditional statements. As an evaluation use case, we have considered an outlier detection mechanism based on histogram frequency counts, ensuring that all data and computations remain encrypted throughout the process. Our method achieves results consistent with plaintext-based outlier detection while maintaining reasonable computational overhead compared to those reported in the existing literature.

Keywords:

histogram computation; histogram-based outlier detection; homomorphic encryption; internet of things; privacy

1. Introduction

In today’s interconnected world, IoT devices have become integral to daily life, enhancing convenience, efficiency, and automation. From smart controllers and meters to wearables and connected homes, they transform industries and drive smart city development. IoT platforms aggregate sensor data, analyze patterns, and support decision-making with AI advancements further evolving these devices into intelligent systems [1,2]. As IoT adoption grows, so does its impact on urban planning, healthcare, and energy management [3,4]. As IoT devices generate vast amounts of data at high frequencies to support control processes, data analysis becomes essential for proactive interventions and enhanced reliability in various applications across different domains [2,5,6].

In this context, understanding data distributions is crucial for detecting anomalies, optimizing data compression, and improving machine learning accuracy. Histograms are valuable tools in this regard, as they provide summaries of large datasets as well as approximations of data distributions, making it easier and faster to analyze the underlying patterns and compute aggregate statistics [7,8,9,10]. At the same time, the large-scale integration of IoT data from heterogenous devices may increase the vulnerability to cyberattacks and device malfunctions. Integrating heterogeneous IoT data increases cybersecurity risks, making privacy protection essential to prevent infrastructure breaches and data leaks [11]. However, traditional histogram-based methods often rely on raw data analysis, posing privacy concerns, especially for sensitive information [8,9].

Differential privacy (DP) is one of the leading methods for privately sharing histograms with untrusted parties in the state of the art. It is used on a wide variety of IoT data, ranging from smart homes to industrial sensors, particularly in tasks related to anomaly detection [12,13]. DP histograms can be constructed by adding noise to histograms created using various methods [14,15]. To deal with trade-offs between accuracy and privacy, some approaches focus on building DP sketches to generate approximate private histograms [16,17]. The authors build new differentially private constructs, such as Misra–Gries sketches or the Approximate-Estimate sketches in a streaming setting, introducing less noise than similar solutions. However, despite its advantages, differential privacy-based techniques add noise to histogram bins, which can reduce accuracy, and defining the right privacy budget is a non-trivial task requiring extensive fine-tuning. Multi-party computation (MPC) can be used to compute histograms collaboratively across multiple parties without revealing individual data and adding noise to the data or the result. In such approaches, differential privacy is applied to the final output for additional protection; however, the tradeoff lies in the communication overhead, as parties must exchange many messages during computation [18,19,20]. The construction of histograms using lightweight masked encryption is explored in [21]. The authors introduce a scheme for secure frequency histogram estimation, allowing a central server to utilize histograms without identifying which values originated from specific users. While such solutions can be computationally efficient, they require at least partial decryption or assume plaintext access before analysis.

In this context, homomorphic encryption (HE) schemes are promising solutions for building privacy-preserving histograms. They enable the processing and execution of operations directly on encrypted IoT data ensuring that the IoT data remains encrypted throughout the entire process [22]. However, homomorphic encryption schemes have limitations which make them difficult to use in the context of histogram building processes. The supported operations are primarily restricted to addition and multiplication and pose constraints on the range and type of operands, which are typically limited to integer values [23,24,25,26,27]. Additional challenges related to secure computations such as comparisons, divisions or exponentiation over the encrypted domain still need to be addressed [28,29]. Overall, encryption/decryption mechanisms and HE data processing generate very large computational overheads that may negatively influence the system’s responsiveness, making it difficult to address the trade-off between privacy and system operation [30]. Finally, noise induced by the encryption scheme can affect the data analysis accuracy, especially when dealing with small deviations from the IoT device’s normal raw data values [31,32].

Therefore, in this paper we propose a solution for building histograms on homomorphic encrypted data generated by IoT devices without relying on partial or total decryption. To leverage the flexibility of the Fast Fully Homomorphic Encryption over the Torus (TFHE) scheme implemented in Concrete [33,34], we redesign the traditional histogram construction algorithm and address its limitations in handling nested loops and conditional statements. To assess the effectiveness of the private histogram, we implement an outlier detection use case, where outliers are identified using a thresholding rule applied to histogram counts, with all computations performed on encrypted data. The proposed privacy-preserving outlier detection mechanism has consistent results with the corresponding mechanism that operates on plain data, without needing to decrypt the results for decision-making. Moreover, the computational overheads are kept within reasonable limits.

The key contributions of this paper are as follows:

A privacy-preserving method for constructing histograms on homomorphically encrypted data, enabling outlier detection without requiring decryption.
Adaptation of traditional histogram construction and histogram-based outlier detection techniques for the Fast Fully Homomorphic Encryption over the Torus (TFHE) scheme in Concrete. We address challenges related to input size constraints and computational depth by vectorizing key operations, including addition, value placement in buckets, labeling abnormal buckets based on threshold frequency, and identifying outliers by their range and assigned bucket labels.
Performance evaluation and analysis of computational overhead trade-offs across different configuration setups, assessing their impact on the effectiveness of privacy-preserving histograms in IoT environments.

2. Equi-Width Histograms on Homomorphically Encrypted Data

Different types of histograms, such as equi-depth, equi-width, and dynamic-width, offer varying levels of effectiveness depending on the characteristics of the underlying data distribution [35]. Equi-depth histograms are more effective for skewed data, whereas equi-width histograms perform better on uniformly distributed data. However, the adaptability of equi-depth histograms comes with increased complexity in both construction and maintenance. In this work, we chose to implement equi-width histograms because of their lower computational cost.

Equi-width histograms capture the overall data distribution by grouping data points into fixed-width buckets. In such histograms, buckets have the same size (range), but the number of values that fit in each of them is variable. Each bucket can be represented as a tuple:

b = (l o w, h i g h, c o u n t), l o w, h i g h, c o u n t \in N,

(1)

where

l o w

and

h i g h

give the range of values which fit in the bucket and count is the number of data points within the

[l o w, h i g h)

range.

Next, we present the implementation of equi-width histogram computation on encrypted data and illustrate its effectiveness in detecting outliers in IoT data. The overview of our approach is depicted in Figure 1. The proposed framework simulates a real-world setting where outlier detection is performed on encrypted IoT data using prior measurement distribution. To this end, we first preprocess the raw IoT data by scaling and converting it to integer values within the range

[0, 100]

, as many homomorphic encryption frameworks do not support floating point operations. Next, we homomorphically encrypt the data and construct an encrypted equi-width histogram, which is then used for outlier detection across three different configuration scenarios described in the evaluation section.

For the implementation, we chose the TFHE scheme in Concrete [33,34] due to its superior flexibility in converting various operations into table lookups, outperforming other FHE schemes [36,37,38,39]. A key advantage is its support for function composition [40], enabling secure multi-step computations without decryption. Additionally, Concrete allows limited floating point operations [41], but its Compile tool only supports integer inputs and outputs, as computations rely on table lookups’ mapping integers to integer results.

In Concrete, data is encrypted using Learning With Errors (LWE), which implies a series of constraints not only on the allowed operations and on the way that functions are implemented, but also on the size of the plain values. Moreover, this learning-based approach of translating arithmetic operations into table lookup operations is a probabilistic model which is prone to errors. The main drawback of Concrete is that it may not provide the expected answer in all test cases, but its default failure probability is very low (1 in 100,000).

As a fully homomorphic encryption scheme, TFHE implemented in Concrete overcomes the drawback of the traditional encryption schemes in which the data is temporarily encrypted during transmission between the client and the server. The server decrypts it upon receipt, executes the necessary operations on the plaintext, re-encrypts the result, and returns it to the client. In Concrete, the only private key holder is the client, and the server-side operations are entirely performed on encrypted data, using the public evaluation key, which is mathematically related to the client’s private key.

2.1. Private Histogram Construction

First, we define a function to determine whether a given value falls within a bucket defined by a specific range. While this may seem like a straightforward operation, implementing it directly on homomorphically encrypted data presents significant challenges. Homomorphic encryption imposes strict limits on the number of operations that can be performed on the same ciphertext, and conventional loops using a counter variable are not feasible, especially when the number of iterations is large. This restriction exists because each iteration increases the computational depth on the ciphertext, potentially exceeding the allowed limit. As only foreach loops and vectorial operations are allowed without any restrictions, alternative procedures need to be implemented to ensure compliance with the constraints imposed by the encryption scheme. Therefore, we implement the range-checking procedure in a vectorized manner, allowing us to efficiently determine whether a given data point x falls within a set of buckets, represented as lists of low values,

l o w []

, and of high values,

h i g h []

.

Given an input value x and two lists containing, on the same position, the minimum (

l o w

) and, respectively, the maximum (

h i g h

) value of each bucket, applying the previously defined procedure in a vectorial approach involves iteratively checking whether x falls within the range defined by the corresponding low and high values. The result of each comparison is appended to a new list

r []

, maintaining the same index as the original range lists. Therefore, the vectorized implementation returns a list of the same size as the

l o w []

and

h i g h []

lists containing a single value of 1 at the position corresponding to the bucket where x fits, while all the other elements of the result list are equal to 0.

This vectorial approach to determine the bucket in which each value of the input fits can be formally described as a map operation, which takes a data sample and the lists containing the ranges of buckets as inputs and returns another vector as output, where each element of the output vector is the result of applying the

i s I n B u c k e t

function to the corresponding elements of the input vectors, as follows:

i s I n B u c k e t : N^{3} \to {0, 1}, i s I n B u c k e t (x, l o w, h i g h) = \{\begin{matrix} 1, & if x \in [l o w, h i g h) \\ 0, & otherwise \end{matrix}

(2)

\begin{matrix} r = v e c t o r i a l I s I n B u c k e t (x, l o w, h i g h) = m a p (i s I n B u c k e t (x, l o w, h i g h)) \\ = [i s I n B u c k e t (x, l o w_{0}, h i g h_{0}), \dots, i s I n B u c k e t (x, l o w_{n - 1}, h i g h_{n - 1})] \end{matrix}

(3)

where x is a scalar and

l o w

and

h i g h

are lists. Algorithm 1 shows the vectorial implementation of the procedure.

Algorithm 1 Vectorial is in bucket.

Require: x ▹ New sampled data from IoT devices
Require: low[], high[] ▹ Lists of bucket bounds in histogram
Ensure: r[] ▹ List with 1 at the position of the matching bucket

1:: $r \leftarrow []$
2:: for $i = 0$ to $l o w . l e n g t h - 1$ do
3:: $r . append (isInBucket (x, l o w [i], h i g h [i]))$
4:: end for
5:: return r

Using the vectorial approach to determine the bucket in which the given value fits, we can compute the histogram of an input dataset by adding the result vectors for each value in the input set, as shown in Algorithm 2. To achieve this, we use the vectorial addition operation which is defined as the element-wise addition of their corresponding components.

Algorithm 2 Building equi-width histogram on homomorphically encrypted data.

Require: x[] ▹ New sampled data stream from IoT devices
Require: low[], high[] ▹ Lists of bucket bounds in histogram
Ensure: h[] ▹Equi-width histogram

1:: $h \leftarrow [0, 0, \dots, 0]$
2:: for all $x_{i} \in x$ do
3:: $h \leftarrow h + vectorialIsInBucket (x_{i}, l o w, h i g h)$
4:: end for
5:: return h

For two given vectors

v

and

u

of size n, the result of the vectorial addition is another vector,

w

, of the same size, defined as follows:

w = u + v = [u_{0}, u_{1}, \dots, u_{n - 1}] + [v_{0}, v_{1}, \dots, v_{n - 1}] = [u_{0} + v_{0}, u_{1} + v_{1}, \dots, u_{n - 1} + v_{n - 1}]

(4)

Generalizing the above relation for the case of m vectors,

u^{(j)}

,

j = \bar{1, m}

, each having n components, the result of the vectorial addition is:

\begin{matrix} w = u^{(1)} + u^{(2)} + \dots + u^{(m)} = [u_{0}^{(1)}, u_{1}^{(1)}, \dots, u_{n - 1}^{(1)}] + [u_{0}^{(2)}, u_{1}^{(2)}, \dots, u_{n - 1}^{(2)}] + \dots + [u_{0}^{(m)}, u_{1}^{(m)}, \dots, u_{n - 1}^{(m)}] \\ = [u_{0}^{(1)} + u_{0}^{(2)} + \dots + u_{0}^{(m)}, u_{1}^{(1)} + u_{1}^{(2)} + \dots + u_{1}^{(m)}, \dots, u_{n - 1}^{(1)} + u_{n - 1}^{(2)} + \dots + u_{n - 1}^{(m)}] \end{matrix}

(5)

The vectorial addition can be performed only on vectors having the same number of components, since the components of the sum are the element-wise sums of the components of all input vectors. Therefore, for a given input list of elements, if we perform an element-wise addition of the results given by the vectorial range check function for each element, we determine the equi-width histogram of the input set, as depicted in Figure 2.

2.2. Detecting Outliers in Encrypted Data Using Private Histograms

As homomorphic encryption schemes impose significant limitations in terms of available operations, performing more complex data analysis on homomorphically encrypted data is a challenging task. In this section, we show how our private equi-width histogram can be utilized for detecting outliers in homomorphically encrypted IoT sensor data streams by analyzing the distribution of data across buckets. Since outliers typically appear less frequently than normal values, data points can be labeled based on the number of elements within the same bucket.

Therefore, to bypass the limitations of existing homomorphic encryption schemes, a mainly additive and comparison-based simple thresholding approach can be applied, where data falling into buckets with low counts is classified as an outlier. Values that belong to low-frequency buckets are identified as outliers, whereas values in high-frequency buckets are considered normal.

As in the case of constructing a private histogram, we adopt a vectorized approach for outlier detection on FHE-encrypted data, leveraging frequency-based labeling. Specifically, we define a procedure that, given an input value and a bucket, determines whether the value falls within the bucket and whether the bucket’s element count is below a predefined threshold (as shown in Algorithm 3). By vectorizing this procedure, we enable repeated application across an entire IoT sensor dataset, while adhering to the constraints of fully homomorphic encryption.

To accomplish this, we define a map operation that processes an IoT data sample alongside lists representing bucket ranges, element counts per bucket, and a predefined threshold. This operation outputs a list in which each element corresponds to the result of applying the

i s I n A b n o r m a l B u c k e t

function (defined below) to the respective elements of the input lists. The function determines whether a given data point falls within a bucket and if the bucket’s element count is below the specified threshold, as follows:

\begin{matrix} i s I n A b n o r m a l B u c k e t : N^{5} \to {0, 1}, i s I n A b n o r m a l B u c k e t (x, l o w, h i g h, c o u n t, t h) = \\ = \{\begin{matrix} 1, & if x \in [l o w, h i g h) and c o u n t < t h \\ 0, & otherwise \end{matrix} \end{matrix}

(6)

\begin{matrix} r = v e c t o r i a l I s I n A b n o r m a l B u c k e t (x, l o w, h i g h, c o u n t, t h) = \\ = m a p (i s I n A b n o r m a l B u c k e t (x, l o w, h i g h, c o u n t, t h)) \\ = [i s I n A b n o r m a l B u c k e t (x, l o w_{0}, h i g h_{0}, c o u n t_{0}, t h), \dots, i s I n A b n o r m a l B u c k e t (x, l o w_{n - 1}, h i g h_{n - 1}, c o u n t_{n - 1}, t h)] \end{matrix}

(7)

where x and

t h

are scalars and

l o w

,

h i g h

, and

c o u n t

are lists defining the buckets in the histogram. This approach is formally defined in Algorithm 3.

Algorithm 3 Vectorial is in abnormal bucket.

Require: x ▹New data sample
Require: low[], high[] ▹ Bucket bounds in histogram
Require: count[] ▹ Element counts per bucket
Require: th ▹ Threshold for outlier detection
Ensure: r[] ▹ Vector with 1 at the index of the abnormal bucket if any

1:: $r \leftarrow []$
2:: for $i = 0$ to $l o w . l e n g t h - 1$ do
3:: $r . append (isInAbnormalBucket (x, l o w [i], h i g h [i], c o u n t [i], t h))$
4:: end for
5:: return r

To compute the binary label of a specific value, we perform a reduce operation on the result provided by the

v e c t o r i a l I s I n A b n o r m a l B u c k e t

procedure by summing all the elements in the list. Reducing a list by summing is a recursive vectorial operation which is defined as follows:

r e d u c e (L, \oplus) = \{\begin{matrix} 0, & if L = [] \\ x + r e d u c e (L^{'}, \oplus), & if L = [x] + L^{'} \end{matrix}

(8)

where L is the original list, x is the first element of the list, and

L^{'}

is the rest of the list. This operation generates a single value that is the sum of all the elements of the list, by adding the first element to the sum of the rest of the list, until the list is empty (see Algorithm 4). For an empty list, the reduce-by-summing operation returns 0.

Algorithm 4 Equi-width histogram-based outlier detection on homomorphically encrypted data.

Require: x[] ▹New sampled data stream from IoT devices
Require: low[], high[] ▹ Bucket bounds in histogram
Require: count[] ▹ Element counts per bucket
Require: th ▹ Frequency threshold for outlier detection
Ensure: a[] ▹ Vector of outliers detected per sample

1:: $a \leftarrow []$
2:: for all $x_{i} \in x$ do
3:: $r \leftarrow vectorialIsInAbnormalBucket (x_{i}, l o w, h i g h, c o u n t, t h)$
4:: $a . append (\sum r)$
5:: end for
6:: return a

The vectorial approach of labeling an input value involves iteratively checking if the value is in each of the buckets defined by three lists: one for minimum values, one for maximum values, and one for the current element count in each bucket. When the bucket in which the value must be placed is determined, the number of values that it contains is compared to a given threshold.

The result of the vectorial procedure is a list of the same size as the lists containing the range limits for the buckets. It contains at most one element set to 1, positioned at the index of the bucket where the value fits, but only if the bucket’s element count is below the threshold. If no such bucket meets the condition, all elements in the output list are 0. This mechanism is depicted in Figure 3.

2.3. Threat Model Definition and Assumptions

Our privacy-preserving histogram construction technique targets edge–cloud deployments in IoT environments, where sensitive data collected from distributed sensors is encrypted using homomorphic encryption and aggregated before being transmitted to the cloud. The data sources (i.e., sensors) are considered to be trusted entities. We assume that sensor data is encrypted at the source immediately upon collection using the private encryption keys managed by a trusted HE controller. Data is then transmitted to more capable but untrusted edge/cloud processors, where the histogram building and outlier detection modules are deployed, as shown in Figure 4. These edge/cloud processors are not granted access to private encyption/decryption keys, but only to the evaluation public key to perform the histogram construction and anomaly detection. The HE controller is the only trusted entity responsible for distributing keys, validating computation circuits, and enforcing policies. Only a single trusted controller governs each deployment.

In such a deployment, we consider several potential risks, including the execution of tampered circuits by malicious edge nodes, replay attacks aiming to skew the histogram, and statistical inference from aggregate outputs. To mitigate these threats, all data is encrypted at the source using homomorphic encryption, and we assuume that only validated, signed computation circuits approved by the trusted controller are allowed to run. Additionally, we assume the existence of authenticated channels over which the encrypted data is transmitted, and that policy enforcement is centralized under a single trusted HE controller that governs key management and computation integrity. However, this setup introduces a single point of failure, as the HE controller is solely responsible for key management, policy enforcement, and circuit validation.

3. Evaluation Results

We implemented the proposed solution in Python 3.10.6, using the Concrete Homomorphic Encryption library [40]. The code is available on GitHub (version 2.38.0) [42]. All experiments were conducted in a laboratory environment and were not deployed in a real-world setting, such as the one described in Section 2.3. The evaluation was performed on a computer equipped with an Intel(R) Core(TM) i7-10750H CPU @ 2.60 GHz (6 physical cores, 12 logical processors) and 16 GB of RAM. While this setup provides a reasonable baseline for assessing the computational performance of our methods, it does not reflect the constraints typically found on edge devices. Therefore, further optimization and testing would be required for deployment in resource-constrained environments. Nonetheless, the trained circuits generated with Concrete could be deployed on more powerful edge or fog computing nodes.

To evaluate our prototypes, we use the data provided by IoT devices in a city district, specifically an electricity production dataset [43] containing measurements recorded by smart meters on an hourly basis. Since the values measured by the IoT meters are floating point values between 0 and 9.91 kW and our models can operate only on integer data which has a limited binary representation, we preprocess the electric power data by scaling it to integer values between 0 and 100 to meet the constraints of our prototypes. To achieve this, we normalize the data to ensure it fits into a limited range, multiply the normalized values by 100, and round the result to the nearest integer, as shown in Figure 1. By applying this normalization-based data preprocessing, we aim at converting the data to small integers while not disturbing the general distribution of the values. As the device measures the produced electric power, the values are higher at midday and lower during the morning and the evening (see reference values in Figure 5). The distribution of the data is preserved by the data preprocessing flow that we use. This preprocessing strategy also supports generalizability to other types of IoT data, such as temperature, motion, or air quality, since these values can similarly be normalized and scaled to the

[0, 100]

range before encryption, allowing our modules to operate on a broad range of sensor inputs.

To simulate a real-world scenario in which the data received from IoT devices is already encrypted, we encrypted benchmark data using homomorphic encryption and then fed it to the outlier detection method in three use cases. However, in practice, the models would be pre-trained on data within the expected range of the IoT devices, without requiring this additional encryption step.

We ran multiple experiments on encrypted data to assess the performance of our outlier detection method on encrypted data under various challenges, including sensor noise, adversarial inputs, communication issues, and device malfunctions—such as sensors persistently reporting constant value regardless of environmental conditions. To this aim, we artificially introduced anomalies in the preprocessed data by adding noise values to simulate various kinds of abnormal situations which can be encountered in an IoT sensor-based system. To ensure that the data used at building the equi-width histogram reflects the expected behavior of the monitored phenomenon, we consider the average of the data recorded by the electric smart meter during five consecutive days as the reference data.

This histogram-based approach of detecting outliers based on a frequency threshold performs best on catching anomalies in data containing significant variations and samples which exceed the range of the reference data. However, the major assumption this technique makes is that outliers have a considerably lower frequency compared to normal data; this method does not catch constant anomalies when the value repeatedly reported by the sensor is present within the reference data used at building the histogram.

Determining the frequency threshold used at deciding if observations are outliers or not is crucial in ensuring a high detection precision. Overly high threshold values result in a larger number of bins being considered to contain outliers, increasing the number of false positives and thus lowering precision, while a low value of the threshold leads to a large number of false negative samples. Furthermore, the threshold value should also be selected according to the number of buckets in the histogram, as a large number of buckets implies lower frequencies inside buckets. Therefore, to determine an appropriate frequency threshold for detecting outliers, we performed a sensitivity analysis by varying the number of histogram buckets and frequency thresholds to evaluate how these parameters influence detection precision. Specifically, we computed precision scores for each combination and tried to identify the optimal match depending on the dataset, as shown in Figure 6 for data depicted in Figure 5b. Our analysis revealed that the precision generally decreases when both the number of buckets in the histogram and the threshold increase.

Figure 5 depicts the results of our method on various types of variations, including real measurements from the IoT smart meter as well as artificially modified data representing several abnormal behaviors. As shown in Table 1, both the cryptographic and non-cryptographic methods detect the same number of outliers, indicating that our privacy-preserving histogram-based outlier detector achieves the same precision as the traditional method operating on raw data. The corresponding performance metrics for both the plaintext and cryptographic techniques are summarized in Table 2. However, it is important to note that the accuracy of the private histogram outlier detection technique heavily depends on the training data. When presented with data in ranges that have not been seen during training, the private histogram may produce errors or fail to detect certain outliers as expected.

In the following, we discuss and compare the complexity and execution time of our outlier detection solution for encrypted data considering three different configuration use cases depicted in Figure 7. The complexity of our configurations is determined in terms of basic operations, such as clear addition, encrypted addition, encrypted negation, and programmable bootstrap, which is used to reduce the amount of noise within the encrypted representation to allow performing multiple operations on the same encrypted operands. The relative complexity of the circuits gives a hint about which circuit is faster (lower complexity will determine a lower execution time), as presented in [44]. To ensure the relevance and reliability of the reported results, we conducted approximately five runs for each of the experiments described below. The execution times and operation counts reported for our histogram construction and outlier detection circuits represent the average values computed across these runs. However, the results across all runs showed minimal variance, with the computed values remaining consistently close.

Table 1. Comparison of outlier detection between non-cryptographic and cryptographic methods performance on various types of outliers.

Outlier Type (in Figure 5)	# Outliers in Dataset	# Normal Samples in Dataset	# Outliers Detected by Non-Cryptographic Method	# Outliers Detected by Cryptographic Method
Figure 5a	3	21	4	4
Figure 5b	12	12	9	9
Figure 5c	24	0	24	24
Figure 5d	22	2	14	14
Figure 5e	13	11	12	12

Table 2. Performance metrics of our outlier detection method.

Data (in Figure 5)	Precision $\frac{TP}{TP + FP}$	Recall $\frac{TP}{TP + FN}$	F1 Score $\frac{2 \times Precition \times Recall}{Presicion + Recall}$
Figure 5a	0.75	1.0	0.86
Figure 5b	1.0	0.75	0.86
Figure 5c	1.0	1.0	1.0
Figure 5d	1.0	0.64	0.78
Figure 5e	1.0	0.92	0.96

3.1. First Configuration Use Case (UC-1)

UC-1 assumes the availability of a previously computed histogram of IoT sensor data for the normal behavior of the device. This histogram, along with new IoT sensor data, serves as input for the outlier detection circuit. In this setup, the expected data variance over the selected time interval is a key factor. To account for this, we construct a reference histogram using hourly mean energy measurements recorded over several past days. UC-1 is designed to evaluate the outlier detection circuit independently, focusing on identifying outliers in encrypted data without addressing the construction of the data distribution.

In our evaluation, we aim to determine how the size of the plain input data samples affects the size of the encrypted inputs, results, and secret keys. As all the values used in computations are preprocessed to fit in

[0, 100]

, they can all be represented using 8 bits. Therefore, when computing the size of the plain inputs, we consider that the size of each value is 1 Byte. Thus, since the measurements are recorded on an hourly basis and each bucket is represented by a tuple containing three values, namely (low, high, count), the size of the clear inputs can be expressed, in bytes, as follows:

s i z e o f (D a t a) + s i z e o f (H i s t o g r a m) + s i z e o f (t h r e s h o l d) = 24 \times No . of days + 3 \times No . of buckets + 1

(9)

We measured the variation of the size of encrypted inputs, outputs, and secret keys with the size of the plain inputs, expressed as pairs of number of days considered, each containing 24 measurements, and the number of buckets in the reference histogram used to detect outliers. The actual size of the inputs, expressed in bytes, can be determined using relation 9. The size of the encrypted inputs is correlated to the size of the raw inputs, but the size of the encrypted data is considerably higher. The size of the encrypted output also increases with the size of plain inputs and depends only on the number of measurements to be labeled, as the module outputs an encrypted label associated with each input measurement. However, the size of the secret keys is not correlated with the size of input data, and it has the same order of magnitude, regardless of the size of the inputs. As depicted in Figure 8a, even if the size of the encrypted inputs increases almost linearly with the size of the clear inputs, the overhead introduced by the encryption is significant (

\times 10^{6}

).

The complexity of the model can be expressed in terms of operations performed during the outlier detection process. This complexity for UC-1 is highly dependent on the size of the input samples. As previously described, in our case, the length of the input is determined by the number of buckets in the histogram and the number of IoT data samples we want to process. We measured the variation of different types of operations with the size of the inputs. As expected, the number of operations performed increases when the size of the data input increases. However, since we propose an additive algorithm operating on encrypted data, encrypted addition is the most frequently performed operation. The number of programmable bootstrap operations, used to reduce the noise and to allow repeated computations on the same ciphertext, also significantly increases with the size of the inputs as the number of values that must be repeatedly operated on grows.

A synthetic view of the operation-based complexity variation is depicted in Figure 8b. We consider the total number of operations as a measure of the complexity of the model. In this global view, the complexity of our prototype in UC-1 doubles when either the number of buckets or the number of measurements doubles, resulting in an almost linear variation.

We evaluate the performance of our prototype by measuring the compile time, which is the time needed to train the model using a set of plain input samples, the time taken by the model to generate the keys, and the time taken to generate the encrypted labels associated with the input encrypted measurements. Figure 8c depicts the variation of all time components considered for UC-1 configuration, with the size of inputs. The time required for key generation is significantly lower than both the compilation and detection (execution) times. Meanwhile, compilation and detection times follow a similar pattern, increasing exponentially as the input size grows.

3.2. Second Configuration Use Case (UC-2)

UC-2 focuses on evaluating the histogram construction and outlier detection phases of our privacy-preserving solution. This approach ensures a more accurate representation of the data distribution, which may shift over time due to seasonality and other external factors. As a result, the reference histogram needs to be periodically updated. This histogram, along with the data to be labeled, is then fed into the outlier detection circuit. This use case allows for independent analysis of the two key phases in the proposed outlier detection process. Transferring the reference histogram between two circuits requires either decrypting and re-encrypting it with the second circuit’s key or securely sharing encryption keys via a trusted third party. While the Concrete library supports storing keys on disk and loading them for use in another circuit, this method introduces potential security risks by exposing sensitive data, making it more vulnerable to exploitation.

We measured the complexity of both the histogram construction and outlier detection modules concerning the size of encrypted inputs, outputs, and secret keys. The encrypted input size is similar for both modules. However, the encrypted output size is significantly larger for the outlier detection circuit. This increase is due to its strong dependence on input size, as the output consists of encrypted binary labels corresponding to each input value. Furthermore, the encrypted input and output sizes of the outlier detection circuit are comparable to the input size in UC-1, as both use the same circuit. Given that 24 measurements are sampled per day and each bucket range is represented as a

(l o w, h i g h)

value pair, the size of the clear inputs for the histogram construction phase can be expressed as follows:

s i z e o f (D a t a R e f) + s i z e o f (B u c k e t s R a n g e) = 24 \times No . of days + 2 \times No . of buckets

(10)

The size of the plain inputs of the outlier detection phase is equal to the size of the inputs in UC-1 (see (9)). The overhead introduced by encrypting the inputs is around

\times 10^{6}

in the case of both circuits in UC-2, as Figure 9 shows. The relationship between encrypted and plain input sizes is linear for the histogram construction prototype and nearly linear for the outlier detection prototype.

We evaluated the operation-based complexity of both models, displaying the number of operations of each type performed by each circuit. Similar to UC-1, encrypted addition is the most frequent operation, as both circuits rely primarily on vectorial addition. While the total number of operations is higher during the outlier detection phase leading to increased circuit complexity, it remains comparable to the complexity of the model in UC-1. As shown in Figure 10, the operational complexity of both circuits, as well as the total number of operations performed in our prototype circuits for UC-2, approximately doubles when either the size of the data to be labeled or the number of buckets doubles. However, running the models in UC-2 with a 50-bucket histogram and two full days of data samples proved infeasible in our experimental setup, as it required excessive computation time and exceeded the available processing power.

The performance of the circuits in UC-2 in terms of compile (training), generating keys, and building histogram or outlier detection time is synthetically presented Figure 10. The compile time is considerably lower for the histogram building prototype, while the time taken to generate the keys and the execution (building histogram or outlier detection) time are comparable between the two circuits.

The execution time generally increases with the size of inputs and, implicitly, with the number of operations performed on input values. However, for both histogram building and outlier detection circuits, the execution time and key generating time shows a local minimum when the size of inputs is

(# d a y s, # b u c k e t s) = (2, 20)

, as depicted in Figure 11, while, in the case of training (compile) time, the local minimum is encountered at an input of size

(# d a y s, # b u c k e t s) = (2, 10)

.

3.3. Third Use Case (UC-3)

UC-3 aims to increase the security of the overall process by computing the histogram and detecting outliers using the same circuit. However, this results in a higher complexity of the circuit and slower execution. In this scenario, the global complexity of the technique is evaluated.

As in UC-1 and UC-2, the complexity of the model regarding the size of encrypted inputs increases with the size of data, while the size of the encrypted outputs is influenced only by the number of buckets in the equi-width histogram. However, as more operations are performed in the model that we define in UC-3, we notice a slight increase in the size of secret keys compared to the size of secret keys generated in UC-1 and UC-2. The total size of the inputs is expressed for UC-3 in a totally similar way to the size of inputs corresponding to UC-1 and UC-2 (see relations (9) and (10)). As in UC-2, performing experiments on a 50-bucket histogram using 2-full-days data samples is not feasible in UC-3.

Since in UC-3, we compute the reference histogram and detect outliers using a single model, the size of inputs is considerably higher than in the case of circuits defined in UC-1 and UC-2. Therefore, the overhead introduced by encrypting the inputs is even higher than in UC-1 and UC-2, reaching an increase of almost

\times 2 \cdot 10^{6}

in the size of inputs, as shown in Figure 12a.

Integrating both the histogram building and outlier detection phases into the same model, as defined in UC-3, increases the computational complexity of the circuit. This results in higher overall complexity due to the greater number of operations performed. The number of operations of each type performed by our prototype in UC-3 is roughly the sum of the operations from the circuits that separately handle histogram construction and outlier detection in UC-2. Consequently, the complexity of the circuit used in UC-3 provides a comprehensive view of the overall complexity of the proposed privacy-preserving outlier detection technique.

Similar to UC-1 and UC-2, the effect of the number of samples to be labeled and the number of buckets in the reference histogram on the model’s operation count is consistent. As shown in Figure 12b, doubling both the number of data samples and the number of buckets results in models with comparable operational complexity.

We measured the time variation of the model in UC-3 relative to input size. Although the circuit in UC-3 is a combination of the two circuits from UC-2, the compile time, key generation time, and execution time of the UC-3 circuit do not simply add up to the sum of the corresponding times for the two individual circuits in UC-2. The execution and compile time for UC-3 follow the same pattern as the corresponding time components in UC-2. A local minimum can be identified for an input size of

(# d a y s, # o f b u c k e t s) = (2, 10)

for this prototype, as depicted in Figure 12c. However, taking into consideration the high complexity of the model in terms of both input size and number of operations, the training and execution (i.e., histogram construction and outlier detection) are slower than in UC-1 and UC-2.

4. Conclusions

In this paper, we introduce a solution for building equi-width histograms and detecting outliers on homomorphically encrypted data generated by IoT devices. To address constraints such as the limited set of operations that can be performed on encrypted data, we leverage the TFHE scheme implemented in Concrete, a fully homomorphic encryption variant optimized for practical applications. Our approach incorporates a redesigned histogram-based technique tailored for encrypted IoT data streams, enabling the detection of sensor readings that statistically deviate from previous measurements and are therefore considered anomalous. We have implemented the main operations of the equi-width histograms’ outlier detection algorithm adapted for the TFHE homomorphic encryption scheme. The customization involves vectorizing additive operations, value placement in buckets, and labeling abnormal buckets based on a threshold frequency.

The proposed solution effectively constructs histograms and detects outliers on fully homomorphic encrypted data streams without decryption. While operations on encrypted numbers introduce significant execution overhead, they remain practical for offline processing. Labelling 24 to 48 sensor measurements can take minutes to hours, depending on input size and histogram complexity. Communication between the client and outlier detector is minimal, and deploying the detector closer to the edge reduces cloud traffic. The experiments were conducted in a laboratory setting using standard hardware. The proposed solution’s performance in an actual deployment still needs to be investigated, as in real-world conditions, it may be affected by more limited memory capacity, lower processing power, and stricter energy constraints typical of edge computing nodes.

Using the TFHE scheme and Concrete library offers advantages such as encrypted data operations, ready-to-deploy circuits, and efficient key management. However, limitations persist, including restrictions on computational depth, arithmetic precision, and function complexity. Errors may arise due to reliance on precomputed lookup tables, but these are generally less impactful than potential IoT sensor noise.

Processing encrypted IoT data is constrained by computational overhead, requiring careful data scaling to fit within limited bit representations. Vectorial operations offer a more flexible alternative to iterative methods. Our experiments used small integer values (8-bit) to ensure feasible outlier detection configurations.

Lastly, designing computations on encrypted data is rigid. Changes in input size or operations necessitate circuit recompilation and model retraining, which can take several hours. More complex algorithms result in larger, costlier circuits, increasing execution time.

While encrypted data processing introduces additional execution time and homomorphic encryption imposes constraints on operation size and precision, these trade-offs are inherent to secure computation. Additionally, adapting to changes in input size or operations requires recompilation and retraining, which may limit flexibility. However, our solution remains highly effective for constructing cryptographic equi-width histograms on homomorphically encrypted data and can be successfully applied to IoT data outlier detection, ensuring compliance with data protection regulations. The promising results highlight its potential for advancing privacy-preserving analytics.

Author Contributions

Conceptualization, D.L., A.H. and T.C.; methodology, A.H. and T.C.; software, D.L.; validation, D.L. and A.H.; data curation, D.L.; writing—original draft preparation, D.L., A.H. and T.C.; writing—review and editing, D.L., A.H. and T.C.; visualization, D.L., A.H. and T.C.; supervision, A.H. and T.C.; project administration, T.C.; funding acquisition, T.C. All authors have read and agreed to the published version of the manuscript.

Funding

This research received funding from the European Union’s Horizon Europe research and innovation program under the Grant Agreement number 101136216. Views and opinions expressed are, however, those of the author(s) only and do not necessarily reflect those of the European Union or the European Climate, Infrastructure, and Environment Executive Agency. Neither the European Union nor the granting authority can be held responsible for them.

Data Availability Statement

The original contributions presented in this study are included in the article. Further inquiries can be directed to the corresponding author.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:

TFHE	Fully Homomorphic Encryption over the Torus
IoT	Internet of Things
LWE	Learning With Errors

References

Soori, M.; Arezoo, B.; Dastres, R. Internet of things for smart factories in industry 4.0, a review. Internet Things Cyber-Phys. Syst. 2023, 3, 192–204. [Google Scholar] [CrossRef]
Villegas-Ch, W.; Govea, J.; Jaramillo-Alcazar, A. Iot anomaly detection to strengthen cybersecurity in the critical infrastructure of smart cities. Appl. Sci. 2023, 13, 10977. [Google Scholar] [CrossRef]
Diraco, G.; Rescio, G.; Caroppo, A.; Manni, A.; Leone, A. Human action recognition in smart living services and applications: Context awareness, data availability, personalization, and privacy. Sensors 2023, 23, 6040. [Google Scholar] [CrossRef]
Farrokhi, A.; Farahbakhsh, R.; Rezazadeh, J.; Minerva, R. Application of Internet of Things and artificial intelligence for smart fitness: A survey. Comput. Netw. 2021, 189, 107859. [Google Scholar] [CrossRef]
Diro, A.; Kaisar, S.; Vasilakos, A.V.; Anwar, A.; Nasirian, A.; Olani, G. Anomaly detection for space information networks: A survey of challenges, techniques, and future directions. Comput. Secur. 2024, 139, 103705. [Google Scholar] [CrossRef]
Li, C.; Wang, J.; Wang, S.; Zhang, Y. A review of IoT applications in healthcare. Neurocomputing 2024, 565, 127017. [Google Scholar] [CrossRef]
Mousavi, H.; Zaniolo, C. Fast and accurate computation of equi-depth histograms over data streams. In Proceedings of the 14th International Conference on Extending Database Technology, Uppsala, Sweden, 21–24 March 2011; pp. 69–80. [Google Scholar]
Goldstein, M.; Dengel, A. Histogram-based outlier score (hbos): A fast unsupervised anomaly detection algorithm. KI-2012 Poster Demo Track 2012, 1, 59–63. [Google Scholar]
Ge, L.; Parhi, K.K. Robust clustering using hyperdimensional computing. IEEE Open J. Circuits Syst. 2024, 5, 102–116. [Google Scholar] [CrossRef]
Yuan, H.; Cong, G.; Li, G. Nuhuo: An effective estimation model for traffic speed histogram imputation on a road network. Proc. VLDB Endow. 2024, 17, 1605–1617. [Google Scholar] [CrossRef]
Sasi, T.; Lashkari, A.H.; Lu, R.; Xiong, P.; Iqbal, S. A comprehensive survey on IoT attacks: Taxonomy, detection mechanisms and challenges. J. Inf. Intell. 2024, 2, 455–513. [Google Scholar] [CrossRef]
Cunningham, T.; Cormode, G.; Ferhatosmanoglu, H. Privacy-preserving synthetic location data in the real world. In Proceedings of the 17th International Symposium on Spatial and Temporal Databases, Online, 23–25 August 2021; pp. 23–33. [Google Scholar]
Waites, C.; Cummings, R. Differentially private normalizing flows for privacy-preserving density estimation. In Proceedings of the 2021 AAAI/ACM Conference on AI, Ethics, and Society, Virtual, 19–21 May 2021; pp. 1000–1009. [Google Scholar]
Cai, Z.; Zheng, X.; Wang, J.; He, Z. Private data trading towards range counting queries in internet of things. IEEE Trans. Mob. Comput. 2022, 22, 4881–4897. [Google Scholar] [CrossRef]
Tosa, A.; Hangan, A.; Sebestyen, G.; István, Z. In-Storage Computation of Histograms with differential privacy. In Proceedings of the 2021 International Conference on Field-Programmable Technology (ICFPT), Auckland, New Zealand, 6–10 December 2021; IEEE: Piscataway, NJ, USA, 2021; pp. 1–4. [Google Scholar]
Lebeda, C.J.; Tetek, J. Better differentially private approximate histograms and heavy hitters using the Misra-Gries sketch. In Proceedings of the 42nd ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems, Seattle, WA, USA, 18–23 June 2023; pp. 79–88. [Google Scholar]
Wang, X.; Mo, L.; Zheng, X.; Dang, Z. Streaming Histogram Publication Over Weighted Sliding Windows Under Differential Privacy. Tsinghua Sci. Technol. 2024, 29, 1674–1693. [Google Scholar] [CrossRef]
Bell, J.; Gascon, A.; Ghazi, B.; Kumar, R.; Manurangsi, P.; Raykova, M.; Schoppmann, P. Distributed, private, sparse histograms in the two-server model. In Proceedings of the 2022 ACM SIGSAC Conference on Computer and Communications Security, Los Angeles, CA, USA, 7–11 November 2022; pp. 307–321. [Google Scholar]
Braun, L.; Gascón, A.; Raykova, M.; Schoppmann, P.; Seth, K. Malicious security for sparse private histograms. Cryptology ePrint Archive 2024. [Google Scholar]
Liebenow, J.; Imort, T.; Fuchs, Y.; Heisel, M.; Käding, N.; Rupp, J.; Mohammadi, E. PrivAgE: A Toolchain for Privacy-Preserving Distributed Aggregation on Edge-Devices. KI-Künstl. Intell. 2024, 38, 183–188. [Google Scholar] [CrossRef]
Su, Y.; Li, J.; Li, J.; Su, Z.; Meng, W.; Yin, H.; Lu, R. Robust and lightweight data aggregation with histogram estimation in edge-cloud systems. IEEE Trans. Netw. Sci. Eng. 2024, 11, 2864–2875. [Google Scholar] [CrossRef]
Shrestha, R.; Kim, S. Integration of IoT with blockchain and homomorphic encryption: Challenging issues and opportunities. In Advances in Computers; Elsevier: Amsterdam, The Netherlands, 2019; Volume 115, pp. 293–331. [Google Scholar]
Zhang, C.; Yin, A.; Zuo, W.; Chen, Y. Privacy preserving anomaly detection based on local density estimation. Math. Biosci. Eng. 2020, 17, 3478–3497. [Google Scholar] [CrossRef]
Alabdulatif, A.; Kumarage, H.; Khalil, I.; Yi, X. Privacy-preserving anomaly detection in cloud with lightweight homomorphic encryption. J. Comput. Syst. Sci. 2017, 90, 28–45. [Google Scholar] [CrossRef]
Chen, S.; Susilo, W.; Zhang, Y.; Yang, B.; Zhang, M. Privacy-preserving anomaly counting for time-series data in edge-assisted crowdsensing. Comput. Stand. Interfaces 2023, 85, 103707. [Google Scholar] [CrossRef]
Li, R.; Ishimaki, Y.; Yamana, H. Fully homomorphic encryption with table lookup for privacy-preserving smart grid. In Proceedings of the 2019 IEEE International Conference on Smart Computing (SMARTCOMP), Washington, DC, USA, 12–15 June 2019; IEEE: Piscataway, NJ, USA, 2019; pp. 19–24. [Google Scholar]
Itokazu, K.; Wang, L.; Ozawa, S. Outlier Detection by Privacy-Preserving Ensemble Decision Tree U sing Homomorphic Encryption. In Proceedings of the 2021 International Joint Conference on Neural Networks (IJCNN), Virtual, 18–22 July 2021; IEEE: Piscataway, NJ, USA, 2021; pp. 1–7. [Google Scholar]
Bourse, F.; Sanders, O.; Traoré, J. Improved secure integer comparison via homomorphic encryption. In Proceedings of the Cryptographers’ Track at the RSA Conference, San Francisco, CA, USA, 24–28 February 2020; Springer: Berlin/Heidelberg, Germany, 2020; pp. 391–416. [Google Scholar]
Babenko, M.; Golimblevskaia, E. Euclidean division method for the homomorphic scheme ckks. In Proceedings of the 2021 IEEE Conference of Russian Young Researchers in Electrical and Electronic Engineering (ElConRus), St. Petersburg, Moscow, Russia, 26–29 January 2021; IEEE: Piscataway, NJ, USA, 2021; pp. 217–220. [Google Scholar]
Zhang, L.; Li, C.; Hu, Q.; Lang, J.; Huang, S.; Hu, L.; Leng, J.; Chen, Q.; Lv, C. Enhancing privacy in large language model with homomorphic encryption and sparse attention. Appl. Sci. 2023, 13, 13146. [Google Scholar] [CrossRef]
Li, R.; Bhattacharjee, S.; Das, S.K.; Yamana, H. Look-up table based FHE system for privacy preserving anomaly detection in smart grids. In Proceedings of the 2022 IEEE International Conference on Smart Computing (SMARTCOMP), Helsinki, Finland, 20–24 June 2022; IEEE: Piscataway, NJ, USA, 2022; pp. 108–115. [Google Scholar]
Shen, M.; Tang, X.; Zhu, L.; Du, X.; Guizani, M. Privacy-preserving support vector machine training over blockchain-based encrypted IoT data in smart cities. IEEE Internet Things J. 2019, 6, 7702–7712. [Google Scholar] [CrossRef]
Zama. What Is Concrete? 2024. Available online: https://docs.zama.ai/concrete/ (accessed on 19 February 2024).
Chillotti, I.; Gama, N.; Georgieva, M.; Izabachène, M. TFHE: Fast fully homomorphic encryption over the torus. J. Cryptol. 2020, 33, 34–91. [Google Scholar] [CrossRef]
Ioannidis, Y. The history of histograms (abridged). In Proceedings of the 2003 VLDB Conference, Berlin, Germany, 9-12 September 2003; Elsevier: Amsterdam, The Netherlands, 2003; pp. 19–30. [Google Scholar]
Microsoft. Microsoft SEAL. 2024. Available online: https://github.com/Microsoft/SEAL?tab=readme-ov-file (accessed on 19 February 2024).
Brakerski, Z.; Gentry, C.; Vaikuntanathan, V. (Leveled) fully homomorphic encryption without bootstrapping. ACM Trans. Comput. Theory (TOCT) 2014, 6, 1–36. [Google Scholar] [CrossRef]
Cheon, J.H.; Kim, A.; Kim, M.; Song, Y. Homomorphic encryption for arithmetic of approximate numbers. In Proceedings of the Advances in Cryptology—ASIACRYPT 2017: 23rd International Conference on the Theory and Applications of Cryptology and Information Security, Hong Kong, China, 3–7 December 2017; Proceedings, Part i 23. Springer: Berlin/Heidelberg, Germany, 2017; pp. 409–437. [Google Scholar]
Fan, J.; Vercauteren, F. Somewhat practical fully homomorphic encryption. Cryptology ePrint Archive 2012. [Google Scholar]
Concrete. Composition. 2024. Available online: https://docs.zama.ai/concrete/2.5/tutorials/composition (accessed on 19 February 2024).
Concrete. Floating Points. 2024. Available online: https://docs.zama.ai/concrete/explanations/advanced-features/floating_points (accessed on 19 February 2024).
HEADpy. HEADpy. 2024. Available online: https://github.com/dragoslazea/HEADpy (accessed on 22 March 2024).
De Mulder, C. 2021 Electricity and Heat Demand Data for a City District (Belgium). Zenodo, Published 20 January 2022. Available online: https://zenodo.org/records/5155659 (accessed on 19 February 2024).
Concrete. Depth and Complexity of Circuit. 2024. Available online: https://community.zama.ai/t/depth-and-complexity-of-circuit/485 (accessed on 19 February 2024).

Figure 1. Pipeline of the proposed framework for privacy-preserving outlier detection in IoT data using private equi-width histograms.

Figure 2. Vectorial approach of constructing equi-width histograms.

Figure 3. Vectorial approach of detecting outliers using equi-width histograms.

Figure 4. Deployment sketch and threat model.

Figure 5. Results provided by our privacy-preserving outlier detection method on different kinds of variations which simulate various abnormal situations encountered in an IoT system: (a) measurements not matching with the reference histogram, (b) unusual data pattern with significant variations(spikes), (c) constant values reported by a faulty IoT smart meter, (d) data produced by a noisy IoT sensor, (e) noisy data exceeding the maximum value we have considered.

Figure 6. Sensitivity analysis of number of buckets and frequency threshold for data in Figure 5b.

Figure 7. Configuration use cases for our outlier detection solution.

Figure 8. UC-1 configuration analysis: (a) correlation between the size of encrypted inputs and the size of clear inputs; (b) variation of complexity for different number of buckets and size of input data; (c) compile, execution, and key generation times variation with the size of the plain input data.

Figure 9. Correlation of encrypted input size with the size of clear inputs for both histogram building and outlier detection phases in UC-2: (a) histogram building phase encrypted input size, (b) outlier detection phase encrypted input size.

Figure 10. Variation of execution time, compile time, and key generation time for both histogram building and outlier detection phases of UC-2 with the size of the plain input data: (a) histogram building phase time variation, (b) outlier detection phase time variation.

Figure 11. Variation of complexity in configuration UC-2 for different number of buckets and size of input data: (a) histogram building phase operation complexity, (b) outlier detection phase operation complexity.

Figure 12. UC-3 configuration analysis: (a) correlation between the size of encrypted inputs and the size of clear inputs; (b) variation of complexity for different number of buckets and size of input data; (c) compile, execution, and key generation times variation with the size of the plain input data.

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Lazea, D.; Hangan, A.; Cioara, T. Building Equi-Width Histograms on Homomorphically Encrypted Data. Future Internet 2025, 17, 256. https://doi.org/10.3390/fi17060256

AMA Style

Lazea D, Hangan A, Cioara T. Building Equi-Width Histograms on Homomorphically Encrypted Data. Future Internet. 2025; 17(6):256. https://doi.org/10.3390/fi17060256

Chicago/Turabian Style

Lazea, Dragoș, Anca Hangan, and Tudor Cioara. 2025. "Building Equi-Width Histograms on Homomorphically Encrypted Data" Future Internet 17, no. 6: 256. https://doi.org/10.3390/fi17060256

APA Style

Lazea, D., Hangan, A., & Cioara, T. (2025). Building Equi-Width Histograms on Homomorphically Encrypted Data. Future Internet, 17(6), 256. https://doi.org/10.3390/fi17060256

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers. See further details here.

Article Menu

Building Equi-Width Histograms on Homomorphically Encrypted Data

Abstract

1. Introduction

2. Equi-Width Histograms on Homomorphically Encrypted Data

2.1. Private Histogram Construction

2.2. Detecting Outliers in Encrypted Data Using Private Histograms

2.3. Threat Model Definition and Assumptions

3. Evaluation Results

3.1. First Configuration Use Case (UC-1)

3.2. Second Configuration Use Case (UC-2)

3.3. Third Use Case (UC-3)

4. Conclusions

Author Contributions

Funding

Data Availability Statement

Conflicts of Interest

Abbreviations

References

Share and Cite

Article Metrics

Article Access Statistics

Further Information

Guidelines

MDPI Initiatives

Follow MDPI