# Distributed Bayesian Inference for Large-Scale IoT Systems


## Abstract


## 1. Introduction

## 2. Background and Related Work

#### 2.1. Bayesian Inference in Wireless Sensor Networks

#### 2.2. Apache Hadoop

#### 2.3. Spark

## 3. Methodology

#### 3.1. Objectives and Contributions

#### 3.2. Tools and Technologies Used

**PySpark:**- Spark’s Python API that simplifies distributed data analysis and machine learning tasks.
- Spark SQL: a programming interface for structured and semi-structured data management with SQL-like syntax. In our research, it has been used for data preprocessing and manipulation.
- Spark MLlib: a library for developing scalable machine learning models. Here we have utilized its built-in Logistic Regression algorithms, along with its predefined classification metrics.

Here we set the master URL to run the Spark application in local mode, utilizing all available cores on the machine for parallel processing. The memory allocation for the driver program and for the executors is set to 5 gigabytes each. Lastly, we determine the number of cores assigned to each executor, allocating 6 cores per executor. These configuration settings ensure efficient execution and utilization of computational resources. **numpy:**- Provides the tools and functionality needed to efficiently implement the numerical computations involved in Bayesian Logistic Regression, such as array operations.
**pymc3:**- High-level intuitive interface for probabilistic programming in Python [22].
**pandas:**- High-performance data manipulation and analysis tool for structured data. Here it is used to handle the data and allow for use with other libraries for scientific computing and machine learning.
**matplotlib:**- Used to visualise data.
**scikit-learn:**- Used to obtain classification metrics when working with pandas DataFrames and numpy predictions.
**psutil:**- Used to retrieve system information, such as CPU and memory usage.
**time:**- Used to compute inference time for each test set and classification method.
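The Spark configuration described above (local master using all available cores, 5 GB for the driver and for the executors, 6 cores per executor) corresponds to a SparkSession setup along the following lines. This is a minimal sketch; the application name is hypothetical and the exact builder calls used in our code are not reproduced here.

```python
from pyspark.sql import SparkSession

# Sketch of the configuration described in Section 3.2;
# the app name "bayesian-aqi" is hypothetical.
spark = (
    SparkSession.builder
    .appName("bayesian-aqi")
    .master("local[*]")                     # local mode, all available cores
    .config("spark.driver.memory", "5g")    # 5 GB for the driver
    .config("spark.executor.memory", "5g")  # 5 GB per executor
    .config("spark.executor.cores", "6")    # 6 cores per executor
    .getOrCreate()
)
```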

#### 3.3. Study Limitations

#### 3.4. Data Collection and Cleaning

The European Air Quality Index classifies air quality according to the concentrations of five pollutants: PM10, NO${}_{2}$, O${}_{3}$, SO${}_{2}$, and PM2.5. Through this categorization framework, individuals can deepen their understanding of air quality and make informed judgments that protect public welfare. A precise picture of the health hazards linked to the air quality in a specific area is obtained by determining the air quality classification from the highest recorded value among the pollutants in question. This information should be used to inform efforts to mitigate the negative health effects associated with air pollution; hence the importance of strict adherence to the European standards for the index cannot be overstated.
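Using the band boundaries listed in Table 1, deriving the overall classification from the worst (most severe) pollutant can be sketched as follows. The function names and the boundary handling (upper bounds inclusive) are our illustration, not the paper's exact implementation:

```python
# Band upper bounds per pollutant, from Table 1, in the order
# Good .. Extremely Poor. Inclusive upper bounds are our assumption.
BANDS = ["Good", "Fair", "Moderate", "Poor", "Very Poor", "Extremely Poor"]
LIMITS = {
    "PM2.5": [10, 20, 25, 50, 75, 800],
    "PM10":  [20, 40, 50, 100, 150, 1200],
    "NO2":   [40, 90, 120, 230, 340, 1000],
    "O3":    [50, 100, 130, 240, 380, 800],
    "SO2":   [100, 200, 350, 500, 750, 1250],
}

def band(pollutant, value):
    """Classify a single pollutant concentration into its AQI band."""
    for upper, name in zip(LIMITS[pollutant], BANDS):
        if value <= upper:
            return name
    return BANDS[-1]

def overall_band(readings):
    """Overall AQI = most severe band among the measured pollutants."""
    return BANDS[max(BANDS.index(band(p, v)) for p, v in readings.items())]
```

For example, a reading of PM2.5 = 30 falls in the "Poor" band while NO2 = 50 is only "Fair", so the overall classification is "Poor".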

We used the `pyspark.sql` and `pyspark.mllib` modules of the PySpark framework to facilitate the data compilation process. Missing values were handled with great care during the preparatory phase using time-based interpolation. Despite the irregular sampling intervals, this approach was indispensable for maintaining the dataset’s integrity and accuracy: preserving the data’s inherent temporal patterns was essential for a comprehensive understanding and analysis of the evolution of air quality trends over a specified time period.
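The technique of time-based interpolation weights filled-in values by elapsed time rather than by row position, which is what makes it robust to irregular sampling intervals. The paper applies it within the PySpark pipeline; as an illustration of the technique itself, here is a minimal pandas sketch (the column name is hypothetical):

```python
import numpy as np
import pandas as pd

# Hourly readings with an irregular gap: the missing value at 01:00 is
# filled in proportion to elapsed time between 00:00 and 03:00.
idx = pd.to_datetime(["2020-01-01 00:00", "2020-01-01 01:00", "2020-01-01 03:00"])
no2 = pd.Series([10.0, np.nan, 30.0], index=idx)  # hypothetical NO2 readings
filled = no2.interpolate(method="time")
# One hour into a three-hour gap from 10 to 30 gives roughly 16.67,
# not the positional midpoint 20.
```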

A new column, `AQI_Index`, was appended to each record to classify it according to the Air Quality Index (AQI) associated with the most severe level among the five different pollutants. An additional column, `AQI_GenPop_Index`, was incorporated to hold binary AQI values indicating whether the air quality is suitable for the general population. We addressed outliers within the pollutant columns using the interquartile range (IQR) technique, considering the data’s skewed nature. Finally, we applied z-score normalization to the pollutant columns to standardize their measurements, ensuring a mean of 0 and a standard deviation of 1. Subsequently, we sampled the preprocessed data to create datasets spanning 1, 3, 6, 9, 12, 15, and 18 years. The one-year sample was designated as the training set, while the remaining six subsets were employed for testing (as shown in Table 2, Table 3 and Table 4).
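The outlier treatment and normalization steps can be sketched in pandas terms as follows. Clipping values to the Tukey fences is our assumption for how outliers were "addressed" via the IQR technique; the paper performs these steps inside its PySpark pipeline:

```python
import pandas as pd

def iqr_clip(s: pd.Series) -> pd.Series:
    """Clip values outside the fences [Q1 - 1.5*IQR, Q3 + 1.5*IQR]."""
    q1, q3 = s.quantile(0.25), s.quantile(0.75)
    iqr = q3 - q1
    return s.clip(lower=q1 - 1.5 * iqr, upper=q3 + 1.5 * iqr)

def zscore(s: pd.Series) -> pd.Series:
    """Standardize to mean 0 and standard deviation 1."""
    return (s - s.mean()) / s.std()

pm25 = pd.Series([8.0, 9.0, 10.0, 11.0, 12.0, 300.0])  # one extreme outlier
clean = zscore(iqr_clip(pm25))  # outlier pulled to the upper fence first
```

Clipping before standardizing matters for skewed data: a single extreme value would otherwise dominate the mean and standard deviation used by the z-score.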

#### 3.5. Experiment

- (a) Bayesian Logistic Regression using PySpark vs. NumPy and Pandas: the Bayesian classifier’s performance is assessed in two environments, one implemented with PySpark and another with the NumPy and Pandas libraries, to compare the performance of the two implementations.
- (b) Bayesian vs. Frequentist Logistic Regression using PySpark: the performance of the Bayesian logistic regression classifier is compared with that of the frequentist logistic regression classifier, both implemented using PySpark.

## 4. Results and Analysis

#### 4.1. Data Scalability

**Inference Time:**- The evaluation of our classifier’s scalability begins with an analysis of inference time across the test sets. Notably, prediction times remain consistent regardless of fluctuations in the test sets’ size or composition, underscoring the classifier’s effectiveness across a wide range of data scenarios. The inference times for the 12-year and 9-year test sets are 60.44 ms and 81.02 ms, respectively. The negligible fluctuation in execution time, independent of test set size, underscores the classifier’s consistent and efficient performance. The capacity to sustain comparatively stable prediction times is an essential criterion when assessing scalability: it indicates that the classifier performs well on more extensive datasets without a significant increase in execution time.
**Memory Usage:**- As shown in Table 2, memory consumption rises with the size of the test set. This is in line with expectations, since more memory is needed to store and process larger datasets. However, the growth in memory usage is not directly proportional to the test set’s size: for example, as the test set grows from 645 MB to 860.2 MB, memory usage rises from 124 KB to 400 KB. This nonlinear relationship indicates that factors beyond data volume, such as model complexity and implementation details, affect memory requirements. When working with large amounts of data, using PySpark for machine learning and classification can raise memory usage concerns. Spark addresses this by dividing the data into smaller, more manageable partitions, enabling distributed processing over a network of machines. Processing and memory demands grow with dataset size, since larger datasets produce more partitions, and Spark’s memory usage is also affected by the data operations and transformations it performs. The amount of data processed therefore does not always correlate directly with the rise in memory use; this variability results from the convergence of several factors, including the complexity of the model used and the specific nuances of its implementation.
**CPU Usage:**- Examining CPU usage across test sets of different sizes is crucial for assessing our classifier’s performance. Our primary finding is that CPU utilization remains relatively constant across test set sizes: computing power consumption is reliably efficient, as shown by CPU utilization of 0.29% and 5.8% for the 15-year and 9-year test sets, respectively. Regardless of how large the test data becomes, the classifier’s CPU usage stays stable, indicating that it efficiently employs the available processing power without a substantial escalation in resource requirements. It is thus well-suited for scalability in computational systems where efficient resource utilization is paramount.
**Classification Metrics:**- The classification metrics reveal consistent performance across different test set sizes. Accuracy ranges from 0.87882 to 0.8791466, indicating a robust and stable classifier. Moreover, specificity remains constant across the test sets at a nearly perfect value of 0.995. These metrics (accuracy, precision, etc.) demonstrate EVCA’s ability to handle varying data volumes while maintaining reliable and accurate predictions, highlighting the scalability of the classifier as it accommodates diverse data sizes while accounting for inherent uncertainties.
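The reported accuracy and specificity follow directly from the confusion-matrix counts in Table 2. Taking the 3-year test set as an example:

```python
# Confusion-matrix counts for the 3-year test set (Table 2).
tp, fn, fp, tn = 214524, 75434, 1432, 342993

accuracy = (tp + tn) / (tp + fn + fp + tn)  # fraction of correct predictions
specificity = tn / (tn + fp)                # true-negative rate
sensitivity = tp / (tp + fn)                # true-positive rate (recall)
```

Here accuracy evaluates to roughly 0.8788 and specificity to roughly 0.996, matching the tabulated values (0.878833 and 0.995) up to rounding.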

#### Comparison to Classic Logistic Regression in PySpark

**Inference Time:**- The Bayesian classifier has inference times ranging from 60.44 ms to 81.02 ms across the different test set sizes, demonstrating faster predictions than the frequentist classifier, whose times range from 64.86 ms to 166.62 ms. This highlights the computational efficiency of the Bayesian classifier, and the minor differences in inference times further emphasize the Bayesian model’s consistent and efficient performance.
**Memory Usage:**- Both the Bayesian and frequentist classifiers exhibit similar memory usage patterns. The memory usage of the Bayesian classifier increases progressively, though not linearly, as the test sets grow; the frequentist classifier shows a similar departure from linear growth. These results indicate that factors besides the amount of test data, such as the complexity of the model and the details of its implementation, affect how much memory both algorithms require.
**CPU Usage:**- Both the Bayesian and frequentist classifiers make good use of computational resources. The Bayesian classifier consistently uses very little CPU, ranging from 0.29% to 5.8% across the different test set sizes, while the frequentist classifier uses between 1% and 6.1%. Both algorithms thus exploit the available processing power without overloading the system’s resources. Notably, the Bayesian algorithm tends to use slightly less CPU, which can benefit computational efficiency, especially on larger datasets. This smooth, efficient CPU usage suggests that both classifiers suit computing systems that require scalability and efficient resource utilization.
**Classification Metrics:**- Both the Bayesian and frequentist models remain strong predictors. Across test sets of various sizes, the Bayesian classifier’s accuracy lies between 0.87882 and 0.8791466, while the frequentist classifier is slightly more accurate, with accuracy between 0.892238 and 0.892396. The Bayesian classifier consistently achieves a specificity of 0.995 regardless of test set size, meaning it rarely misclassifies negative instances. By contrast, the frequentist classifier’s specificity of 0.868 across all test set sizes indicates a comparatively higher false positive rate. The Bayesian classifier is therefore more specific than the frequentist classifier, i.e., better at correctly identifying true negatives. When correctly classifying the negative class is critical, the Bayesian classifier is the better choice.
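The inference time and resource figures reported throughout this section were collected with the `time` and `psutil` libraries (Section 3.2). A minimal measurement harness might look like the following sketch; the function being timed here is a stand-in, not the paper's classifier, and psutil is treated as optional:

```python
import time

try:
    import psutil  # optional: system metrics, as used in the paper
except ImportError:
    psutil = None

def measure(fn, *args):
    """Run fn; report inference time in ms and, if available, memory/CPU."""
    start = time.perf_counter()
    result = fn(*args)
    elapsed_ms = (time.perf_counter() - start) * 1000.0
    mem_mb = cpu_pct = None
    if psutil is not None:
        mem_mb = psutil.Process().memory_info().rss / 1e6  # resident MB
        cpu_pct = psutil.cpu_percent(interval=None)
    return result, elapsed_ms, mem_mb, cpu_pct

# Stand-in workload for illustration.
res, ms, mem, cpu = measure(sum, range(1_000_000))
```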

#### 4.2. Computational Scalability

**Inference Time:**- The NumPy and Pandas classifier shows slightly longer inference times than the PySpark classifier on the smaller datasets, and its times increase significantly as the data grows. The PySpark classifier, in contrast, maintains an almost stable inference time as the data increases. The inference time for the NumPy classifier ranges from 55.694 ms to 268.03 ms, while the PySpark classifier ranges from 60.44 ms to 81.02 ms, as seen in Table 4. This difference emphasizes the scalability of the PySpark classifier in terms of prediction time, which is visualized in Figure 12.
**Memory Usage:**- The comparison between the NumPy and Pandas classifier and the PySpark classifier reveals significant differences in memory usage and scalability. The memory consumption of the NumPy and Pandas classifier varies between 15.3 MB and 89.6 MB, whereas the PySpark classifier consumes between 124 KB and 992 KB, as indicated in Figure 9. This disparity underscores the PySpark classifier’s exceptional scalability: it processes larger test sets with minimal memory demands, and its memory consumption remains exceptionally modest as the test sets grow. This demonstrates the efficiency with which the PySpark framework manages memory resources and executes distributed processing.
**CPU Usage:**- The CPU utilization of the NumPy-based classifier exhibits only small fluctuations, ranging from 0.6% to 8.7% across test sets of varying sizes, while the PySpark-based classifier consumes between 0.29% and 5.8% of the CPU. Both classifiers use CPU resources effectively, although the NumPy classifier consumes marginally more CPU power as the data size expands. As seen in Figure 8, this again demonstrates the scalability of the PySpark classifier and its efficient use of computational resources as the data grows.
**Classification Metrics:**- In terms of accuracy, the NumPy and Pandas classifier exhibits values from 0.87095 to 0.87127, comparable to the 0.87882 to 0.8791466 achieved by the PySpark classifier. Regardless of test set size, both classifiers deliver consistent and dependable performance. For specificity, the NumPy and Pandas classifier consistently achieves 0.945, while the PySpark classifier maintains 0.995, reflecting their precision in identifying true negative outcomes. This consistent performance across datasets of various sizes emphasizes their effectiveness in correctly detecting negative instances, a critical aspect of our study. These performance metrics are illustrated in Figure 10, Figure 11, Figure 12, Figure 13 and Figure 14.

## 5. Discussion

## 6. Conclusions and Future Work

## Author Contributions

## Funding

## Data Availability Statement

## Conflicts of Interest

## References

- Guhaniyogi, R.; Li, C.; Savitsky, T.; Srivastava, S. Distributed Bayesian inference in massive spatial data. Stat. Sci.
**2023**, 38, 262–284. [Google Scholar] [CrossRef] - Srivastava, S.; Xu, Y. Distributed Bayesian inference in linear mixed-effects models. J. Comput. Graph. Stat.
**2021**, 30, 594–611. [Google Scholar] [CrossRef] - Ye, B.; Qin, J.; Fu, W.; Zhu, Y.; Wang, Y.; Kang, Y. Distributed Bayesian inference over sensor networks. IEEE Trans. Cybern.
**2021**, 53, 1587–1597. [Google Scholar] [CrossRef] [PubMed] - Yu, Z.; Chen, F.; Liu, J.K. Sampling-Tree Model: Efficient Implementation of Distributed Bayesian Inference in Neural Networks. IEEE Trans. Cogn. Dev. Syst.
**2020**, 12, 497–510. [Google Scholar] [CrossRef] - Zhou, C.; Li, Q.; Tham, C.K. Information-Driven Distributed Sensing for Efficient Bayesian Inference in Internet of Things Systems. In Proceedings of the 2018 15th Annual IEEE International Conference on Sensing, Communication, and Networking (SECON), Hong Kong, China, 11–13 June 2018; pp. 1–9. [Google Scholar] [CrossRef]
- Vadera, M.P.; Marlin, B.M. Challenges and Opportunities in Approximate Bayesian Deep Learning for Intelligent IoT Systems. In Proceedings of the 2021 IEEE Third International Conference on Cognitive Machine Intelligence (CogMI), Virtual, 13–15 December 2021; pp. 252–261. [Google Scholar] [CrossRef]
- Khan, F.M.; Baccour, E.; Erbad, A.; Hamdi, M. Adaptive ResNet Architecture for Distributed Inference in Resource-Constrained IoT Systems. In Proceedings of the 2023 International Wireless Communications and Mobile Computing (IWCMC), Dubrovnik, Croatia, 30 May–3 June 2023; pp. 1543–1549. [Google Scholar] [CrossRef]
- Yao, S.; Zhao, Y.; Shao, H.; Zhang, C.; Zhang, A.; Liu, D.; Liu, S.; Su, L.; Abdelzaher, T. ApDeepSense: Deep Learning Uncertainty Estimation without the Pain for IoT Applications. In Proceedings of the 2018 IEEE 38th International Conference on Distributed Computing Systems (ICDCS), Vienna, Austria, 2–6 July 2018; pp. 334–343. [Google Scholar] [CrossRef]
- Baccour, E.; Erbad, A.; Mohamed, A.; Hamdi, M.; Guizani, M. RL-DistPrivacy: Privacy-Aware Distributed Deep Inference for Low Latency IoT Systems. IEEE Trans. Netw. Sci. Eng.
**2022**, 9, 2066–2083. [Google Scholar] [CrossRef] - Ullah, I.; Kim, J.B.; Han, Y.H. Compound Context-Aware Bayesian Inference Scheme for Smart IoT Environment. Sensors
**2022**, 22, 3022. [Google Scholar] [CrossRef] [PubMed] - Arellanes, D.; Lau, K.K. Decentralized data flows in algebraic service compositions for the scalability of IoT systems. In Proceedings of the 2019 IEEE 5th World Forum on Internet of Things (WF-IoT), Limerick, Ireland, 15–18 April 2019; pp. 668–673. [Google Scholar]
- Nägele, T.; Hooman, J. Scalability analysis of cloud-based distributed simulations of IoT systems using HLA. In Proceedings of the 2018 IEEE 24th International Conference on Parallel and Distributed Systems (ICPADS), Singapore, 11–13 December 2018; pp. 1075–1080. [Google Scholar]
- Gelenbe, E.; Nakıp, M.; Marek, D.; Czachorski, T. Diffusion analysis improves scalability of IoT networks to mitigate the massive access problem. In Proceedings of the 2021 29th International Symposium on Modeling, Analysis, and Simulation of Computer and Telecommunication Systems (MASCOTS), Virtual, 3–5 November 2021; pp. 1–8. [Google Scholar]
- Raut, A.; Kumar, D.; Chaurasiya, V.K.; Kumar, M. Distributed Decision Fusion for Large Scale IoT- Ecosystem. In Proceedings of the 2022 IEEE 15th International Symposium on Embedded Multicore/Many-Core Systems-on-Chip (MCSoC), Penang, Malaysia, 19–22 December 2022; pp. 112–119. [Google Scholar] [CrossRef]
- Akbar, A.; Kousiouris, G.; Pervaiz, H.; Sancho, J.; Ta-Shma, P.; Carrez, F.; Moessner, K. Real-Time Probabilistic Data Fusion for Large-Scale IoT Applications. IEEE Access
**2018**, 6, 10015–10027. [Google Scholar] [CrossRef] - Chen, Y.; Kar, S.; Moura, J.M. The Internet of Things: Secure Distributed Inference. IEEE Signal Process. Mag.
**2018**, 35, 64–75. [Google Scholar] [CrossRef] - Kurniawan, A.; Kyas, M. A trust model-based Bayesian decision theory in large scale Internet of Things. In Proceedings of the 2015 IEEE Tenth International Conference on Intelligent Sensors, Sensor Networks and Information Processing (ISSNIP), Singapore, 7–9 April 2015; pp. 1–5. [Google Scholar] [CrossRef]
- Krishnamachari, B.; Iyengar, S. Distributed Bayesian algorithms for fault-tolerant event region detection in wireless sensor networks. IEEE Trans. Comput.
**2004**, 53, 241–250. [Google Scholar] [CrossRef] - Janakiram, D.; Kumar, A.; Reddy V., A.M. Outlier Detection in Wireless Sensor Networks using Bayesian Belief Networks. In Proceedings of the 2006 1st International Conference on Communication Systems Software & Middleware, New Delhi, India, 8–12 January 2006; pp. 1–6. [Google Scholar] [CrossRef]
- Momani, M.; Challa, S.; Al-Hmouz, R. Bayesian Fusion Algorithm for Inferring Trust in Wireless Sensor Networks. J. Netw.
**2010**, 5, 815–822. [Google Scholar] [CrossRef] - Vlachou, E.; Karras, C.; Karras, A.; Tsolis, D.; Sioutas, S. EVCA Classifier: A MCMC-Based Classifier for Analyzing High-Dimensional Big Data. Information
**2023**, 14, 451. [Google Scholar] [CrossRef] - Salvatier, J.; Wiecki, T.V.; Fonnesbeck, C. Probabilistic programming in Python using PyMC3. Peerj Comput. Sci.
**2016**, 2, e55. [Google Scholar] [CrossRef] - Ayuntamiento de Madrid. Calidad del Aire. Datos Horarios desde 2001; Portal de Datos Abiertos del Ayuntamiento de Madrid: Madrid, Spain, 2018.
- Soluciones, D. Air Quality in Madrid (2001–2018); Kaggle: San Francisco, CA, USA, 2018. [Google Scholar]
- Bañuelos-Gimeno, J.; Sobrino, N.; Arce-Ruiz, R.M. Effects of Mobility Restrictions on Air Pollution in the Madrid Region during the COVID-19 Pandemic and Post-Pandemic Periods. Sustainability
**2023**, 15, 12702. [Google Scholar] [CrossRef] - Ayuntamiento de Madrid. Intérprete de Ficheros de Datos Horarios—Diarios y Tiempo Real; Dirección General de Sostenibilidad y Control Ambiental Subdirección General de Sostenibilidad: Madrid, Spain, 2018.

**Figure 1.** Bayesian vs. Frequentist Logistic Regression in PySpark: confusion matrices for each test set for 3, 6, and 9 years.

**Figure 2.** Bayesian vs. Frequentist Logistic Regression in PySpark: confusion matrices for each test set for 12, 15, and 18 years.

**Figure 8.** Bayesian Logistic Regression using (1) PySpark, (2) numpy and pandas: CPU usage percentage (%).

**Figure 10.** Bayesian Logistic Regression using (1) PySpark, (2) numpy and pandas: total classification accuracy.

**Figure 11.** Bayesian Logistic Regression using (1) PySpark, (2) numpy and pandas: classification specificity.

**Figure 13.** Bayesian Logistic Regression using (1) PySpark (column 1), (2) numpy and pandas (column 2): confusion matrices for each test set for 3, 6, and 9 years.

**Figure 14.** Bayesian Logistic Regression using (1) PySpark (column 1), (2) numpy and pandas (column 2): confusion matrices for each test set for 12, 15, and 18 years.

**Table 1.** Air quality classification bands per pollutant concentration.

Pollutant | Good | Fair | Moderate | Poor | Very Poor | Extremely Poor
---|---|---|---|---|---|---
PM${}_{2.5}$ | 0–10 | 10–20 | 20–25 | 25–50 | 50–75 | 75–800
PM${}_{10}$ | 0–20 | 20–40 | 40–50 | 50–100 | 100–150 | 150–1200
NO${}_{2}$ | 0–40 | 40–90 | 90–120 | 120–230 | 230–340 | 340–1000
O${}_{3}$ | 0–50 | 50–100 | 100–130 | 130–240 | 240–380 | 380–800
SO${}_{2}$ | 0–100 | 100–200 | 200–350 | 350–500 | 500–750 | 750–1250

**Table 2.** Bayesian Logistic Regression predictions using PySpark: classification metrics, inference time, memory and CPU usage for testing sets containing data over a period of 3, 6, 9, 12, 15, and 18 years, respectively.

Years | Size (MB/GB) | Time (ms) | Mem. (KB) | CPU (%) | Acc. | Spec. | CM TP, FN, FP, TN
---|---|---|---|---|---|---|---
3 | 215.0 MB | 72.5 | 4 | 8.6 | 0.878833 | 0.995 | 214524, 75434, 1432, 342993
6 | 430.0 MB | 71.87 | 16 | 0.5 | 0.8791466 | 0.995 | 428407, 150330, 2955, 686663
9 | 645 MB | 81.02 | 124 | 5.8 | 0.87882 | 0.995 | 642918, 226464, 4468, 1031852
12 | 860.2 MB | 60.44 | 400 | 0.99 | 0.87904 | 0.995 | 855640, 300986, 5774, 1373713
15 | 1.05 GB | 72.59 | 524 | 0.29 | 0.879097 | 0.995 | 1071448, 376486, 7233, 1718625
18 | 1.26 GB | 80.8 | 992.0 | 0.80 | 0.8791344 | 0.995 | 1285186, 451604, 8679, 2062755

**Table 3.** Classic (Frequentist) Logistic Regression predictions using PySpark: classification metrics, inference time, memory and CPU usage for testing sets.

Test Set (Years) | Test Set Size | Inference Time (ms) | Memory Usage | CPU | Acc. | Spec. | TP | FN | FP | TN
---|---|---|---|---|---|---|---|---|---|---
3 | 215.0 MB | 73.11 | 28.0 KB | 5.7% | 0.8922 | 0.868 | 239995 | 49467 | 19025 | 327071
6 | 430.0 MB | 166.62 | 32.0 KB | 2% | 0.8924 | 0.868 | 479963 | 98741 | 37702 | 651614
9 | 645 MB | 80.15 | 180 KB | 6.1% | 0.8922 | 0.868 | 719446 | 148657 | 56547 | 978703
12 | 860.2 MB | 64.86 | 132.0 KB | 5.3% | 0.8922 | 0.868 | 959656 | 197645 | 75929 | 1305684
15 | 1.05 GB | 96.1 | 184.0 KB | 5.5% | 0.8924 | 0.868 | 1199943 | 246919 | 94539 | 1631224
18 | 1.26 GB | 138.62 | 664.0 KB | 1% | 0.8923 | 0.868 | 1440301 | 296489 | 113412 | 1958022

**Table 4.** Bayesian Logistic Regression predictions using pandas DataFrames and numpy: classification metrics, inference time, memory and CPU usage for testing sets containing data over a period of 3, 6, 9, 12, 15, and 18 years, respectively.

Test Set (Years) | Size (MB) | Time (ms) | Memory (KB) | CPU (%) | Acc. | Spec. | TP | FN | FP | TN
---|---|---|---|---|---|---|---|---|---|---
3 | 215.0 | 55.694 | 15,300 | 0.6 | 0.8709 | 0.945 | 291511 | 53600 | 28218 | 260712
6 | 430.0 | 85.869 | 30,800 | 0.7 | 0.8712 | 0.945 | 582671 | 106754 | 56533 | 521564
9 | 645.0 | 143.675 | 43,600 | 4.1 | 0.8712 | 0.945 | 873587 | 160375 | 84546 | 782416
12 | 860.2 | 147.869 | 59,200 | 8.5 | 0.8713 | 0.945 | 1167395 | 214209 | 112668 | 1045064
15 | 1050 | 175.82 | 73,700 | 8.7 | 0.8712 | 0.945 | 1458769 | 267570 | 141097 | 1305656
18 | 1260 | 268.03 | 89,600 | 5.7 | 0.8712 | 0.945 | 1750169 | 321265 | 169295 | 1567495

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Vlachou, E.; Karras, A.; Karras, C.; Theodorakopoulos, L.; Halkiopoulos, C.; Sioutas, S.
Distributed Bayesian Inference for Large-Scale IoT Systems. *Big Data Cogn. Comput.* **2024**, *8*, 1.
https://doi.org/10.3390/bdcc8010001
