1. Introduction
Embedded systems are fundamental to modern information technology and the Internet of Things (IoT), with applications ranging from consumer electronics to industrial control systems. Securing these systems is particularly critical in IoT environments, given their rapid expansion and the projection of approximately 30 billion connected devices by 2030 [1]. Firmware security analysis therefore plays a central role in protecting embedded devices, as vulnerabilities in firmware can compromise device integrity, enable unauthorized access, and propagate attacks across connected networks.
EMBA is an open-source firmware security analysis tool that automates the reverse engineering and vulnerability detection of embedded systems, providing actionable insights for cybersecurity professionals. EMBA is developed by Siemens Energy cybersecurity engineers led by Michael Messner [2]. The tool integrates firmware extraction, emulation-based dynamic analysis, and static analysis, and produces structured reports summarizing identified security issues. EMBA automatically detects vulnerabilities such as insecure binaries, outdated software components, vulnerable scripts, and hardcoded credentials [3].
One of EMBA’s notable features is the generation of a Software Bill of Materials (SBOM) directly from binary firmware, which supports supply chain security assessments by correlating identified components with known vulnerabilities from exploit databases [4]. In addition, EMBA employs a comprehensive multi-stage analysis approach to identifying zero-day vulnerabilities by examining both compiled binaries and interpreted scripts written in languages such as PHP, Python, and Lua [5].
The selection of a deployment model for firmware analysis tools such as EMBA represents a practical and strategic infrastructure decision for security teams. Although prior research extensively evaluates the vulnerability detection capabilities and analysis methodologies of firmware security tools, significantly less attention has been given to how the execution environment influences runtime performance, cost behavior, and analytical consistency. In practice, organizations frequently migrate security workflows to cloud platforms without quantitative evidence regarding the operational trade-offs involved.
The absence of structured empirical comparisons between standalone and cloud-based deployments creates uncertainty in infrastructure planning. Without controlled measurements, decisions regarding scalability, cost efficiency, and long-term sustainability are often based on assumptions rather than reproducible data. This gap is particularly relevant for organizations conducting recurring firmware assessments under constrained budgets and compliance requirements.
Importantly, cloud deployment remains relevant even in cases where execution time does not outperform standalone systems, as scalability, remote accessibility, and workload elasticity may justify its use under specific operational conditions.
The main contributions of this paper are summarized as follows:
A systematic experimental comparison of EMBA deployed in standalone and cloud-based environments.
A quantitative evaluation of performance differences across multiple EMBA versions under identical analysis configurations.
An assessment of execution time, analytical consistency, and operational cost to evaluate deployment environment impacts.
To address this gap, this study provides a quantitative and reproducible comparison of EMBA deployed on a standalone personal computer and a Microsoft Azure cloud environment. Firmware images of varying sizes were analyzed under identical configurations to evaluate execution performance, analytical consistency, and operational cost. The results provide measurable evidence to support infrastructure-level decision-making in firmware security analysis.
The remainder of this paper is organized as follows. Section 2 presents background information on the EMBA architecture and analysis framework. Section 3 reviews related work. Section 4 describes the experimental setup and methodology. Section 5 presents the results and quantitative analysis. Section 6 discusses platform-specific implications and performance trade-offs. Section 7 concludes this paper and outlines future research directions.
2. Background
EMBA follows a structured, multi-stage analysis pipeline designed to support systematic firmware security evaluation. The framework begins with automated firmware unpacking and filesystem reconstruction, enabling the identification of embedded components, executable binaries, and configuration artifacts. This preparation stage ensures that subsequent analyses operate on a normalized and analyzable firmware structure.
After extraction, EMBA performs static analysis to inspect binaries, scripts, and libraries without execution. This phase identifies structural weaknesses such as insecure configurations, outdated software components, and exposed credentials. Identified elements are correlated with publicly available vulnerability databases to support risk assessment. In addition, EMBA leverages emulation-based dynamic analysis through the Quick Emulator (QEMU) to simulate runtime behavior in a controlled environment, allowing for the detection of security issues that may not be observable through static inspection alone. The integration of static and dynamic techniques increases analytical coverage and improves detection robustness.
EMBA can be deployed on either cloud-based platforms or standalone servers, each offering distinct operational trade-offs. Cloud-based deployments provide scalability, elastic resource allocation, and remote accessibility, which are advantageous for geographically distributed teams or occasional analysis tasks. However, they introduce operational complexity, including dependency management, setup overhead, data transfer delays, and potential security considerations. In contrast, standalone servers provide a controlled and predictable environment with minimal variance, full control over hardware and storage, and reduced data transfer or exposure risks. While they lack the flexibility and elasticity of cloud platforms, standalone systems can be more cost-effective for frequent or continuous firmware analysis. Detailed quantitative comparisons are presented in Section 6.
As of December 2024, EMBA version 1.5.0 included 69 analysis modules. These are categorized into four groups: Pre-Modules (P), Core Modules (S), Live Testing Modules (L), and Finishing Modules (F) [6,7] (see Appendix A). EMBA version 1.4.1 comprises 59 distinct modules, while version 1.4.2 includes 61 modules, both evaluated using a modified default scan configuration [7].
The experiments presented in this study were conducted using EMBA versions 1.4.1, 1.4.2, and 1.5.0, the latter being the latest release available as of December 2024. These experiments were conducted on different hardware setups when comparing versions, to assess software evolution, and on identical hardware across standalone and cloud platforms, to isolate performance differences due solely to the deployment environment. Under the specified experimental settings, version 1.5.0 executed 68 distinct modules, each designed to perform a specific analytical task. Updated versions of EMBA, including version 1.5.2, are publicly available through the official EMBA repository, enabling users to access ongoing improvements and additional functionality [8]. EMBA is accessible through both a command-line interface and a graphical interface, EMBArk, which presents analysis results in summarized and module-specific reports [9].
4. Materials and Methods
To evaluate the performance of the EMBA firmware security analysis tool across different execution environments, a structured and reproducible methodology was designed. This study compares EMBA’s behavior on locally hosted standalone servers and a Microsoft Azure virtual machine configured for comparative evaluation. The methodological design focuses on three key dimensions: firmware characteristics, platform configurations, and experimental repeatability.
The overall workflow of the EMBA analysis methodology is illustrated in Figure 1. The experimental methodology follows a structured multi-phase process that can be conceptually grouped into four main stages: (1) firmware selection and size categorization; (2) platform-specific environment preparation and EMBA configuration; (3) controlled execution including repeated runs; and (4) log extraction, the collection of metrics, and comparative analysis.
This structured workflow ensures experimental repeatability and minimizes measurement bias across deployment platforms. By isolating configuration, execution, and log extraction phases, the methodology enables consistent performance comparison between standalone and cloud environments. The inclusion of repeated runs reduces variability caused by transient system conditions, thereby improving the statistical reliability of execution time and cost-related measurements.
4.1. Firmware Selection and Size Categories
Firmware images were categorized into three representative size groups commonly observed in real-world deployments:
Small: <10 MB.
Medium: 10–30 MB.
Large: >30 MB.
These categories align with typical IoT and Industrial Internet of Things (IIoT) scenarios, where smaller images often correspond to consumer devices and larger images to more complex industrial or enterprise systems. Four firmware samples were selected, at least one from each size category, based on their ability to execute the majority of EMBA modules. This selection criterion ensures a more meaningful analysis of EMBA’s end-to-end runtime, since many firmware images execute only a subset of modules, leading to incomplete or skewed runtime comparisons.
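The size thresholds above can be expressed as a small helper function. This is a minimal sketch for illustration only; the function name and the handling of the exact 10 MB and 30 MB boundary values are our own assumptions, as the paper does not specify them.

```python
def categorize_firmware(size_mb: float) -> str:
    """Map a firmware image size in MB to the study's size categories."""
    if size_mb < 10:     # Small: <10 MB
        return "Small"
    if size_mb <= 30:    # Medium: 10-30 MB (boundary handling assumed)
        return "Medium"
    return "Large"       # Large: >30 MB

# The four samples selected in this study:
samples = {"WR940.bin": 3.87, "T8705.bin": 25.5, "R8000.chk": 30.2, "S3008.bin": 40.8}
categories = {name: categorize_firmware(mb) for name, mb in samples.items()}
```

Applied to the selected samples, this yields one Small, one Medium, and two Large images, matching the categorization given above.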
The selected firmware samples represent typical embedded device firmware with varying functional complexity.
WR940.bin (3.87 MB, Small): Firmware for a consumer-grade Wi-Fi router, primarily handling basic networking and security functions.
T8705.bin (25.5 MB, Medium): Firmware for a mid-range network-attached storage (NAS) device, responsible for filesystem management, user authentication, and network services.
R8000.chk (30.2 MB, Large): Firmware for a high-end router used in enterprise or complex networking setups, implementing advanced routing protocols and enhanced security features. The .chk extension indicates a firmware check or backup image format used in this device family.
S3008.bin (40.8 MB, Large): Firmware for a high-end industrial IoT gateway, implementing complex protocols, multiple device interfaces, and advanced security monitoring.
These samples cover a range of device types, from consumer to industrial IoT devices, and ensure that EMBA executes most of its modules, allowing for a meaningful assessment of runtime, module execution, and vulnerability detection.
4.2. Experimental Environments
A controlled lab setting was used for the standalone server experiments. Two machines with identical hardware specifications, except for CPU core count, were deployed. The hardware and software specifications of all experimental platforms are summarized in Table 1.
This design enables an assessment of how EMBA scales with additional CPU resources while holding all other factors constant. The Azure VM was intentionally configured to match the more powerful standalone server (PC2) as closely as possible. This alignment ensures that observed performance differences originate from platform characteristics such as virtualization overhead, storage performance, or cloud scheduling behavior rather than hardware discrepancies. The use of commodity-grade hardware configurations reflects setups accessible to typical practitioners, increasing the practical relevance of the findings.
4.3. EMBA Configuration and Test Execution
This subsection describes the configuration of the EMBA tool, the execution order of analyses, and the system setup used to ensure consistent and reproducible testing across all platforms.
4.3.1. EMBA Versions and Execution Order
To evaluate the evolution of EMBA’s performance and behavior across releases, three consecutive versions of the tool were selected for analysis: EMBA v1.4.1, v1.4.2, and v1.5.0. These versions represent incremental development stages of the tool and allow for an assessment of changes in module availability, execution behavior, and runtime characteristics.
The experiments were conducted sequentially, and no two firmware analyses were executed simultaneously on the same platform or machine. Each firmware scan was completed before initiating the next test to prevent resource contention and ensure consistent measurement conditions. All scan durations were recorded using the HH:MM:SS format to maintain precision and consistency in runtime reporting.
EMBA v1.4.1 was initially evaluated on a standalone system to establish a baseline. EMBA v1.4.2 was subsequently tested on a higher-performance standalone server to assess the impact of both software updates and increased computational resources. Finally, EMBA v1.5.0, the most recent release at the time of experimentation (October 2024), was evaluated on both a standalone server and a Microsoft Azure cloud virtual machine using the same execution methodology [8].
To ensure repeatability and reproducibility, all firmware scans were executed using identical EMBA configurations and the same set of analysis modules. For the standalone servers, each test was executed independently three times. All three runs produced highly consistent and nearly identical results, indicating low variance. The results from the final run, representative of all executions, were used in the analysis.
For the Azure virtual machine, only the T8705.bin firmware was executed twice due to the significantly higher cost of cloud computing. Both runs produced nearly identical outputs in terms of findings and execution time, confirming reproducibility on the cloud platform. The results from the second run were used in this study.
4.3.2. Scan Profile Configuration
By default, EMBA disables several long-running modules, including S10_binaries_basic_check, S15_radare_decompile_checks, S99_grepit, S110_yara_check, and F20_vul_aggregator [33], to optimize scan duration. In our experiments, all of these modules were explicitly enabled for a more comprehensive analysis, except for a single module that was deliberately excluded. Specifically, the Ghidra-based decompilation module (S16_ghidra_decompile_checks), which reconstructs high-level code from binaries to enable a detailed static analysis of program structures and potential vulnerabilities, was removed from the scan profile by manually editing the configuration file with the Nano text editor, due to its excessively long execution time; including it would have disproportionately increased scan durations and limited the practicality of repeated testing. Excluding S16 allowed this study to focus on EMBA’s core analysis capabilities while maintaining reasonable time and resource consumption.
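For illustration, the kind of profile edit described above can be sketched as follows. This is not the verbatim EMBA profile: EMBA scan profiles are bash-syntax files, but the variable name MODULE_BLACKLIST and the exact list contents are assumptions here and should be checked against the scan-profiles/default-scan.emba file shipped with the release in use.

```shell
# Excerpt of a modified EMBA scan profile (bash syntax, illustrative only).
# Long-running modules were re-enabled by removing them from the blacklist;
# only the Ghidra decompilation module remains excluded.
export MODULE_BLACKLIST=(
    "S16_ghidra_decompile_checks"   # excluded: prohibitive runtime
)
# Previously blacklisted by default and now enabled in our profile:
#   S10_binaries_basic_check, S15_radare_decompile_checks,
#   S99_grepit, S110_yara_check, F20_vul_aggregator
```

A scan would then be launched with a command along the lines of `sudo ./emba -f <firmware image> -l <log directory> -p <scan profile>`; exact flag names should be confirmed against the EMBA documentation for the installed version.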
Across versions, the modified scan profile enabled the execution of 58 modules in EMBA v1.4.1, 60 modules in v1.4.2, and 68 modules in v1.5.0, reflecting the progressive expansion of EMBA’s analysis capabilities. All changes to the scan profile were systematically documented, including the specific configuration lines modified, to ensure experimental reproducibility.
4.3.3. System and Platform Setup
All experiments were conducted on dedicated physical machines running Ubuntu 22.04 LTS, as recommended in EMBA’s documentation, using the x86-64 architecture with sufficient CPU cores and memory for stable execution [34]. Two standalone servers, differing only in CPU core count, enabled the assessment of scalability while controlling other hardware variables. For cloud-based testing, a Microsoft Azure virtual machine was provisioned with specifications closely matching the higher-performance standalone system, minimizing confounding factors such as virtualization overhead and storage performance. All EMBA dependencies were installed according to the official guidelines, and identical software environments and configurations were applied across both standalone and cloud platforms to ensure consistency.
4.3.4. System Limitations
Three distinct platforms were prepared for testing: two standalone servers (PC1 and PC2) and one Microsoft Azure cloud virtual machine (VM). Despite this controlled setup, several system-related limitations should be considered when interpreting the results. The hardware configurations of PC1, PC2, and the Azure VM (see Table 1) introduce certain system-level constraints. In particular, the Azure VM operates as a shared cloud resource, and minor variability in CPU and I/O performance may occur compared to dedicated standalone servers.
System resource utilization was monitored during all experiments. Peak memory consumption across both standalone servers and the Azure VM did not exceed approximately 10 GB of RAM, indicating that under the tested conditions and selected firmware samples, EMBA’s performance was not memory-bound. Instead, execution time appeared to be primarily influenced by CPU processing capacity and disk I/O behavior. Memory usage may vary depending on firmware size, enabled analysis modules, and scan configurations; therefore, these observations are specific to this experimental setup.
Disk space also influenced system stability and performance. Although EMBA requires a minimum of 30 GB of free disk space, at least 100 GB is recommended for optimal operation. In this study, 128 GB was provisioned on all platforms to accommodate extracted firmware files, intermediate artifacts, and analysis outputs. Preliminary attempts to run EMBA with less than the minimum recommended disk space caused operational errors and degraded performance [34].
Finally, the experimental evaluation focused on a selected subset of firmware images that successfully executed the majority of EMBA modules. While this approach improved comparability and reproducibility, it may not fully capture EMBA’s behavior on firmware that triggers fewer modules or requires alternative analysis paths.
Taken together, these limitations indicate that the reported results reflect performance characteristics within a controlled but constrained experimental environment. While our primary objective was to compare EMBA’s behavior across standalone and cloud platforms and to evaluate version-specific effects, we acknowledge that the results may differ in larger-scale or heterogeneous hardware environments. Future studies using higher-performance hardware, alternative architectures, and a broader range of firmware samples may provide additional insights into EMBA’s scalability and resource behavior.
5. Results
Across all platforms and EMBA versions, a total of 39 test executions were conducted. PC1 (EMBA v1.4.1) ran three firmware samples three times each, PC2 (EMBA v1.4.2) ran five firmware samples three times each, PC2 (EMBA v1.5.0) ran three firmware samples three times each, and the Azure VM (EMBA v1.5.0) ran three firmware samples two times each. This comprehensive multi-run design ensures the reproducibility and statistical reliability of the results.
To ensure the reliability and reproducibility of the experimental results, multiple test repetitions were conducted for the selected firmware sample. For EMBA version 1.5.0 analyzing the T8705.bin firmware, the standalone server (PC2) executed the scan three times, while the Azure virtual machine (VM) executed the same scan twice due to cost constraints.
For all repetitions, the number of detected findings remained constant at 2198, indicating the functional consistency of the analysis. Execution times exhibited minimal variation within each platform. On the standalone PC environment (PC2), the three repeated runs yielded execution times of 14:20:35, 14:21:01, and 14:20:13, corresponding to a mean execution time of 860.6 min with a standard deviation of approximately 0.40 min (~24 s). This represents a relative variation of less than 0.05%, demonstrating a high level of repeatability (Figure 2).
Inference: The extremely low standard deviation (below 0.05%) demonstrates that EMBA v1.5.0 produces highly stable runtime behavior under identical hardware conditions. This level of repeatability confirms that single-run measurements on a dedicated system can be considered reliable for performance benchmarking. For practitioners, this means that performance comparisons across firmware samples or system configurations can be conducted with high confidence that observed differences are not due to measurement noise.
Similarly, the two Azure VM runs resulted in execution times of 25:45:17 and 25:43:20 (HH:MM:SS), a variation of approximately 2 min between runs. Although a third execution was not performed due to higher operational costs, the close agreement between the two runs provides preliminary evidence of reproducibility within the cloud environment.
This multi-run experimental design enables the calculation of basic statistical measures, including the mean and standard deviation, directly addressing concerns regarding statistical validation and confirming that observed performance differences are stable and consistent rather than artifacts of isolated measurements.
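The repeatability figures above can be reproduced directly from the reported timings; a minimal sketch in Python, using the sample standard deviation:

```python
from statistics import mean, stdev

def hms_to_minutes(t: str) -> float:
    """Convert an HH:MM:SS duration string to minutes."""
    h, m, s = (int(x) for x in t.split(":"))
    return h * 60 + m + s / 60

# The three PC2 runs of EMBA v1.5.0 on T8705.bin reported above.
runs_min = [hms_to_minutes(t) for t in ("14:20:35", "14:21:01", "14:20:13")]

mu = mean(runs_min)      # ~860.6 min
sd = stdev(runs_min)     # ~0.40 min (about 24 s)
rel = 100 * sd / mu      # relative variation below 0.05%
```

The computed values match the reported mean of 860.6 min and standard deviation of roughly 0.40 min.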
5.1. Comparison of EMBA Versions 1.4.1 and 1.4.2
We conducted a detailed performance assessment of EMBA versions 1.4.1 and 1.4.2, employing identical modified default scan profiles across PC1 and PC2. The primary objective was to assess the number of executed modules, runtime efficiency, and findings across firmware samples of varying sizes.
As shown in Table 2, the results indicate differences between the two versions and systems. With the modified settings, EMBA version 1.4.1 executed 58 modules on PC1, while version 1.4.2 ran 60 modules on PC2 under identical scan profiles. This difference reflects software improvements in version 1.4.2 that enable the processing of additional modules. Observations during the tests showed that while modules in version 1.4.1 were executed sequentially, version 1.4.2 introduced the ability to execute certain modules concurrently. Running these modules simultaneously helped reduce scan times, especially for larger firmware samples, by taking better advantage of parallel processing.
Inference: The results indicate that upgrading from EMBA v1.4.1 to v1.4.2 provides substantial runtime improvements, particularly for medium and large firmware samples. The introduction of concurrent module execution significantly reduces total scan time when multi-core hardware is available. For security analysts and research laboratories, this suggests that hardware upgrades combined with newer EMBA versions can meaningfully improve throughput. Additionally, firmware size alone should not be used to estimate analysis time, as structural complexity plays a dominant role.
The relationship between firmware size and scan time also provided additional insight. The data show that scan times do not scale linearly with firmware size. For example, the largest firmware sample, S3008.bin (40.8 MB), completed its scan on PC1 in 13 h, 20 min, and 41 s, which is significantly shorter than the 3 days, 13 h, 1 min, and 36 s required for the much smaller T8516.bin (7.04 MB). This finding shows that scan duration is driven more by the configuration and structural intricacy of the firmware than by its sheer size. The number of files to scan, the compression employed, and the number of embedded components are likely to influence scan duration substantially.
The enhanced runtime efficiency on PC2 reflects both the hardware improvements and the concurrency features introduced in version 1.4.2. For example, the T8516.bin sample’s (7.04 MB) scan time decreased from 3 days, 13 h, 1 min, and 36 s on PC1 (v1.4.1) to 2 days, 1 h, 42 min, and 22 s on PC2 (v1.4.2), while that of the medium-sized T8705.bin (25.5 MB) decreased from 2 days, 20 h, 22 min, and 12 s to 1 day, 11 h, 11 min, and 31 s. The largest sample, S3008.bin (40.8 MB), also showed a reduction in scan time from 13 h, 20 min, and 41 s to 7 h, 12 min, and 45 s. These results indicate that the combination of software improvements in v1.4.2 and the 8-core architecture of PC2, particularly the ability to execute modules concurrently, contributed to decreased runtimes. They also suggest that firmware complexity, rather than size alone, plays a significant role in determining scan duration, as smaller firmware with complex structures may require disproportionately longer analysis times. Future work could further examine the impact of firmware architecture and module-specific processing on runtime differences.
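The improvements above can be summarized as percentage runtime reductions. Note that, as stated in the text, these figures conflate the v1.4.2 software update with the move from the 4-core PC1 to the 8-core PC2; the sketch below simply quantifies the combined effect from the reported durations:

```python
def to_seconds(days=0, hours=0, minutes=0, seconds=0):
    """Flatten a day/hour/minute/second duration into seconds."""
    return ((days * 24 + hours) * 60 + minutes) * 60 + seconds

# (PC1 with v1.4.1, PC2 with v1.4.2) total scan durations per firmware.
durations = {
    "T8516.bin": (to_seconds(3, 13, 1, 36), to_seconds(2, 1, 42, 22)),
    "T8705.bin": (to_seconds(2, 20, 22, 12), to_seconds(1, 11, 11, 31)),
    "S3008.bin": (to_seconds(0, 13, 20, 41), to_seconds(0, 7, 12, 45)),
}
reduction_pct = {fw: round(100 * (1 - new / old), 1)
                 for fw, (old, new) in durations.items()}
# all three samples completed roughly 41-49% faster on PC2/v1.4.2
```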
5.2. Comparison of EMBA Versions 1.4.2 and 1.5.0
The comparison of EMBA versions 1.4.2 and 1.5.0 shows important differences in runtime performance and module execution, tested on PC2 with identical hardware configurations. Both tests ran on a system with an 8-core processor and 32 GB of RAM. The evaluation included three firmware samples of varying sizes: WR940.bin (3.87 MB), T8705.bin (25.5 MB), and R8000.chk (30.2 MB). A key distinction between the two versions was the number of executed modules: EMBA 1.4.2 processed 60 modules, whereas version 1.5.0 executed 68 modules due to the addition of new checks and functionalities with the modified default scan settings.
As shown in Table 3, for WR940.bin, EMBA 1.4.2 completed the scan in 17 min and 47 s, while version 1.5.0 took significantly longer at 3 h, 30 min, and 56 s. The extended runtime for the smaller firmware sample in version 1.5.0 suggests that the added modules or enhancements may have introduced more comprehensive checks, increasing the overall processing time. In contrast, for the larger firmware sample T8705.bin, version 1.5.0 exhibited a runtime improvement, reducing the scan time from 1 day, 11 h, 11 min, and 31 s in version 1.4.2 to 14 h, 20 min, and 35 s. Similarly, R8000.chk showed improved efficiency in version 1.5.0, with a runtime of 18 h, 9 min, and 40 s, down from 20 h, 32 min, and 4 s in version 1.4.2.
Inference: The increase in module count from 60 to 68 in EMBA v1.5.0 reflects an expansion of analytical coverage, which enhances detection depth for complex firmware. However, the extended runtime observed for smaller firmware samples suggests a trade-off between analytical comprehensiveness and execution efficiency. For operational environments where rapid triage is required, selective module configuration may be beneficial. Conversely, for in-depth security audits of large firmware images, version 1.5.0 provides measurable performance advantages.
The runtime improvements for larger firmware samples in version 1.5.0 can be attributed to enhanced parallel processing capabilities and optimizations that enabled the efficient execution of additional modules. These improvements made better use of the multi-core architecture, allowing for faster analysis despite the increased number of modules. However, the extended runtime for WR940.bin suggests that the added modules and checks in version 1.5.0 are the most likely factors behind the increased scan duration.
The increase from 60 to 68 modules in version 1.5.0 demonstrates the ongoing expansion of EMBA’s analysis capabilities, reflecting the introduction of new functionalities to enhance the scope and depth of the firmware analysis. While these enhancements improve the tool’s effectiveness for larger and more complex firmware, they may also introduce trade-offs in performance for smaller files. This analysis underscores the delicate balance between feature expansion and runtime efficiency in firmware analysis tools. While EMBA 1.5.0 showed clear improvements for larger firmware samples, the extended runtime for WR940.bin highlights the need for the further optimization of module execution strategies.
5.3. EMBA Version 1.5.0 and Azure VM Comparison
The performance of EMBA version 1.5.0 was evaluated in two different environments: a physical machine, PC2, and a virtualized instance on the Microsoft Azure cloud. Both environments were configured with identical hardware specifications, featuring 8-core processors and 32 GB of RAM with Ubuntu 22.04, ensuring consistency in the experimental setup. EMBA’s default scan configuration was modified, and 68 modules were executed on three firmware samples: WR940.bin, T8705.bin, and R8000.chk. The results showed important differences in performance between the two environments, particularly in execution times and the number of findings.
As shown in Table 4, the physical machine PC2 generally outperformed the Azure cloud virtual machine in terms of scan duration, with shorter execution times for WR940.bin and T8705.bin. On PC2, WR940.bin was completed in 3 h, 30 min, and 56 s; T8705.bin required 14 h, 20 min, and 35 s; and R8000.chk took 18 h, 9 min, and 40 s. In contrast, the Azure cloud VM demonstrated longer processing times for WR940.bin and T8705.bin, with WR940.bin being completed in 10 h, 48 min, and 48 s and T8705.bin requiring 1 day, 1 h, 45 min, and 17 s. However, for R8000.chk, the Azure cloud VM performed faster, completing the analysis in 16 h, 53 min, and 55 s compared to PC2’s 18 h, 9 min, and 40 s. This anomaly suggests that specific workload characteristics or resource allocation strategies in the Azure cloud environment may occasionally benefit certain types of analysis. Due to budget limitations, the number of cloud experiments was limited; nevertheless, the tests conducted provide representative, if not comprehensive, examples of cloud performance.
Inference: The comparison between standalone hardware and the Azure VM demonstrates that the execution environment significantly affects both runtime and, in certain cases, the number of findings. While cloud infrastructure offers scalability and flexibility, virtualization overhead and I/O constraints may introduce performance penalties for resource-intensive modules. Organizations planning large-scale firmware analysis pipelines should carefully evaluate cost–performance trade-offs. Dedicated on-premise hardware appears preferable for time-critical or high-volume analysis tasks, whereas cloud environments may be advantageous for elastic or distributed workloads.
While both environments produced largely comparable results in terms of findings, moderate variations were observed. For WR940.bin and T8705.bin, the number of findings was nearly identical, with only a slight difference for WR940.bin (2537 on PC2 vs. 2536 on the Azure cloud VM). However, for R8000.chk, a more noticeable variation was observed, with PC2 identifying 2901 findings compared to 3012 on the Azure cloud VM. This difference should be investigated further in future work to better understand the impact of execution environment on vulnerability detection results.
These results show the importance of the execution environment in determining the efficiency and reliability of firmware analysis. While physical hardware generally provides superior performance due to dedicated resources and lower latency, certain tasks, as evidenced by the R8000 results, may benefit from the dynamic resource allocation in cloud environments.
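The environment gap described in this subsection can be condensed into a runtime ratio (Azure VM time divided by PC2 time, computed from the durations reported above; values above 1 mean the cloud VM was slower):

```python
def hms_to_seconds(t: str) -> int:
    """Convert an HH:MM:SS string (hours may exceed 24) to seconds."""
    h, m, s = (int(x) for x in t.split(":"))
    return (h * 60 + m) * 60 + s

# (PC2, Azure VM) total scan times for EMBA v1.5.0.
times = {
    "WR940.bin": ("3:30:56", "10:48:48"),
    "T8705.bin": ("14:20:35", "25:45:17"),
    "R8000.chk": ("18:09:40", "16:53:55"),
}
slowdown = {fw: round(hms_to_seconds(vm) / hms_to_seconds(pc2), 2)
            for fw, (pc2, vm) in times.items()}
# WR940 ran about 3.1x slower in the cloud and T8705 about 1.8x slower,
# while R8000 ran about 7% faster on the Azure VM
```

The ratio shrinks as firmware size grows, and even inverts for R8000.chk, which underlines that the cloud penalty is workload-dependent rather than uniform.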
5.4. EMBA Version 1.4.1 with Different Firmware Samples
EMBA version 1.4.1 tests were conducted on PC1, which is equipped with a 4-core processor and 32 GB of RAM, using the predefined modifications to EMBA’s default scanning configuration. These tests analyzed three firmware samples, T8516, T8705, and S3008, focusing on runtime and performance across various modules. The total scan durations varied significantly, with T8516 taking approximately 85 h, T8705 requiring 68 h, and S3008 completing in about 13 h. These differences reflect the varying complexity and size of the firmware samples.
Key modules such as P60_deep_extractor and P61_binwalk_extractor showed notable runtime disparities. For instance, P60_deep_extractor took 55 min for T8516, 99 min for T8705, and only 1 min 40 s for S3008. Similarly, S09_firmware_base_version_check had an extensive runtime for all samples, particularly T8516 and T8705, consuming 34 and 46 h, respectively, while S3008 required just over 5 h. Analytical modules such as S99_grepit and F20_vul_aggregator also accounted for a significant portion of the runtime, reflecting the computational intensity of detailed vulnerability assessments. These results indicate EMBA’s capacity to adapt to different firmware architectures while highlighting the variable demands imposed by firmware complexity.
5.5. Module-Level Performance Analysis of EMBA v1.5.0 on PC2 and Azure VM
The EMBA scan results obtained from the PC2 standalone server and Azure cloud VMs, using three firmware samples (WR940, T8705, and R8000), reveal notable performance discrepancies (see Appendix B: Detailed EMBA Module Execution Times). Both environments were configured with identical hardware specifications: 8 CPU cores, 32 GB of RAM, and at least 128 GB of free disk space. Overall, PC2 generally completed modules faster than the Azure cloud VM, with some variation across modules. For example, the overall scan time for WR940 was 3:30:56 (HH:MM:SS) on PC2, less than half the 10:48:48 required by the Azure cloud VM. Similarly, for T8705, PC2 finished in 14:20:35, while the Azure cloud VM took 25:45:17. However, for the R8000 firmware, the Azure cloud VM was slightly faster, with a total time of 16:53:55 versus 18:09:40 on PC2. This shows that the performance disparity between the two environments is not consistent across firmware samples.
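These per-firmware disparities can be expressed as speedup ratios. The following sketch converts the reported HH:MM:SS durations to seconds and computes the Azure-to-PC2 ratio for each sample; the helper function is ours, not part of EMBA, and the durations are taken directly from the results above.

```python
# Convert the reported HH:MM:SS scan durations to seconds and compute
# Azure/PC2 speedup ratios. A ratio > 1 means PC2 was faster.

def to_seconds(hms: str) -> int:
    """Parse an H:MM:SS or HH:MM:SS duration into seconds."""
    h, m, s = (int(part) for part in hms.split(":"))
    return h * 3600 + m * 60 + s

# Overall scan times reported for (PC2, Azure cloud VM).
scans = {
    "WR940": ("3:30:56", "10:48:48"),
    "T8705": ("14:20:35", "25:45:17"),
    "R8000": ("18:09:40", "16:53:55"),
}

for fw, (pc2, azure) in scans.items():
    ratio = to_seconds(azure) / to_seconds(pc2)
    print(f"{fw}: Azure/PC2 = {ratio:.2f}x")
```

Running this shows WR940 roughly three times slower on Azure, T8705 under twice as slow, and R8000 slightly faster on Azure (ratio below 1), mirroring the inconsistency noted above.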
When examining specific modules, several anomalies were observed. For example, in the WR940 and T8705 firmware samples, S09 (firmware base version check) took significantly more time on the Azure cloud VM than on PC2, with durations like 21:15:18 on the Azure cloud VM for T8705 compared to just 10:42:58 on PC2. However, the R8000 firmware showed the reverse pattern, where the Azure cloud VM’s 2:02:43 outperformed PC2’s 3:37:27. This indicates that some modules are influenced by the specific firmware sample, which might have unique attributes that either exacerbate or mitigate the environmental performance differences. The S15 (Radare decompile checks) module showed substantial time consumption in both environments. For the R8000 firmware, the Azure cloud VM required 15:11:15, while PC2 completed the task in 16:29:21, further indicating that decompilation tasks are particularly resource-intensive and might behave differently depending on the environment.
Certain modules, like S99 (grepit), exhibited a similar trend, where the Azure cloud VM’s execution time was almost double that of PC2 for the T8705 firmware sample. However, for R8000, the Azure cloud VM’s time of 3:49:06 was actually shorter than PC2’s 5:02:12, suggesting that I/O and disk read/write behavior in the Azure cloud environment might be contributing to these performance disparities. Modules such as S02 (UEFI_FwHunt) and S36 (lighttpd) showed relatively consistent performance across both platforms, which implies that certain types of tasks, perhaps those less reliant on raw I/O or system-level operations, are less sensitive to differences in environment. The variability in execution times for different modules and firmware samples hints at a complex interaction between the hardware environment and the nature of the tasks being executed. For example, S17 (CWE checker) and S13 (weak function check) revealed marked performance differences, especially with the R8000 firmware, where the Azure cloud VM took significantly longer than PC2.
On the other hand, certain modules like S115 (user-mode emulator) performed similarly across both platforms. These observations show that specific tasks like deep system analysis or computationally intensive processing may be more affected by the virtualization overhead in the Azure cloud, whereas tasks that are less resource-intensive or involve more straightforward processing may exhibit a smaller performance gap between the two environments.
Several factors could cause these observed differences. Virtualization overhead in the Azure cloud VM environment is a likely cause for the slower performance in certain modules, as the abstraction layer could introduce latency, particularly in resource-intensive tasks such as binary analysis and decompilation. Additionally, resource contention could arise in a cloud-based setup, where virtual machines may share physical resources with other instances, affecting performance stability. Network-related delays might also play a role, especially in modules that require external communication or updates during execution. Furthermore, differences in disk I/O handling between the physical PC2 server and the Azure cloud VM may explain some of the variations, particularly for tasks that involve large-scale data processing or frequent disk access. Ultimately, while both environments offer similar raw hardware specifications, the cloud-based Azure VM environment seems to face additional challenges, likely due to the complexities of virtualization and resource management in a shared infrastructure.
Modules like S99_grepit and S118_busybox_verifier exceeded 15 h due to their computational complexity, involving binary decompilation, extensive text searches, and compressed filesystem analysis. Standalone PCs handle these tasks faster due to dedicated resources and efficient local I/O. In contrast, Azure cloud VMs often face I/O bottlenecks, resource contention, and network latency, leading to longer execution times. Optimizing VM performance with faster storage, dedicated resources, and task parallelization, or using a hybrid setup combining PCs and VMs, can mitigate these delays and enhance efficiency.
The test results also revealed a difference between overall scan time and total module time, providing supplementary insight into the scanning process. Overall scan time (HH:MM:SS) measures the actual duration of a single EMBA scan, whereas total module time reflects the cumulative time taken by all modules when considered individually. The observed discrepancy between these metrics, particularly in EMBA version 1.4.2, can be attributed to the concurrent execution of certain modules. Unlike version 1.4.1, which processes modules sequentially, version 1.4.2 introduced a concurrency feature that allows multiple modules to run simultaneously, reducing the effective scan duration. Consequently, while the total module time (the sum of individual module execution times) remains high, the overall scan time (the actual duration of a single scan) is shorter. This highlights the impact of parallel execution on runtime efficiency in EMBA version 1.4.2 compared to the sequential processing in version 1.4.1.
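The relationship between the two metrics can be illustrated with a minimal scheduling sketch. The module names are real EMBA modules, but the durations and grouping into concurrent phases are hypothetical, chosen only to show why the wall-clock scan time falls below the per-module sum under concurrency.

```python
# Sketch of why concurrent module execution (EMBA >= 1.4.2) yields an overall
# scan time shorter than the sum of individual module times. Durations (min)
# and group assignments are HYPOTHETICAL, for illustration only.

# Each tuple: (module, duration_min, group). Modules in the same group run
# in parallel; groups execute one after another.
modules = [
    ("P60_deep_extractor", 55, 0),
    ("S09_firmware_base_version_check", 300, 1),
    ("S99_grepit", 240, 1),
    ("F20_vul_aggregator", 90, 2),
]

# Total module time: the simple sum of per-module durations.
total_module_time = sum(d for _, d, _ in modules)

# Overall scan time under concurrency: each group's wall-clock duration is
# bounded by its longest module.
groups: dict[int, int] = {}
for _, duration, group in modules:
    groups[group] = max(groups.get(group, 0), duration)
overall_scan_time = sum(groups.values())

print(total_module_time)   # 685 min if run sequentially
print(overall_scan_time)   # 445 min with concurrency
```

Under these assumed numbers, overlapping S09 and S99 saves roughly four hours of wall-clock time, which is the same effect that makes version 1.4.2 scans finish well before the sum of their module times.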
6. Discussion
The results presented in the previous section provide a quantitative comparison of EMBA’s execution behavior across standalone and cloud-based platforms under controlled and repeatable conditions. Building on these findings, this section interprets the observed performance characteristics in a broader operational context, focusing on platform-specific trade-offs, deployment implications, and practical considerations for firmware security analysis. Rather than reiterating the numerical results, the discussion emphasizes how differences in the execution environment influence the scalability, cost efficiency, reproducibility, and long-term usability of EMBA in real-world settings.
6.1. Platform-Level Implications of EMBA Deployment
From a cost perspective, cloud platforms follow a pay-as-you-go model, eliminating the need for upfront hardware investment and allowing resources to be provisioned only during active use. This approach is advantageous for occasional or short-term analysis tasks but must be balanced against operational costs during sustained usage. In this study, firmware analyses were performed on a Microsoft Azure Standard D8s_v4 virtual machine (8 vCPU, 32 GB RAM) with an attached 1 TB Premium SSD, running in the Central US (San Antonio, Texas) region. Three distinct firmware images were each executed twice using EMBA, including additional time for tool setup, result inspection, and idle periods, for a total VM uptime of approximately 265 h. As of January 2026, using the on-demand hourly rate of $0.384/hour, the compute cost was calculated as 265 × 0.384 = $101.76, while the storage cost for the 1 TB Premium SSD was $122.88, yielding a total infrastructure cost of approximately $225, excluding taxes and network-related charges [35,36].
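The cost estimate above can be reproduced with a short calculation; all figures (hourly rate, uptime, storage cost) are taken from the text, and the rounding to approximately $225 follows the same convention.

```python
# Reproduction of the Azure infrastructure cost estimate from this section.

VM_HOURLY_RATE = 0.384      # USD/hour, Standard D8s_v4 on-demand (Jan 2026)
UPTIME_HOURS = 265          # total VM uptime across runs, setup, and idle time
STORAGE_COST = 122.88       # USD, 1 TB Premium SSD for the billing period

compute_cost = UPTIME_HOURS * VM_HOURLY_RATE
total_cost = compute_cost + STORAGE_COST

print(f"Compute: ${compute_cost:.2f}")   # Compute: $101.76
print(f"Total:   ${total_cost:.2f}")     # Total:   $224.64 (~$225)
```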
Furthermore, when EMBA is used on a regular basis, the one-time investment cost of a standalone server can become more economical than recurring cloud usage fees.
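The break-even point between a one-time server purchase and recurring cloud fees can be sketched as follows. The server price and amortization period below are assumptions for illustration only (the study does not report standalone hardware costs); the hourly rate matches the Azure figure used above, and storage, power, and maintenance costs are deliberately omitted.

```python
# Hedged break-even sketch: at what monthly usage does an amortized one-time
# server purchase beat recurring cloud compute fees? SERVER_COST and
# SERVER_LIFETIME_MONTHS are ASSUMPTIONS, not figures from this study.

SERVER_COST = 2500.0         # USD, hypothetical one-time hardware purchase
SERVER_LIFETIME_MONTHS = 36  # hypothetical amortization period
CLOUD_RATE = 0.384           # USD/hour (Azure D8s_v4 on-demand)

# Amortized monthly cost of owning the server.
server_monthly = SERVER_COST / SERVER_LIFETIME_MONTHS

# Usage level at which monthly cloud compute spend equals the amortized
# server cost; beyond this, the standalone server is cheaper.
break_even_hours = server_monthly / CLOUD_RATE
print(f"Break-even: {break_even_hours:.0f} h/month")
```

Under these assumed numbers the crossover sits near 180 VM-hours per month, i.e., a few multi-day scans; teams running EMBA more often than that would likely amortize dedicated hardware faster than they accrue cloud fees.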
Based on our findings, deployment recommendations can be tailored to organizational scale. For small research labs or individual practitioners, cloud-based deployments using Azure virtual machines provide flexibility, low upfront costs, and easy access for remote or distributed teams. This approach is particularly suitable for occasional or short-term firmware analyses where elasticity and remote accessibility are key priorities. For medium-sized groups, a single high-performance standalone server may offer more consistent execution times and deterministic behavior while still maintaining moderate hardware investment and operational overhead.
For large-scale or enterprise environments with frequent and high-volume firmware analysis needs, a hybrid approach may be optimal. High-performance standalone servers ensure stable and reproducible execution, while supplementary cloud-based instances can provide elasticity for peak workloads or geographically distributed collaboration. This strategy balances execution speed, reproducibility, cost efficiency, and security and allows organizations to allocate resources dynamically based on workflow demands.
6.2. Future Research
Future research should build upon the structured and reproducible evaluation framework established in this study to further strengthen EMBA’s performance evaluation, scalability, and deployment flexibility across diverse environments. One natural extension is the inclusion of a broader firmware corpus encompassing a wider range of vendors, processor architectures, and functional domains. Future studies may incorporate additional industrial IoT platforms, automotive systems, and smart infrastructure devices to examine EMBA’s behavior across increasingly diverse embedded ecosystems.
Another important direction involves evaluating EMBA across a broader spectrum of deployment environments, including heterogeneous cloud providers, containerized infrastructures, and high-performance computing clusters. Such environments offer opportunities to analyze performance characteristics related to virtualization overhead, storage backends, and distributed execution models, enabling deeper insights into deployment trade-offs. In addition, future research may explore automated orchestration and scheduling strategies for large-scale firmware analysis, combining standalone servers and cloud-based resources within hybrid deployment models. This line of work is particularly relevant for organizations conducting continuous or high-volume firmware security assessments.
A further research direction involves cross-platform cloud comparisons. While this study focused on Microsoft Azure, future work should extend the evaluation to additional cloud providers, such as Amazon Web Services (AWS) or Google Cloud Platform (GCP). Comparative analyses across multiple cloud environments would provide deeper insights into performance variability, cost efficiency, and deployment trade-offs, enabling informed multi-cloud or hybrid deployment strategies for firmware security analysis.
The role of high-performance hardware also warrants further investigation. Evaluating EMBA on more advanced standalone servers and compute-optimized cloud instances could help quantify performance gains and identify diminishing returns in relation to cost. Such studies would be particularly valuable for organizations seeking to balance execution speed, reproducibility, and operational expenses in large-scale firmware analysis workflows.
Expanding the diversity and scale of firmware samples represents another key avenue for future work. Including a broader range of firmware sizes, architectures, and vendor ecosystems would improve the generalizability of the findings and help uncover edge cases that may stress EMBA’s analysis pipeline. Additionally, incorporating complementary firmware analysis tools alongside EMBA could support comparative evaluations, highlighting both strengths and opportunities for targeted tool enhancements.
In addition, an important direction for future research is the evaluation of EMBArk, the web-based enterprise interface for EMBA. EMBArk introduces a centralized and system-independent approach to firmware analysis, which is especially relevant for enterprise and government environments. Future studies should assess EMBArk’s performance, scalability, and usability in large-scale deployments, focusing on centralized management, result aggregation, and auditability. Further research may also explore the integration of EMBArk with vulnerability databases, threat intelligence feeds, and Security Information and Event Management (SIEM) systems. Such integrations could enable real-time risk assessment and transform EMBArk from a visualization layer into a comprehensive firmware security management platform, supporting cloud-native and collaborative security workflows.
Overall, future work should focus on coordinated improvements across both the backend analysis engine (EMBA) and the frontend management interface (EMBArk), with the goal of achieving higher analytical rigor, improved performance, and broader applicability across varied operational and deployment contexts.
6.3. Reproducibility and Independent Validation Guidelines
To facilitate the independent validation and extension of the findings presented in this study, readers are encouraged to replicate the experimental framework under their own operational conditions. The following methodological considerations may support consistent and comparable measurements:
First, hardware and virtual machine configurations should be explicitly documented, including the CPU model, number of cores, memory allocation, storage type, and operating system version. In cloud environments, the instance type, region, pricing model (on-demand or reserved), and storage tier should also be reported.
Second, identical EMBA versions and scan configurations should be used when comparing deployment environments. Any modifications to default module selections or analysis parameters should be clearly stated to ensure methodological transparency.
Third, each firmware image should be analyzed multiple times under identical conditions to measure execution variance. Reporting execution time across repeated runs, together with variance indicators and module-level completion consistency, will strengthen statistical reliability.
Fourth, total operational cost should include not only compute time but also storage provisioning, idle runtime, data transfer, and setup overhead. The transparent documentation of pricing assumptions and billing duration is essential for meaningful cost comparison.
Fifth, firmware diversity should be considered. To extend generalizability, researchers may include firmware from multiple vendors, architectures, and size categories, allowing for an evaluation of performance scalability across heterogeneous workloads.
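The repeated-run variance check in the third guideline can be sketched as a small script: parse per-run scan durations and report their mean and standard deviation. The helper function and the example durations are hypothetical, for illustration only.

```python
# Sketch of the repeated-run variance check: given per-run EMBA scan
# durations (HH:MM:SS), report the mean and standard deviation in seconds.
# The example durations are HYPOTHETICAL.
from statistics import mean, stdev

def to_seconds(hms: str) -> int:
    """Parse an H:MM:SS or HH:MM:SS duration into seconds."""
    h, m, s = (int(p) for p in hms.split(":"))
    return h * 3600 + m * 60 + s

runs = ["3:30:56", "3:31:12", "3:29:48"]   # hypothetical repeated scans
secs = [to_seconds(r) for r in runs]

print(f"mean = {mean(secs):.0f} s, stdev = {stdev(secs):.0f} s")
```

Reporting these two statistics per firmware image, alongside module-level completion consistency, gives later studies a concrete basis for judging whether observed platform differences exceed run-to-run noise.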
By adhering to the structured documentation of system configuration, execution parameters, and cost assumptions, future studies can systematically validate, compare, and extend the deployment trade-offs identified in this work. Such reproducible measurement practices help ensure that infrastructure decisions are grounded in empirical evidence rather than deployment assumptions.
7. Conclusions
This study examined the behavior of EMBA, an open-source firmware analysis framework, across standalone servers and cloud-based environments. The analysis focused on execution time, repeatability, cost implications, and operational constraints across different deployment environments. Rather than introducing EMBA as a new tool, this work examined its practical behavior and performance characteristics under controlled and reproducible testing conditions, addressing a gap in empirical evaluations of firmware analysis frameworks across heterogeneous platforms.
The experimental results demonstrate that EMBA produces stable and repeatable outcomes when identical firmware samples and configurations are used. Repeated executions within the same deployment environment yielded nearly identical execution times and identical numbers of findings, confirming that the observed performance differences were not artifacts of single-run measurements. This directly addresses concerns regarding experimental rigor and supports the reliability of the reported comparisons between platforms and EMBA versions.
Although the experiments were conducted under controlled and repeatable conditions, the evaluated firmware samples and hardware configurations represent a defined experimental scope. The findings therefore reflect comparative behavior within the tested environments rather than universal performance guarantees across all possible firmware types, architectures, or infrastructure scales. However, the detailed documentation of hardware specifications, virtual machine configurations, execution parameters, and cost calculations provides a transparent and reproducible measurement framework, supporting methodological rigor and enabling the independent validation of the reported results.
Importantly, this work provides empirical evidence that deployment environment selection has a measurable impact on the efficiency, cost, and practicality of firmware security analysis, even when the same analysis framework, firmware inputs, and configurations are used. By combining controlled experimentation, repeated measurements, and platform-specific observations, this study clarifies the operational conditions under which cloud-based or standalone deployments are more appropriate.
The results indicate that no single deployment model is universally optimal for EMBA-based firmware analysis. Instead, infrastructure selection should be guided by workload scale, analysis frequency, operational complexity, and budget constraints rather than raw execution speed alone, and informed infrastructure planning is essential to balance performance, reproducibility, and long-term operational sustainability.
Consequently, organizations should evaluate not only performance metrics but also long-term operational effort and security requirements when selecting an execution environment for EMBA-based firmware security workflows.