1. Introduction
The evolution of software architecture has been marked by significant transformations since the introduction of early computing systems. In particular, web application architecture has undergone major structural changes. Deploying a single, monolithic application instance in a production environment has become increasingly rare. Traditional monolithic systems have gradually evolved into modular monoliths and, more recently, into microservice-based architectures. Deployment on physical hardware has given way to virtual machines and containerized environments, with orchestrators such as Kubernetes increasingly dominating the deployment landscape.
These architectural and infrastructural changes have emerged under the influence of growing demands in a digitalized and globalized world. Limitations in scalability and latency prohibit the use of single servers for large, geographically distributed user bases. The emergence of cloud service providers—such as Amazon Web Services (AWS), Google Cloud Platform, and Microsoft Azure—has also favored this transformation by enabling the deployment of scalable, fault-tolerant, and globally available applications with minimal upfront investment. Consequently, the management of the infrastructure is now frequently delegated to cloud platforms to reduce the operational burden on development teams.
Modern deployments commonly consist of hundreds or even thousands of individual services. For example, as of 2024, the multinational mobility company Bolt reported operating more than 1000 services in its production environment [
1]. Working in such complex ecosystems without sufficient monitoring can lead to service degradation or systemic failure, often resulting in financial losses and diminished user satisfaction. This operational fragility has led to the increasing prominence of observability as a critical aspect of software system management.
The scientific novelty of this work lies in the introduction of a unified analysis framework that evaluates exporters across heterogeneous storage and messaging systems—RabbitMQ (
https://www.rabbitmq.com, accessed 25 October 2025), SQL databases, and Cassandra (
https://cassandra.apache.org/_/index.html, accessed 25 October 2025)—using a consistent experimental methodology and identical workload patterns. Existing metrics frameworks, such as Prometheus exporters or vendor-specific collectors, typically document platform-dependent behaviors, but they do not provide a cross-technology comparison based on controlled, reproducible experiments. By contrast, our approach establishes a uniform basis for quantifying exporter performance, identifying bottlenecks, and comparing architectural trade-offs across fundamentally different persistence layers. This contributes new knowledge on how internal storage mechanisms and network I/O patterns impact exporter throughput and resource usage.
While traditional metrics frameworks primarily focus on metrics exposition and scraping mechanics, they rarely offer insights into how exporter implementations interact with underlying data stores under realistic load. Our study extends prior work by systematically relating exporter behavior to the intrinsic performance characteristics of the target storage systems and by identifying conditions under which standard exporter designs degrade.
To formalize our evaluation, we adopt the following experimental hypothesis: the performance of a metrics exporter is bounded not only by its implementation but also by the architectural properties and operational constraints of the underlying storage system it targets. This hypothesis drives the design of our experiments and guides the interpretation of results.
Observability tools, such as Prometheus (
https://prometheus.io, accessed 10 October 2025) and Grafana (
https://grafana.com, accessed 10 October 2025), are now essential components in contemporary software development workflows. These tools allow the collection, visualization, and analysis of system metrics, while also providing alerting mechanisms that notify operators of potential anomalies. Industry surveys, such as the ones conducted by Grafana Labs, emphasize the importance of observability [
2,
3,
4], identifying it as one of the most rapidly expanding trends in software engineering. As distributed systems grow in complexity, the demand for comprehensive visibility into system behavior increases, because undetected faults and inefficiencies can propagate and amplify across service boundaries.
Another factor supporting the adoption of observability practices is the economic model employed by cloud infrastructure providers. With Infrastructure-as-a-Service (IaaS) and Platform-as-a-Service (PaaS), users are billed based on their actual consumption. Consequently, performance optimizations in cloud-hosted applications translate into direct cost savings, especially as data processing and user interaction volumes continue to grow.
The same industry surveys [
2,
3,
4] highlight a strong preference for open-source solutions in observability stacks. However, they also reveal that system complexity and maintenance pose substantial challenges. Horizontal scaling—adding more machines to handle increased load rather than upgrading individual ones—is strongly supported by the capabilities of cloud platforms. Selecting and applying a suitable software development methodology also plays an important role [5]. Distributed systems improve scalability and fault tolerance, and they can reduce latency by bringing services geographically closer to users.
Performance optimization in distributed environments can lower system resource consumption and strongly reduce operational overhead. However, optimization efforts should be driven by analysis rather than intuition. Developers should adopt a scientific approach by establishing baseline performance metrics and evaluating changes in conditions that mirror production settings [
6]. This methodology ensures that performance improvements are both measurable and reliable.
2. The Problem
System-level metrics such as CPU and memory utilization are generally straightforward to collect across most technology stacks, but they often fail to provide the granularity required for diagnosing performance issues in complex applications. For RESTful APIs or stream processing systems, measuring the total processing time per request is relatively simple. However, such coarse-grained measurements reveal little about how that time is distributed across the different components or operations inside the request lifecycle.
When a systemic increase in request processing time is observed, identifying the underlying cause can be particularly challenging. The source of the performance degradation may be hidden by the layered architecture of applications. Potential causes include, but are not limited to: increased database response latency, file I/O bottlenecks, network problems, or spikes in computational complexity. Localizing the precise source of latency becomes difficult without the ability to isolate and measure the execution time of discrete functional units.
A seemingly simple approach is to capture execution times at specific points in the code, but its practical implementation introduces several challenges. Timing is a cross-cutting concern, and mixing it with application code is generally intrusive. Embedding such logic directly causes the code to grow and become difficult to understand and maintain. In the common scenario where a timed operation spans multiple classes or layers, additional complexity arises, such as the need to modify method signatures or propagate timing data across component boundaries. Such changes are typically undesirable in production-grade systems.
Even if accurate measurements are achieved, further challenges concern the aggregation and analysis of the resulting metrics. A common but naïve approach is to log the timing data directly. However, in high-throughput or computationally intensive systems, where fine-grained measurements are needed, the volume of generated logs may quickly become unmanageable, reducing their utility and potentially introducing performance overhead. These limitations emphasize the need for a dedicated tool capable of efficiently collecting, aggregating, and exposing execution time metrics in a non-intrusive and scalable manner.
3. Objectives
This paper introduces a novel framework for measuring execution times of individual operations within distributed systems. Given the widespread adoption of Java and the Spring Framework for the development of enterprise applications [
7], the proposed framework is implemented using this technological stack to support broad compatibility and ease of integration.
The primary requirement of the framework is interoperability with widely used metrics collection and data processing systems. Capturing execution metrics is not useful without the ability to persist and analyze them through established tools. In this context, systems such as Prometheus (
https://prometheus.io, accessed 10 October 2025) [
8] and Graphite (
https://graphiteapp.org, accessed 25 October 2025) [
9] serve as natural endpoints for time-series data collection. The framework is also designed to support export to several platforms, including event-driven streaming systems such as Apache Kafka (
https://kafka.apache.org, accessed 25 October 2025) [
10] and RabbitMQ (
https://www.rabbitmq.com, accessed 25 October 2025) [
11], traditional relational databases, and logging systems. This flexibility allows users to integrate the framework with components very likely to already be present in their operational environments, consequently reducing both the cost and complexity of adoption.
A second core requirement is ease of use. The framework targets software developers as its primary users, and its design promotes a minimal configuration and a low integration overhead. Observability tools are often considered auxiliary to the core business logic of an application; therefore, the framework aims to reduce the time and learning curve necessary to adopt them. The integration process should be straightforward, and the instrumentation of application code should be non-intrusive. Collecting metrics should involve minimal boilerplate code and avoid cluttering the primary logic of the application.
A third requirement is low resource overhead. The framework should preserve the original CPU and memory footprint of the host application to the greatest extent possible. This requirement is particularly important in resource-constrained environments or cost-sensitive deployments, such as those utilizing container orchestration platforms or operating under strict service-level objectives. Some performance cost is unavoidable, as additional processing is required for metric collection, but the framework is designed to minimize its impact on both execution time and system utilization.
To put these constraints in context, it is useful to consider the typical resource profile of a Spring-based application. Idle memory consumption can range from tens to hundreds of megabytes, depending on the presence of additional libraries and middleware. For instance, the default Tomcat thread pool can consume approximately 200 MB of RAM even in idle conditions. CPU usage is more variable and depends heavily on workload and hardware, making absolute measures less meaningful. Instead, relative metrics—such as the percentage increase in CPU usage—are more appropriate. Latency under tens of milliseconds is typically considered acceptable, especially in web-based applications. Accordingly, the framework targets the following operational thresholds: a maximum memory overhead of 1 MB and an added sequential processing time of no more than 1 ms per measured operation.
The interpretation of the added processing time needs to be clarified in this context. Some portions of the logic of the framework necessarily execute in the critical path of the monitored operation, thereby increasing its latency. This is the segment of processing that the framework intends to minimize. Other components of the framework, such as data export or aggregation routines, may operate asynchronously or in parallel with the application logic. While the performance of these asynchronous components is also relevant, they have a lower impact on the responsiveness of the application.
4. Related Work
Observability has evolved into a central research theme in modern distributed systems, encompassing metrics, logs, and distributed traces. While industrial observability stacks have matured rapidly, the academic literature reveals ongoing challenges concerning overhead, instrumentation strategies, measurement fidelity, and architectural complexity. This section situates the present work within the broader scientific landscape, drawing on peer-reviewed studies and comparing the proposed framework with established observability tools in the Java ecosystem.
The conceptual foundations of observability in software systems have been explored in numerous studies that highlight the interplay between monitoring, tracing, and logging. Kosińska et al. [
12] provide a comprehensive review of State-of-the-Art observability practices in cloud-native environments, identifying trends and highlighting the most popular tooling. Their findings emphasize the challenges of performance, efficiency, and security, and the need for lightweight instrumentation strategies.
Pathak et al. [
13] introduce a timely and practical solution to the escalating cost and inefficiency of log-heavy observability pipelines in cloud-native environments, proposing SALO as an autonomous system that adaptively reduces log verbosity without compromising downstream AIOps tasks. The claimed 95% reduction in log volume, combined with maintained diagnostic quality, indicates strong potential impact on both operational cost and system efficiency.
From a more focused perspective, Janes, Li, and Lenarduzzi [
14] conduct a critical comparison of open tracing tools, including Jaeger (
https://www.jaegertracing.io, accessed 12 October 2025), Zipkin (
https://zipkin.io, accessed 12 October 2025), and OpenTelemetry (
https://opentelemetry.io, accessed 12 October 2025). Their work highlights substantial differences in data models, storage backends, propagation formats, and operational overhead, arguing that tracing systems entail non-trivial cost and integration complexity. This distinction is relevant because, while many industrial systems treat tracing as a core observability pillar, the present work intentionally limits itself to execution-time metrics, avoiding cross-service causality tracking and the associated overhead. The study is based on a systematic multivocal literature review covering 30 open tracing tools, which gives it strong breadth and a rigorous methodology. A major strength is its balanced conclusion: no single tracing tool is ideal for all use cases—each has a unique trade-off profile.
Zhuangbin Chen et al. [
15] propose an advanced sampling technique using locality-sensitive hashing (LSH) to dynamically cluster traces, reducing redundant trace storage while preserving diversity. This reflects a growing trend: not just sampling whole traces, but intelligently biasing sampling based on similarity and novelty, which helps reduce costs without losing signal.
The study from [
15] is extended in [
16], introducing a novel compression structure (Span Retrieval Tree) that avoids naive sampling by compressing trace data and reconstructing full traces on the backend. This indicates a shift in research from just dropping data (sampling) to compressing and deduplicating trace data—promising for reducing storage overhead while keeping trace fidelity.
Yulun Wu et al. [
17] present Autoscope, a method that uses static code analysis to decide which spans to keep, preserving logical structure while reducing volume by approximately 81%. This is important because, rather than dropping complete traces, parts of traces (rare or less informative spans) can be discarded in a more principled way.
Haiyu Huang et al. [
18] propose a framework (“Mint”) that captures all requests but drastically reduces trace volume by separating common patterns from variable parts. They show very strong storage and network savings (reducing trace size to approximately 2.7% of the original) while preserving essential trace semantics.
Linh-An Phan et al. [
19] introduce a zero-code instrumentation tracing system using eBPF to infer and correlate spans across threads and services without modifying the application source. This is especially useful for microservices where instrumenting each component manually is expensive or risky; the method is practical and claims high correlation accuracy.
4.1. Observability Overhead
A substantial body of scientific research has examined the performance overhead associated with software instrumentation and monitoring, highlighting that even lightweight observability mechanisms can impose non-trivial runtime costs.
Building on this line of inquiry, Reichelt et al. [
20] conducted a comparative study of modern instrumentation frameworks, evaluating bytecode manipulation, aspect weaving, and sampling-based approaches. Their findings showed that overhead varies considerably across frameworks and depends heavily on how instrumentation is implemented—whether through synchronous event hooks, asynchronous reporting, or dynamic bytecode injection. Importantly, they observed that instrumentation overhead does not scale linearly with event frequency; instead, bottlenecks often arise from queue contention, memory allocation patterns, and garbage collection pressure. These observations reflect several of the challenges addressed in the present work, particularly regarding exporter queue growth and threaded exporter behavior.
A recent study by Hammad et al. [
21] provides a detailed empirical assessment of instrumentation overhead in containerised microservices. Their experiments show that even relatively lightweight code instrumentation can degrade tail latency and reduce overall throughput when applied to services deployed in container-based environments. Importantly, they demonstrate that the interaction between microservice deployment patterns (such as fine-grained decomposition and high request rates) and instrumentation frequency amplifies overhead effects. The study further highlights that the degree of instrumentation—particularly when applied at high granularity—has a direct and measurable impact on response-time variability.
Borges and Werner [
22] examine a real and often underestimated problem in microservice observability: the ad hoc and error-prone nature of instrumentation and configuration decisions. Their contribution—integrating the OXN observability experiment tool into a structured method for continuous observability assurance—provides a practical and systematic approach.
Anders Nõu et al. [
23] perform a detailed measurement study of tracing overhead (latency, throughput) in microservices and serverless environments, focusing on OpenTelemetry and Elastic APM (
https://www.elastic.co/docs/solutions/observability/apm, accessed 14 October 2025). They identify serialization and export of trace data as major contributors to performance degradation, which is very relevant for production observability trade-offs.
A substantial body of research has also focused on techniques for reducing the overhead introduced by instrumentation-based monitoring. A notable example is the work of Kühne, Hasselbring, and Reichelt [
24], who present a systematic analysis of the performance costs that arise in Kieker’s monitoring pipeline and propose concrete optimizations targeting different stages of the data-collection process. Rather than relying on dynamic weaving through AspectJ (
https://eclipse.dev/aspectj/, accessed 15 October 2025)—which incurs additional join point dispatch and stack manipulation—they show that injecting monitoring code directly into the source substantially reduces the per-invocation cost of probe execution. Further reductions are achieved by limiting the amount of metadata stored in monitoring records, replacing the default LinkedBlockingQueue with a more lightweight circular buffer, and aggregating multiple measurement events before writing them to persistent storage. Their evaluation demonstrates that, when all optimizations are applied, the average overhead per monitored method invocation decreases from approximately 4.77 µs to 0.39 µs—an order-of-magnitude improvement. This work provides strong empirical evidence that monitoring overhead can be significantly reduced by redesigning instrumentation and buffering mechanisms, and it offers important conceptual parallels to the present framework’s emphasis on minimizing the cost of metric generation and export. In particular, their findings regarding queue behavior, metadata minimization, and aggregation directly relate to the bottlenecks observed in our evaluation, underscoring the broader need for lightweight, predictable, and bounded-overhead monitoring strategies.
Based on the examined sources, the identified key problems and trends are:
Smarter Sampling and Compression: There is a clear shift towards more advanced techniques—not just sampling entire traces, but compressing or performing span-level decisions—to reduce overhead without losing important data.
Multiplexed Backends: The backend ecosystem is more diverse: newer OpenTelemetry-native solutions (like Uptrace (
https://uptrace.dev, accessed 19 November 2025), OpenObserve (
https://openobserve.ai, accessed 19 November 2025)) are gaining traction alongside more established ones (Jaeger, Tempo (
https://grafana.com/oss/tempo/, accessed 10 October 2025)).
Cost and Scale Awareness: Cost remains a huge concern in industry (per Grafana surveys [
2,
3,
4]), driving adoption of more efficient storage (compression, sampling) and open-source stacks.
Adoption Gap: While OpenTelemetry is very popular, not all organizations are fully tracing: many still do not emit traces from all business-critical workloads.
Performance Trade-Offs Matter: Empirical studies show that tracing impacts latency/throughput; hence, any design has to weigh observability value vs. runtime cost.
4.2. Metrics Collection
Micrometer is the de facto metrics abstraction layer for the Spring ecosystem, offering unified access to a variety of backends (Prometheus, Graphite, Datadog). Its strengths lie in standardized JVM/runtime metrics, HTTP request timing, and integration with Spring Boot Actuator. Very fine-grained developer-defined timings are possible, but require manual timers or custom interceptors, which increases boilerplate. Moreover, Micrometer does not provide general-purpose instrumentation for arbitrary code segments spanning multiple classes, which the proposed framework explicitly targets.
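To illustrate the boilerplate that manual timing entails, the following minimal sketch shows how an operation spanning several statements could be timed with Micrometer; the class, metric name, and registry setup are illustrative assumptions rather than part of the proposed framework.

```java
import io.micrometer.core.instrument.MeterRegistry;
import io.micrometer.core.instrument.Timer;
import io.micrometer.core.instrument.simple.SimpleMeterRegistry;

public class CheckoutService {

    private final MeterRegistry registry = new SimpleMeterRegistry();

    public void processOrder() {
        // The sample must be created before the measured logic and stopped
        // afterwards, which couples timing code to the business method.
        Timer.Sample sample = Timer.start(registry);
        try {
            // ... business logic spanning several statements ...
        } finally {
            sample.stop(Timer.builder("checkout.process")
                    .description("Time spent processing an order")
                    .register(registry));
        }
    }
}
```

As the sketch suggests, the timing concern is interleaved with the business logic of every instrumented method, which is precisely the kind of boilerplate the proposed framework aims to reduce.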
OpenTelemetry has emerged as a standard framework for observability, aiming to unify and standardize the instrumentation and collection of telemetry data across platforms and vendors. It abstracts the observability logic of the application from specific backend implementations, reducing the risk of vendor lock-in. OpenTelemetry can be instrumented automatically or manually, each option coming with advantages and shortcomings. Automatic instrumentation is easy to set up but lacks customization. Manual instrumentation enables the possibility for customization but comes with the need to write boilerplate code.
Elastic APM is a full-stack observability solution developed as part of the Elastic ecosystem, offering automatic instrumentation, distributed tracing, metrics collection, and application-level diagnostics. The Java agent—implemented using bytecode instrumentation via the Java Instrumentation API—injects probes into common frameworks (Spring, JDBC, servlet containers, HTTP clients) to capture spans and contextual metadata with minimal configuration effort. Elastic APM is designed to work within the Elastic stack, so it introduces vendor lock-in. The proposed framework aims to avoid this by offering integrations with multiple observability backends and storage systems.
4.3. Positioning of This Paper
The objective of the proposed framework is not to replace or compete with established observability ecosystems such as OpenTelemetry, Jaeger, or Zipkin, but rather to introduce an abstraction layer that enables seamless, plug-and-play integration with these and other monitoring backends. Whereas existing solutions focus primarily on distributed tracing, standardized telemetry formats, and cross-platform interoperability, the present framework targets fine-grained, developer-defined execution-time measurements with minimal configuration overhead. In this sense, it is complementary to existing systems, offering a lightweight mechanism for capturing micro-level performance metrics that can be exported into broader observability pipelines with minimal coupling.
5. System Design
5.1. Overview
Java maintains backwards compatibility but generally lacks forward compatibility. Therefore, selecting an appropriate target version is critical: it must be modern enough to include essential developer features, but also sufficiently established to ensure integration with a wide array of existing applications. Within the Java ecosystem, it is advisable to use Long-Term Support (LTS) versions [
25], narrowing the choices to Java 8, 11, 17, and 21. Versions prior to Java 17 are excluded, as Spring Framework 6 requires Java 17 or higher [
26]. On the other hand, Java 21, being around 2 years old, has limited enterprise adoption, as many organizations have not yet completed migration. Consequently, Java 17 represents an optimal balance.
The library will be developed targeting Java 17 and specifically optimized for integration with the Spring Framework. It will be distributed as a set of JAR [
27] artifacts compatible with mainstream build tools such as Maven (
https://maven.apache.org, accessed on 15 October 2025) [
28] and Gradle (
https://gradle.org, accessed 15 October 2025) [
29]. The configuration will be minimized, limited to declaring dependencies and specifying properties, so as to avoid tight coupling between client code and library internals.
To enforce loose coupling, principles such as Aspect-Oriented Programming (AOP) and Spring’s Dependency Injection [
30] will be leveraged. Using the Spring context requires the registration of proper components and the creation of a dedicated configuration class, which client applications must import.
5.2. Aspect-Oriented Programming
Aspect-Oriented Programming (AOP) [
31] addresses cross-cutting concerns, which impact multiple distinct parts of an application. If an application is divided into horizontal layers, cross-cutting concerns form vertical slices across these layers. Examples of cross-cutting functionality include logging and exception handling, which are orthogonal to the core business logic.
Traditional procedural and object-oriented paradigms lack efficient mechanisms for managing cross-cutting concerns, often resulting in code scattering and tangling. AOP mitigates these challenges through weaving techniques.
An initial approach introduced AOP using annotations (e.g.,
@TimedMethod) to automate method execution timing. Both Spring AOP [
32] and AspectJ [
33] may be used for this purpose.
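As an illustration of this annotation-driven approach, a minimal Spring AOP sketch could look as follows; the @TimedMethod annotation name follows the text above, while the aspect body and the way measurements are reported are illustrative assumptions.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

import org.aspectj.lang.ProceedingJoinPoint;
import org.aspectj.lang.annotation.Around;
import org.aspectj.lang.annotation.Aspect;
import org.springframework.stereotype.Component;

@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.METHOD)
@interface TimedMethod { }

@Aspect
@Component
class TimedMethodAspect {

    // Wraps every call to a @TimedMethod-annotated method and measures its duration
    // around the actual invocation; the annotation instance is bound as a parameter.
    @Around("@annotation(timedMethod)")
    public Object time(ProceedingJoinPoint joinPoint, TimedMethod timedMethod) throws Throwable {
        long start = System.nanoTime();
        try {
            return joinPoint.proceed();
        } finally {
            long elapsedNanos = System.nanoTime() - start;
            // In a real setup the measurement would be handed to a metrics reporter.
            System.out.printf("%s took %d ns%n", joinPoint.getSignature(), elapsedNanos);
        }
    }
}
```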
AspectJ, a prominent open-source extension maintained by the Eclipse Foundation, introduces join points [
34] representing program execution points such as method calls or field accesses. AspectJ supports compile-time, binary (bytecode), and load-time weaving. Compile-time weaving requires a specialized compiler that integrates aspect code into application classes. Binary weaving modifies the compiled JARs. Load-time weaving employs a custom class loader [
35] for runtime weaving.
Despite performance advantages, AspectJ comes with several setup complexities, including special compilers and class loaders, which may hinder adoption. Also, its integration with Spring beans can be cumbersome. Consequently, this complexity renders AspectJ unsuitable for a user-friendly library.
In contrast, Spring AOP offers easy integration with Spring applications [
36]. It shares core AOP semantics such as join points, pointcuts, and advices [
37], but employs proxy-based weaving rather than bytecode manipulation. Proxies, implementing the same interfaces as the target beans, intercept method calls to execute additional logic before or after the invocation. This proxy mechanism simplifies configuration but introduces limitations: it only intercepts method calls made through the proxy, does not support field access interception, and cannot intercept self-invocation within the same class [
38].
These constraints are significant. The inability to intercept private methods and self-invocations reduces the ability to fully monitor performance, which is critical for this library. The @TimedMethod annotation may remain an additional feature, while the enumerated limitations require alternative approaches.
Finally, AOP fundamentally restricts timing operations to entire method boundaries, conflicting with the library’s goal of enabling fine-grained, arbitrarily scoped performance measurement.
5.3. Class Structure
The metrics reporting system includes two main components: the Metrics Reporter and the Metrics Exporter. The Metrics Reporter is responsible for collecting performance data across the application, while the Metrics Exporter exposes this data to various aggregators and communication channels. Manual timing code is removed from the application; this responsibility falls to the Metrics Reporter.
The Metrics Reporter is implemented as a Spring bean offering a simple interface for metric collection, forwarding data to the exporter. The Metrics Exporter is designed as a modular, plug-and-play component adaptable to different backend solutions. A selection of exporters will be provided, with extensibility for custom implementations.
To maintain modularity and prevent circular dependencies, the dependency inversion principle [
39] is applied: exporters depend on the reporter interface, rather than vice versa (
Figure 1).
5.4. The Metrics Reporter
The MetricsReporter interface aims for simplicity and minimal developer overhead, exposing two primary methods: one to start an operation and another to end it. An operation is a unit of work defined by the developer, potentially spanning multiple lines of code and classes. Each operation instance requires unique identification so that its execution time can be measured precisely.
The initial method signatures, void startOperation(String operationName) and void endOperation(String operationName), are sufficient for single-threaded scenarios but fail in concurrent contexts due to ambiguity in matching operation instances. If multiple threads execute the same operation in parallel, relying solely on the operation name makes it impossible to distinguish between different invocations. This can lead to incorrect measurements, such as ending the wrong operation instance or aggregating unrelated timings. To resolve this, each operation instance must receive a unique identifier.
The choice of who creates the identifier is a trade-off: if the reporter generates it, uniqueness is guaranteed, but developers must propagate the identifier throughout the operation, complicating the code. Alternatively, developers may supply identifiers that already exist intrinsically within the domain objects (e.g., user IDs). This approach simplifies usage, but incorrect or non-unique identifiers may then lead to inaccurate metrics and misuse.
For clarity, the identifier type is standardized as a String. The user can choose whether to generate the identifier randomly or use an intrinsic identifier. Methods are thus refined to: void startOperation(String operationName, String operationId) and void endOperation(String operationName, String operationId).
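The resulting contract can be summarized as follows; the interface is shown as a minimal sketch based on the signatures above.

```java
public interface MetricsReporter {

    // Marks the beginning of a developer-defined operation instance.
    void startOperation(String operationName, String operationId);

    // Marks the end of the operation instance and records its elapsed time.
    void endOperation(String operationName, String operationId);
}
```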
Upon the invocation of
startOperation, the reporter records the current time. Correctness demands the use of a monotonic clock to avoid skew-related errors.
System.currentTimeMillis(), representing wall-clock time, is vulnerable to system clock adjustments [
40,
41]. Conversely,
System.nanoTime() provides a monotonic, high-resolution time source [
41], albeit at a higher computational cost.
A concurrent data structure (ConcurrentHashMap) is employed to track active operations efficiently and safely in multi-threaded environments. The map key combines the operation name and identifier, mapping to the recorded start time.
Time is captured immediately upon method entry, preceding parameter validation, ensuring accuracy.
Exception handling covers two scenarios: ending a non-existent operation and starting an already running instance. The former results in a null retrieval from the map, potentially causing NullPointerExceptions. To avoid disrupting core application logic, such exceptions are caught, and a warning is logged. The latter overwrites the existing start time, risking silent data corruption; this is similarly logged at the WARN level.
Currently, metrics export is performed synchronously on the application thread, risking performance degradation if exporters conduct costly I/O operations. To mitigate this, exporting is offloaded to a dedicated
SingleThreadExecutor [
42]. Metrics submission and cache management occur on this thread, eliminating synchronization bottlenecks and isolating exporter latency.
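A simplified sketch of this design is shown below; the class name, the exporter contract, and the logging strategy are illustrative assumptions, while the composite key, the use of System.nanoTime(), and the single-threaded export executor follow the description above.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Assumed minimal exporter contract; the actual PerformanceExporter interface may differ.
interface PerformanceExporter {
    void export(String operationName, long elapsedNanos);
}

public class DefaultMetricsReporter implements MetricsReporter {

    private final Map<String, Long> activeOperations = new ConcurrentHashMap<>();
    private final ExecutorService exportExecutor = Executors.newSingleThreadExecutor();
    private final PerformanceExporter exporter;

    public DefaultMetricsReporter(PerformanceExporter exporter) {
        this.exporter = exporter;
    }

    @Override
    public void startOperation(String operationName, String operationId) {
        long start = System.nanoTime();                   // monotonic clock, captured first
        String key = operationName + ":" + operationId;   // composite map key
        Long previous = activeOperations.put(key, start);
        if (previous != null) {
            // An instance with the same key was already running; its start time is overwritten.
            System.err.println("WARN: operation already in progress: " + key);
        }
    }

    @Override
    public void endOperation(String operationName, String operationId) {
        long end = System.nanoTime();
        Long start = activeOperations.remove(operationName + ":" + operationId);
        if (start == null) {
            // Ending a non-existent operation must not disrupt the application.
            System.err.println("WARN: no matching startOperation for " + operationName);
            return;
        }
        long elapsedNanos = end - start;
        // Export is offloaded to the dedicated thread to isolate exporter latency.
        exportExecutor.submit(() -> exporter.export(operationName, elapsedNanos));
    }
}
```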
This approach introduces new challenges: heavy concurrent workloads may overwhelm the single exporter thread, causing export delays, memory bloat due to queued metrics, and delayed application shutdown. Future improvements may involve multi-threaded exporters.
5.5. Caching Statistics
Immediate metric export imposes a continuous system load. Caching metrics and exporting them periodically addresses this, but introduces configuration complexity.
Export intervals must be developer-configurable, as no universal setting fits all applications. A minimum interval of 500 ms is enforced; no maximum limit is imposed.
Caching size limits complement time-based export triggers. A maximum cache size, configurable between 1 and 1000 entries, ensures memory consumption remains bounded. Metrics are exported when either the time interval elapses or the cache size limit is reached.
Upon operation completion, elapsed time data is encapsulated and added to the cache, which is cleared after each export.
To prevent loss of cached metrics on application shutdown, export logic is integrated into the Spring lifecycle via the
DisposableBean interface [
43]. The destroy method triggers a final export upon application context termination.
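A sketch of the caching behavior, including the final flush on shutdown, is given below; the class layout and the MetricEntry value object are illustrative assumptions, while the interval and size triggers and the DisposableBean hook follow the description above.

```java
import java.util.ArrayList;
import java.util.List;

import org.springframework.beans.factory.DisposableBean;

public class CachingExporter implements DisposableBean {

    private final List<MetricEntry> cache = new ArrayList<>();
    private final int maxCacheSize;          // configurable, between 1 and 1000 entries
    private final long exportIntervalNanos;  // configurable, minimum 500 ms
    private long lastExport = System.nanoTime();

    public CachingExporter(int maxCacheSize, long exportIntervalMillis) {
        this.maxCacheSize = maxCacheSize;
        this.exportIntervalNanos = exportIntervalMillis * 1_000_000L;
    }

    public synchronized void add(MetricEntry entry) {
        cache.add(entry);
        boolean sizeLimitReached = cache.size() >= maxCacheSize;
        boolean intervalElapsed = System.nanoTime() - lastExport >= exportIntervalNanos;
        if (sizeLimitReached || intervalElapsed) {
            flush();
        }
    }

    private void flush() {
        // Send the cached entries to the configured backend, then clear the cache.
        // (backend-specific export logic goes here)
        cache.clear();
        lastExport = System.nanoTime();
    }

    @Override
    public synchronized void destroy() {
        // Final export on Spring context shutdown so that no cached metrics are lost.
        flush();
    }
}

// Minimal value object assumed for a cached measurement.
record MetricEntry(String operationName, long elapsedNanos) { }
```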
5.6. Addressing the Impact of Garbage Collection Pauses
The Garbage Collection mechanism present in Java can introduce non-deterministic latency during memory reclamation cycles. Specifics differ between garbage collectors, but pauses in the range of tens of milliseconds can be expected. This poses a problem for a framework aiming to measure operations in the single-digit millisecond range.
Beyond being a purely technical issue, this also concerns user requirements. Including GC pause time in the measurements is not necessarily wrong, as it directly reflects the “real” time the operation took to complete, as perceived by an outside observer.
GC pause time varies considerably between JVMs and garbage collector implementations; hence, users can expect non-deterministic results. As a general recommendation, if the latency of the operations is critical, users should consider using the Z Garbage Collector.
Another factor that could, in theory, introduce non-determinism is the Just-In-Time (JIT) compiler, but we consider it a negligible factor in the context of this framework due to the relatively stable execution state achieved after the initial warm-up phase.
6. Exporters
6.1. Apache Kafka Exporter
6.1.1. Overview and Architecture
Apache Kafka is a distributed event streaming platform developed and maintained by the Apache Software Foundation. Its features include high throughput, fault tolerance, scalability, and robust integration capabilities, and it is suitable for deployment in containerized environments, on bare-metal systems, and in cloud-based infrastructures [
44,
45].
The core concept in Kafka is the event, whose payload is essentially a sequence of bytes. Events are generated by producers and consumed by consumers, both of which interact with Kafka brokers, as illustrated in
Figure 2.
Events are categorized into topics, which are logical groupings of related data. For instance, a financial application may use separate topics for account creation and payment transactions. Each topic is divided into partitions, which are immutable, append-only logs that facilitate parallel processing and scalability.
Kafka ensures resilience via replication. Each partition has a leader and one or more followers. The leader handles all read and write operations, while followers replicate its data. In case of failure, a follower may be elected as the new leader using a consensus algorithm.
Producers determine which partition an event is written to, typically based on a round-robin approach or a partitioning key [
10]. On the consumer side, consumer groups allow for horizontal scaling, with the following properties:
A single partition may only be consumed by one consumer within a group.
Events within a partition are strictly ordered, and no guarantees exist across partitions.
Kafka tracks consumer progress through offsets, which record the last acknowledged message per partition. This enables consumers to resume accurately in the case of interruption.
Kafka was selected as an exporter backend for this project due to the attributes outlined above: high throughput, fault tolerance, scalability, and robust integration capabilities.
6.1.2. Implementation
The Kafka exporter is implemented as a Maven module: com.alexg.performance:kafka-exporter.
It depends on the com.alexg.performance:performance-timer module and the Kafka client library org.apache.kafka:kafka-clients.
The KafkaExporter class implements the PerformanceExporter interface and initializes a Kafka producer instance. For correct operation, users must ensure that the target topic preexists, as many Kafka clusters restrict automatic topic creation.
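A minimal sketch of how such an exporter might publish measurements is shown below; the payload format, the export method signature, and the configuration handling are illustrative assumptions, while the use of the Kafka producer client and a preexisting topic follows the description above.

```java
import java.util.Properties;

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.Producer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class KafkaExporterSketch {

    private final Producer<String, String> producer;
    private final String topic;

    public KafkaExporterSketch(String bootstrapServers, String topic) {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, bootstrapServers);
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        this.producer = new KafkaProducer<>(props);
        this.topic = topic;   // the topic must already exist on the broker
    }

    public void export(String operationName, long elapsedNanos, String instanceName) {
        // The operation name is used as the record key so that measurements of the
        // same operation land in the same partition and preserve ordering.
        String payload = operationName + "," + elapsedNanos + "," + instanceName;
        producer.send(new ProducerRecord<>(topic, operationName, payload));
    }

    public void close() {
        producer.close();
    }
}
```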
6.1.3. Experimental Setup
Traditionally, Kafka required Apache ZooKeeper for cluster orchestration. Starting with version 3.1, Kafka introduced KRaft mode, intended to replace ZooKeeper. ZooKeeper was deprecated in release 3.5, and its support was dropped entirely in release 4.0. Nevertheless, this experimental setup uses ZooKeeper.
Kafka and Zookeeper are deployed in Docker containers using the confluentinc/cp-kafka and confluentinc/cp-zookeeper images. A single broker configuration with plaintext communication is employed. Topic creation is scripted to run at container startup, and a health check mechanism ensures the application does not start until the broker is available.
For event inspection, Kafka UI is deployed in a separate container.
6.2. Prometheus Exporter
6.2.1. Overview and Architecture
Prometheus (
https://prometheus.io, accessed 10 October 2025) is an open-source monitoring and alerting system, originally developed by SoundCloud and later adopted by the Cloud Native Computing Foundation (CNCF) [
8]. Prometheus is well-suited for observability in distributed systems due to its scalability, simplicity, and native support for multi-dimensional time-series data.
Unlike Kafka, which relies on a push-based model, Prometheus uses a pull-based approach. It periodically scrapes metrics from configured targets via HTTP. These metrics are stored in a time-series database and queried via the Prometheus UI or third-party tools such as Grafana.
Queries are written using PromQL [
48], a purpose-built language for time-series data.
The Prometheus ecosystem consists of:
Prometheus Server: The central component, hosting the time-series database and the HTTP scraping service.
Alertmanager: Manages alert routing and notifications based on PromQL queries.
Web UI: Provides interactive data exploration.
Pushgateway (optional): Allows short-lived services to expose metrics, though its use is discouraged by the Prometheus maintainers [
49].
6.2.2. Implementation
Prometheus supports several metric types [
50]: counter, gauge, summary, and histogram. From these, the gauge and histogram are most appropriate for capturing performance metrics.
Gauge: Captures the latest value at the time of a scrape, suitable for instantaneous measurements but susceptible to missing outliers.
Histogram: Aggregates values into configurable buckets and stores cumulative counts and sums. Histograms are best used for the analysis of execution durations.
Given the nature of metric collection via scraping, some values may be lost if the application terminates before a scheduled scrape. This data loss is illustrated in
Figure 3.
To expose metrics, the exporter starts an embedded HTTP server, which serves data on the /metrics endpoint.
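A sketch of such an exporter using the Prometheus Java client is shown below; the metric name, label set, bucket boundaries, and export method are illustrative assumptions, while the histogram metric type and the embedded HTTP server follow the description above.

```java
import java.io.IOException;

import io.prometheus.client.Histogram;
import io.prometheus.client.exporter.HTTPServer;

public class PrometheusExporterSketch {

    // Histogram buckets (in seconds) chosen to cover sub-millisecond to second-level operations.
    private final Histogram operationDuration = Histogram.build()
            .name("operation_duration_seconds")
            .help("Execution time of instrumented operations")
            .labelNames("operation", "instance")
            .buckets(0.001, 0.005, 0.01, 0.05, 0.1, 0.5, 1.0)
            .register();

    private final HTTPServer server;

    public PrometheusExporterSketch(int port) throws IOException {
        // Embedded HTTP server exposing the default registry on /metrics.
        this.server = new HTTPServer(port);
    }

    public void export(String operationName, long elapsedNanos, String instanceName) {
        operationDuration.labels(operationName, instanceName)
                .observe(elapsedNanos / 1_000_000_000.0);
    }

    public void stop() {
        server.stop();
    }
}
```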
6.2.3. Experimental Setup
Prometheus is deployed using the official Docker (
https://www.docker.com, accessed 13 October 2025) image (
prom/prometheus:latest). Scraping targets and intervals are specified via a YAML configuration file mounted into the container. The web interface is exposed on host port 9000.
Each test application instance exposes metrics on a unique port (8082 and 8083) due to local conflicts. Prometheus is configured to scrape these endpoints every 5 s and verify their health status.
For visualization, Prometheus is paired with Grafana (grafana/grafana:latest). Dashboards are persisted via Docker volumes. The Grafana dashboard includes:
Two heatmaps showing the evolution of request time distributions.
A comparative panel for the two test instances.
A liveness monitor for service availability.
The architecture validation comprises a series of tests intended to prove the ability of the application to adapt and deliver an available service even under harsh conditions.
6.3. RabbitMQ Exporter
6.3.1. Architectural Overview
RabbitMQ is a widely adopted open-source message broker that implements the Advanced Message Queuing Protocol (AMQP) [
51]. It serves as a fundamental component in distributed systems, facilitating asynchronous communication through a message-driven architecture. The core architectural entities in RabbitMQ include Producers, Exchanges, Queues, and Consumers.
Producers are responsible for publishing messages to the message broker. These messages are not sent directly to queues; instead, they are initially directed to Exchanges, which act as routing intermediaries. Exchanges determine the routing of messages based on their type and routing configuration.
RabbitMQ supports four types of exchanges:
Direct exchanges perform exact routing based on the routing key of the message.
Topic exchanges enable pattern-based routing, providing flexible matching of routing keys.
Fanout exchanges disregard the routing key entirely and broadcast messages to all bound queues.
Headers exchanges route messages based on the attributes in the message headers instead of the routing keys.
Queues serve as the primary message buffers, temporarily storing messages until they are retrieved and acknowledged by consumers. Queues may be transient (memory-based and non-persistent) or durable (persisted to disk and resilient across broker restarts).
Consumers are applications that subscribe to queues to retrieve and process messages. Unlike the pull-based approach of Kafka, RabbitMQ employs a push model in which the broker actively delivers messages to consumers. Although polling is technically supported, it is discouraged due to performance implications [
52]. Consumers must explicitly acknowledge messages upon successful processing. In cases of consumer failure or message rejection, unacknowledged messages are either requeued or discarded, depending on the acknowledgement mode.
6.3.2. Implementation Details
For integration with Spring-based applications, the org.springframework.amqp:spring-rabbit Maven artifact was employed. The RabbitMqPerformanceExporter class was developed to implement the PerformanceExporter interface. This class is designed to receive a RabbitTemplate instance, which is automatically provisioned by the Spring IoC container when the aforementioned library is present on the classpath.
Two configuration properties govern the runtime behavior of the exporter: the name of the target exchange and the routing key used when publishing metrics. These parameters ensure precise and flexible routing of performance metrics within the messaging infrastructure of RabbitMQ.
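The following sketch illustrates how the exporter can publish a measurement through a RabbitTemplate; the payload format and the export method signature are illustrative assumptions, while the exchange and routing key configuration follows the description above.

```java
import org.springframework.amqp.rabbit.core.RabbitTemplate;

public class RabbitMqExporterSketch {

    private final RabbitTemplate rabbitTemplate;
    private final String exchange;     // e.g., a direct exchange dedicated to metrics
    private final String routingKey;   // routing key bound to the metrics queue

    public RabbitMqExporterSketch(RabbitTemplate rabbitTemplate, String exchange, String routingKey) {
        this.rabbitTemplate = rabbitTemplate;
        this.exchange = exchange;
        this.routingKey = routingKey;
    }

    public void export(String operationName, long elapsedNanos, String instanceName) {
        // Publishes the measurement to the configured exchange; the broker routes it
        // to the bound queue according to the routing key.
        String payload = operationName + "," + elapsedNanos + "," + instanceName;
        rabbitTemplate.convertAndSend(exchange, routingKey, payload);
    }
}
```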
6.3.3. Experimental Setup
The RabbitMQ broker was deployed using the official Docker image rabbitmq:3-management, which exposes both the AMQP port (5672) and the management console on port 15672. These were mapped directly to the host machine to facilitate configuration and monitoring.
A broker configuration file was mounted into the container, along with a JSON-based definitions file. The definitions file provisioned a durable queue, a direct exchange, a binding between them, and a default guest user for administrative access.
To ensure that the broker is fully initialized prior to starting the containers of the test application, a health check mechanism was implemented. This mechanism executes the rabbitmqctl status command at 10-s intervals until the container reports a healthy state. Following this, the test containers are initialized.
6.4. SQL Exporter
6.4.1. Performance Optimization Strategies
The exporter periodically inserts multiple metric entries into the database, corresponding to the size of the internal metric cache. Performing individual insert operations for each entry would result in significant network overhead due to repeated round trips.
To address this, two optimization alternatives were evaluated:
Bulk Inserts: Constructing a single SQL statement containing multiple value tuples (
Figure 4, top).
Batching: Using JDBC batching to group multiple
PreparedStatement executions into a single round-trip (
Figure 4, bottom).
Although bulk inserts are straightforward to implement, they are subject to limitations imposed by RDBMS vendors. For example, Microsoft SQL Server limits insert statements to 1000 rows. Consequently, if the cache size exceeds this threshold, execution failure may occur. Therefore, batching was selected as the preferred approach. Empirical studies [
53] have shown that batch sizes of 50 yield a substantial reduction in insertion latency while maintaining robustness.
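The following sketch illustrates JDBC batching with a PreparedStatement; the table and column names anticipate those described in Section 6.4.2, while the MetricRow value object and the batch flushing logic are illustrative assumptions.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.List;

import javax.sql.DataSource;

public class JdbcBatchWriter {

    private static final int BATCH_SIZE = 50;
    private static final String INSERT_SQL =
            "INSERT INTO metrics (operation_name, execution_ns, instance) VALUES (?, ?, ?)";

    private final DataSource dataSource;

    public JdbcBatchWriter(DataSource dataSource) {
        this.dataSource = dataSource;
    }

    public void write(List<MetricRow> rows) throws SQLException {
        try (Connection connection = dataSource.getConnection();
             PreparedStatement statement = connection.prepareStatement(INSERT_SQL)) {
            int pending = 0;
            for (MetricRow row : rows) {
                statement.setString(1, row.operationName());
                statement.setLong(2, row.executionNanos());
                statement.setString(3, row.instanceName());
                statement.addBatch();              // queue the insert locally
                if (++pending == BATCH_SIZE) {
                    statement.executeBatch();      // one round trip for the whole batch
                    pending = 0;
                }
            }
            if (pending > 0) {
                statement.executeBatch();          // flush the remainder
            }
        }
    }

    // Minimal value object assumed for a cached measurement.
    public record MetricRow(String operationName, long executionNanos, String instanceName) { }
}
```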
Another performance consideration involves database connection management. Establishing a new connection requires both TCP and TLS handshakes, which, though individually fast, can accumulate substantial latency when repeated. Connection pooling mitigates this by maintaining a pool of pre-established connections, reducing connection overhead.
Popular pooling libraries such as HikariCP are commonly used in conjunction with frameworks like Hibernate. Nevertheless, in this implementation, connection pooling was intentionally avoided. The exporter operates on a single thread and maintains a relatively small number of connections. Moreover, pooled connections could impair the performance of the hosting application, particularly if the pool size is tightly constrained.
Although integrating with an existing application-level pool is theoretically feasible, it risks overutilizing shared resources. As a compromise, a static Connection object could be retained within the exporter for its entire lifecycle. However, this approach introduces state management complexity and potential long-lived resource leaks. As long as the exporter performs adequately without pooling, additional complexity will be avoided.
6.4.2. Implementation Details
The SQL-based exporter is implemented within a Maven module identified as com.alexg.performance:sql-exporter. This module declares a dependency on org.springframework.boot:spring-boot-starter-data-jpa to facilitate database connectivity via the Java Persistence API (JPA). Additionally, it depends on the com.alexg.performance:performance-timer module, which provides the foundational PerformanceExporter interface.
Within the module, the primary component is a class named SqlExporter, which implements the PerformanceExporter interface. The constructor of SqlExporter requires three parameters: a DataSource object, a string denoting the application instance name, and another string specifying the name of the metrics table. These parameters can be injected by Spring’s dependency injection mechanism. The DataSource is automatically instantiated by Spring when the JPA starter is present in the classpath and when the appropriate connection properties are defined in the application’s configuration file. The instance name and table name must be explicitly provided in the properties file under keys such as instance.name and db.metrics.table.
Internally, SqlExporter employs a PreparedStatement to persist metric data into the specified table. The SQL statement includes three placeholders corresponding to the following parameters: an operation name (String), the execution time in nanoseconds (Long), and the application instance identifier (String), as illustrated in
Table 1. The implementation assumes the target table includes at least three columns: operation_name (text), execution_ns (numeric), and instance (text). Although additional columns are permitted, the absence of any of these three required columns results in runtime failures.
The exact column data types may vary depending on the database dialect, and users must ensure that the lengths of text columns are sufficient to store the largest possible values. Failure to do so could result in data truncation or runtime exceptions, compromising correctness.
In addition to the parameters required by the SqlExporter, Spring mandates the presence of the following properties to correctly instantiate the DataSource: spring.datasource.url, spring.datasource.username, and spring.datasource.password. The URL must conform to the JDBC format for the selected database. The JDBC driver is typically inferred from the structure of the URL.
6.4.3. Test Setup
To evaluate the functionality of the SQL exporter, PostgreSQL (version 15) was selected as the underlying database due to its widespread adoption and open-source nature. The database was deployed using a Docker container based on the postgres:15 image. Port 5432, the default PostgreSQL port, was mapped to the host to enable communication. Container initialization was configured via environment variables to set a database name (metrics_db) and provide authentication credentials.
An SQL script (init.sql) handles schema initialization and creates the required metrics table. This script was mounted into the container at /docker-entrypoint-initdb.d/init.sql, a path recognized by the PostgreSQL image for automatic execution during startup. As a result, the database is immediately prepared to receive metrics upon application launch.
The application was configured to use the following JDBC URL: jdbc:postgresql://postgres:5432/metrics_db. This format comprises the protocol, driver, host reference, port, and database name.
To verify database insertions, there are several options: querying from a command-line interface, utilizing the database view in IntelliJ IDEA, or employing a graphical management interface such as pgAdmin. The latter was selected for its comprehensive UI and ease of use. pgAdmin was deployed in a separate Docker container using the image dpage/pgadmin4:latest, thereby avoiding local installation.
6.5. Cassandra Exporter
6.5.1. Background and Characteristics
Non-relational databases, though conceptually established for decades [
54], gained prominence under the collective term “NoSQL” in the early 21st century. This term encompasses a broad spectrum of databases that diverge from the relational model, addressing the evolving scalability and flexibility demands of modern data-intensive applications.
NoSQL databases differ significantly in design and application. MongoDB (
https://www.mongodb.com, accessed 19 October 2025) uses a document-oriented approach, storing data as JSON-like structures [
55]. Key-value stores such as Redis (
https://redis.io, accessed 19 October 2025) [
56] and Memcached (
https://memcached.org, accessed 19 October 2025) [
57] serve high-throughput caching needs. Graph databases like Neo4j (
https://neo4j.com, accessed 19 October 2025) [
58] prioritize relationships, using node-edge models well-suited to GraphQL-enabled architectures [
59]. Wide-column stores, including Apache Cassandra (
https://cassandra.apache.org/_/index.html, accessed 25 October 2025) [
60], blend row-based storage with schema flexibility, also allowing dynamic column definitions across rows.
A notable contrast between NoSQL and relational systems lies in their adherence to ACID properties. While traditional RDBMSs typically enforce transactional guarantees to maintain atomicity, consistency, isolation, and durability, NoSQL databases often relax these constraints to improve scalability. Many NoSQL systems favor eventual consistency and exhibit diverse trade-offs within the CAP theorem [
61]. According to the CAP theorem, no distributed system can simultaneously achieve consistency, availability, and partition tolerance. Relational databases often support CA configurations (consistency and availability), whereas NoSQL databases—Cassandra being a notable example—typically prioritize availability and partition tolerance (AP).
Apache Cassandra, released by Facebook in 2008, exemplifies this trade-off. It achieves high availability and fault tolerance through a masterless architecture, employing a gossip protocol to synchronize cluster state across nodes. Seed nodes play a critical role in cluster initialization and stability.
Cassandra’s data model bears surface similarity to relational databases, albeit with key distinctions. Instead of databases, Cassandra uses “keyspaces,” which contain tables with loosely defined schemas. Rows within a table are identified by a composite primary key, comprising a partition key used to distribute data across nodes. The query language of Cassandra (CQL) borrows SQL-like syntax but omits features like joins, necessitating careful denormalization during data modeling to align with its distributed nature.
6.5.2. Implementation Details
The Cassandra-based metrics exporter was implemented within a dedicated Maven artifact named
com.alexg.performance:cassandra-exporter. This module depends on the core performance instrumentation module
com.alexg.performance:performance-timer, which provides the base interface
PerformanceExporter. In addition, it includes the
com.datastax.oss:java-driver-core dependency to enable programmatic interaction with a Cassandra database (
Figure 5).
The primary class within this module, CassandraExporter, implements the PerformanceExporter interface. Its constructor requires several parameters essential for establishing connectivity and ensuring correct metric storage: a string representing the contact point address of the Cassandra node, an integer indicating the port number (typically 9042), a keyspace name, a table name for metric storage, and the logical identifier of the exporting application instance. These values are externally configurable and retrieved at runtime via Spring properties:
cassandra.contact.point (Cassandra host),
cassandra.port (port number),
cassandra.keyspace (Cassandra keyspace name),
cassandra.exporter.table (metrics table name), and
INSTANCE_NAME (application instance identifier).
Using these parameters, the exporter constructs a CqlSession object, which serves as the main communication channel with the Cassandra database for the lifetime of the exporter instance. For data insertion, the exporter utilizes a Cassandra-specific PreparedStatement class provided by the DataStax driver (com.datastax.oss.driver.api.core.cql.PreparedStatement). This statement is configured as an INSERT operation targeting the specified metrics table, with parameters corresponding to the operation name (text), execution duration (numeric), application instance (text), and the timestamp of insertion (temporal value). The name of the target table is dynamically resolved based on the value of the cassandra.exporter.table configuration property.
The exporter thereby enables seamless and consistent insertion of performance metrics into a Cassandra store using a parametrized, strongly typed, and precompiled query mechanism, which is robust under high-throughput workloads.
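A condensed sketch of this construction is shown below; the column names, the local datacenter name, and the export method signature are illustrative assumptions, while the CqlSession, the prepared INSERT statement, and the configuration parameters follow the description above.

```java
import java.net.InetSocketAddress;
import java.time.Instant;

import com.datastax.oss.driver.api.core.CqlSession;
import com.datastax.oss.driver.api.core.cql.PreparedStatement;

public class CassandraExporterSketch {

    private final CqlSession session;
    private final PreparedStatement insert;
    private final String instanceName;

    public CassandraExporterSketch(String contactPoint, int port, String keyspace,
                                   String table, String instanceName) {
        this.session = CqlSession.builder()
                .addContactPoint(new InetSocketAddress(contactPoint, port))
                .withLocalDatacenter("datacenter1")   // commonly the default for the official image
                .withKeyspace(keyspace)
                .build();
        // Precompiled, parametrized insert; the table name is resolved from configuration.
        this.insert = session.prepare(
                "INSERT INTO " + table
                        + " (operation_name, execution_ns, instance, recorded_at)"
                        + " VALUES (?, ?, ?, ?)");
        this.instanceName = instanceName;
    }

    public void export(String operationName, long elapsedNanos) {
        session.execute(insert.bind(operationName, elapsedNanos, instanceName, Instant.now()));
    }

    public void close() {
        session.close();
    }
}
```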
6.5.3. Test Setup
For the evaluation of the Cassandra exporter, the Apache Cassandra database (version 4.1) was deployed in a containerized environment using the official Docker image cassandra:4.1. The standard Cassandra communication port (9042) was exposed and mapped directly to the host system to facilitate external access.
To ensure proper initialization of the keyspace and schema prior to metric ingestion, a secondary container, named cassandra-provision, was introduced. This auxiliary container, also based on the cassandra:4.1 image, was designed exclusively for executing provisioning scripts. It was configured to run a shell script that repeatedly attempts to connect to the main Cassandra container using cqlsh, with a delay of 2 s between attempts. Once a successful connection is established, the script executes a CQL file containing the statements required to create the schema, including the keyspace metrics_keyspace and the associated metrics table. The provisioning logic is initiated through the entrypoint directive of the container in the Docker Compose configuration.
This approach eliminates the need for separate container health probes, as the provisioning logic itself implicitly handles availability checks. However, the test application responsible for exporting metrics must not start prematurely, as failure to connect to a fully initialized Cassandra instance would result in runtime errors. To address this, the container orchestration logic enforces a dependency: the exporter application containers are only launched once the provisioning container reaches a terminal state of service_completed_successfully.
Once the test environment is operational, verification of metric insertion is performed using cqlsh, the native Cassandra command-line interface.
7. Testing
7.1. Test Application
To evaluate the performance of exporters, a minimal Spring Boot application was developed [
62]. The application creates a fixed-size thread pool and continuously submits two operations labeled
opname1 and
opname2, both instrumented by the metrics collection framework. These operations simulate computational work by invoking a sleep function for a randomly determined duration. Unless otherwise specified, the sleep duration is sampled from a normal distribution with a mean of 400 ms and a standard deviation of 100 ms.
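The following sketch approximates the load-generation loop. The pool size is an assumption, and the instrumentation calls are indicated only as comments because the exact reporter API is not reproduced here.

```java
import java.util.Random;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class LoadGeneratorSketch {
    private static final Random RANDOM = new Random();

    public static void main(String[] args) {
        // Fixed-size pool; the pool size used in the actual test application is not reproduced here.
        ExecutorService pool = Executors.newFixedThreadPool(4);
        for (String operation : new String[] {"opname1", "opname2"}) {
            pool.submit(() -> {
                while (!Thread.currentThread().isInterrupted()) {
                    // reporter.startOperation(operation);  // instrumentation call (framework API)
                    simulateWork();
                    // reporter.endOperation(operation);    // instrumentation call (framework API)
                }
            });
        }
    }

    private static void simulateWork() {
        // Sleep duration sampled from N(mean = 400 ms, sd = 100 ms), clamped at zero.
        long sleepMs = Math.max(0L, Math.round(400 + 100 * RANDOM.nextGaussian()));
        try {
            Thread.sleep(sleepMs);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```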
The test environment was containerized to ensure repeatability, portability, and system isolation. Docker containers were used to simulate a distributed environment, with orchestration handled via Docker Compose [
63]. The test application container image is derived from
openjdk:17.0.1-jdk-slim. Two instances of the application are deployed to better replicate a distributed system’s behavior. Each exporter under evaluation is deployed in its own dedicated container, as described in the relevant subsections of Section 6.
7.2. Measurement Accuracy
Quantifying the accuracy of execution time measurements presented significant challenges, primarily due to the difficulty of establishing a ground truth for timekeeping. One natural candidate,
Thread.sleep(long millis), offers limited precision due to its dependence on operating system-level schedulers and timers [
64], rendering it unsuitable as an absolute reference. Similarly, comparing the custom
MetricsReporter to Java’s
System.nanoTime() is infeasible, as the framework internally relies on
System.nanoTime() for its own measurements.
Ultimately, the approach adopted in this study assumes that Thread.sleep() introduces a negligible error and attributes any measurement discrepancy entirely to the framework. To facilitate accuracy testing, a specialized exporter—StatisticsExporter—was implemented. This component records the number of invocations and computes the average duration reported by the framework over a fixed number of samples (n = 5000). Measurements were conducted for three sleep durations: 2 ms, 20 ms, and 200 ms.
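A minimal sketch of such an exporter is shown below; the method name and the per-metric export callback are assumptions about the exporter contract, not the actual interface.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.atomic.DoubleAdder;

// Accumulates framework-reported durations so that the mean over a fixed sample size
// can be compared against the known Thread.sleep() reference.
public class StatisticsExporterSketch {
    private final AtomicLong invocations = new AtomicLong();
    private final DoubleAdder totalDurationMs = new DoubleAdder();

    public void export(String operationName, double durationMs) {
        invocations.incrementAndGet();
        totalDurationMs.add(durationMs);
    }

    public long invocationCount() {
        return invocations.get();
    }

    public double averageDurationMs() {
        long n = invocations.get();
        return n == 0 ? 0.0 : totalDurationMs.sum() / n;
    }
}
```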
Measurements were performed on 10 different AWS EC2 T3 Small instances. The results are summarized in
Table 2.
The relative error decreases with increasing sleep duration: ~7% for the 2 ms reference, <1% for 20 ms, and <0.1% for 200 ms. These findings indicate that the framework’s accuracy improves substantially for operations with non-trivial execution durations. Although some observed error may stem from inaccuracies inherent in Thread.sleep(), the results are satisfactory for performance monitoring purposes, where identifying trends is more critical than achieving nanosecond-level precision.
The mean absolute error is very similar across all references, which is notable given the large differences between the reference durations. This indicates a nearly fixed error of approximately 0.13–0.14 ms. Variability is higher for small references, with the standard deviation decreasing as the reference duration increases.
7.3. Impact on Throughput
To assess the runtime overhead introduced by instrumentation, throughput was measured in three scenarios: (i) without timing instrumentation, (ii) using the legacy single-threaded
MetricsReporter, and (iii) using the new asynchronous reporter. The test involved performing
Thread.sleep(20 ms) sequentially across 5000 invocations. Measurements were performed on 10 different AWS EC2 T3 Small instances. The Kafka exporter served as the target backend in all scenarios. The asynchronous nature of the reporter means that the choice of exporter would not make a difference when measuring the impact on throughput, as the part of the code that runs on the same thread as the business logic only adds the metrics to a queue. Execution times are presented in
Table 3.
The asynchronous reporter incurs negligible overhead, with an average increase of 0.73% compared to the baseline. In contrast, the synchronous reporter introduces a measurable slowdown of 7.67% compared to the baseline and 6.88% compared to the asynchronous reporter, attributable to its blocking operations.
The asynchronous reporter produced a standard deviation an order of magnitude larger than that of the baseline, with the synchronous reporter faring even worse in this regard.
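To illustrate the hand-off that keeps the business-thread cost low, a minimal sketch of an asynchronous reporter is shown below. Class and method names are assumptions, and the framework's actual implementation may differ; the point is that the business thread performs only an enqueue, while a dedicated worker drains the queue and calls the (potentially blocking) exporter.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.function.Consumer;

public class AsyncReporterSketch {
    record Metric(String operation, long durationMs) {}

    private final BlockingQueue<Metric> queue = new LinkedBlockingQueue<>();

    public AsyncReporterSketch(Consumer<Metric> exporter) {
        // A single worker thread drains the queue; blocking exports happen here,
        // off the business thread.
        Thread worker = new Thread(() -> {
            try {
                while (true) {
                    exporter.accept(queue.take());
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "metrics-exporter");
        worker.setDaemon(true);
        worker.start();
    }

    // Called on the business thread: a cheap, non-blocking enqueue.
    public void report(String operation, long durationMs) {
        queue.offer(new Metric(operation, durationMs));
    }
}
```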
7.4. Resource Consumption
In practical software engineering, every additional framework or module is evaluated against its resource cost. This cost includes (i) integration effort, (ii) maintenance overhead, and (iii) runtime resource consumption. The first two aspects have been addressed through the design philosophy of this framework, which emphasizes simplicity and non-invasiveness. The focus here is on evaluating the third component—specifically, the additional CPU and memory overhead introduced by the instrumentation logic.
To this end, the same Spring Boot 3.2 test application was used, with no functional logic apart from metrics collection and export. This setup serves as a proxy for real-world applications in which the framework is expected to be integrated. The test application was executed inside a Docker container on 10 different AWS EC2 T3 Medium virtual machines.
We chose EC2 because it is the primary virtual machine offering of one of the largest cloud providers, AWS. T3 Medium was chosen because the “T” family is a general-purpose instance family suitable for running a typical web application, and the Medium size provides 2 vCPUs and 4 GiB of memory, which is more than sufficient for our needs. As for the operating system, we chose the Amazon Linux 2023 (kernel 6.1) Amazon Machine Image (AMI), the latest Amazon-provided AMI at the time of testing.
Developers can replicate the baseline (i.e., an idle Spring Boot application) on their own hardware to estimate the expected overhead introduced by the framework. CPU and memory statistics were collected using Docker’s container-level reporting tools. Since the test container was the sole active application, differences in resource usage across configurations are attributable to instrumentation.
To obtain the metrics, we created a script that runs the docker stats command every second and appends the result to a file. To retrieve the file from the EC2 instances, it was first uploaded to Amazon S3 and then downloaded to our local machines. The Docker image of the test application was pushed to the Amazon Elastic Container Registry (ECR) so that it would be available to the EC2 instances.
The results are presented in
Figure 6 and
Figure 7. In the interest of legibility, we chose only to depict the first two instances in
Figure 6,
Figure 7,
Figure 8,
Figure 9,
Figure 10 and
Figure 11. The data from all the runs is available in the annexes. Across all runs, we measured memory consumption (excluding initialization in the first 60 s) to have an average of 245.45 MiB, a median of 242.70 MiB, and a standard deviation of 7.55 MiB, signaling relatively constant memory consumption. Similarly, CPU consumption had an average of 3.94%, with a median of 3.98% and a standard deviation of 1.01%.
7.4.1. Worst-Case Resource Utilization
To evaluate the upper bounds of resource consumption, a stress test was conducted under a synthetic worst-case scenario in which all exporters were active simultaneously. While such a configuration is unlikely in production deployments, it provides a conservative estimate of the exporters’ combined demand on system resources.
In this scenario, metrics were generated artificially through two continuously active threads, each submitting a metric at fixed intervals. The first configuration involved one metric submission every 400 ms per thread, while maintaining the default maximum of 10 cached metrics.
A sample of the results after running the measurements on 10 machines is presented in
Figure 8 and
Figure 9. The average CPU consumption was 4.27%, with a median of 3.73% and a standard deviation of 3.28%. The increase over the baseline average is negligible (0.33 percentage points), although the higher standard deviation reflects periodic spikes, attributed to synchronized metric flush operations across all exporters. As seen in
Figure 9, memory consumption increased from the baseline to a new average of 309.22 MiB, with a median of 306.5 MiB and a standard deviation of 13.77 MiB. This outcome was expected due to the initialization of exporter-related data structures, including caches and thread pools.
To intensify the load, the metric production interval was reduced to 40 ms. Under this configuration, the average CPU usage increased to approximately 4.59%, with a median of 4.06% and a standard deviation of 3.27% (
Figure 10). This represents a difference of only 0.65 percentage points compared to the baseline. Memory usage increased slightly (
Figure 11), to a new average of 321.61 MiB, with a median of 323 MiB and a standard deviation of 9.92 MiB. This plateau in memory utilization suggests that memory consumption is dominated by static components (e.g., beans, thread pools, buffer structures), and is largely invariant to the frequency of metric creation.
7.4.2. Per-Exporter Resource Profiling
To further refine the analysis, each exporter was profiled individually on the 10 EC2 T3 Small virtual machines to assess its isolated impact on resource consumption. Results are summarized in
Table 4 and
Table 5. The tests were run for 600 s for each scenario.
Among the evaluated exporters, the Prometheus exporter and the logger exporter demonstrated the lowest resource consumption. For Prometheus, this efficiency is primarily due to its passive architecture: it exposes metrics via an HTTP endpoint, placing the responsibility of retrieval on the external Prometheus server via periodic scraping. This model offloads computational effort and is less resource-intensive than exporters that actively push metrics (e.g., Kafka, SQL, Cassandra). The logger exporter’s efficiency stems from the absence of network calls.
It is important to emphasize that the measurements in
Table 4 should be interpreted qualitatively rather than quantitatively. Actual resource usage is influenced by a variety of factors, including runtime environment, container configuration, host system architecture, and workload characteristics. Consequently, developers are advised to evaluate exporter performance in staging environments that closely mirror production deployments. This includes using realistic metric volumes and exporter configurations, reflective of expected operational conditions.
7.5. Impact of Garbage Collectors
Due to their popularity, we decided to test the potential impact of the G1 and the Z garbage collector on application throughput. We used the same methodology from
Section 7.3, with 5000 20-ms operations on 10 different T3 Small virtual machines. The results were very close: the G1 garbage collector had an average execution time of 100,779.304 ms, while the Z garbage collector had one of 100,849.783 ms. The difference is well within the margin of error and may be attributed to the fact that relatively few objects were available for collection. Every application has specific needs depending on its memory usage pattern, so it is difficult to generalize. We maintain our position that garbage collection pauses are very unlikely to distort the measurements because, if a GC pause occurs in the middle of an operation, the delay is reflected in the operation time and is therefore measured.
8. Addressing the Limitations of a Single-Threaded Exporter
The single-threaded approach to metrics reporting and exporting offers a number of significant advantages, chief among them conceptual simplicity and a reduced likelihood of implementation defects. In software engineering, minimizing complexity is a key design principle. However, under certain operational conditions—particularly high-throughput scenarios—this design may prove inadequate. This section explores the inherent limitations of the single-threaded model and examines potential strategies for enhancing performance and scalability.
In environments characterized by high-frequency metric generation, the processing capacity of the exporter may become a bottleneck. For stable system behavior, the rate of metric production must not exceed the average export rate. Caching can temporarily mask discrepancies between production and consumption rates, but persistent imbalances will eventually exhaust available memory resources.
To illustrate this point, we consider a simplified scenario in which a single producer thread generates a metric every 10 ms, and the cache holds up to 10 entries. Assuming an export time of 10 ms per metric, the exporter will operate at full capacity, expending 100 ms to process a full cache. As long as export and production rates remain balanced, the system is stable. However, when export latency exceeds the production interval, metrics will accumulate indefinitely, eventually leading to resource exhaustion.
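Stated more formally (the notation below is introduced here for illustration only and is not used elsewhere in the paper), let \(T_p\) denote the mean interval between produced metrics and \(T_e\) the mean export time per metric; the system is stable only when production does not outpace export:

\[
\lambda = \frac{1}{T_p} \;\le\; \mu = \frac{1}{T_e}, \qquad \text{backlog growth rate} \approx \lambda - \mu .
\]

In the scenario above, \(T_p = T_e = 10\) ms gives \(\lambda = \mu = 100\) metrics/s, so the exporter runs exactly at capacity; if the export time rises to, say, 12 ms, then \(\mu \approx 83\) metrics/s and the backlog grows by roughly 17 metrics per second once the 10-entry cache is full.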
Such conditions pose a critical threat to application stability. Unbounded growth in queued metrics—particularly in memory-constrained environments—will culminate in out-of-memory (OOM) errors. Crucially, this failure mode is difficult to detect preemptively, as standard logging mechanisms may not capture indicators of resource pressure. From the perspective of the operator, the only observable symptom may be a gradual increase in memory usage.
Several strategies can be employed to mitigate this issue. A naïve yet effective solution involves rejecting new metrics once the internal queue of the exporter surpasses a predefined threshold, as sketched below. While this method guarantees bounded memory usage and is trivial to implement, it suffers from a significant drawback: loss of data. Metrics rejected under such conditions are irretrievable, which compromises the completeness of the system’s observability data. Furthermore, this behavior introduces non-determinism into monitoring, as neither the presence nor the absence of a given metric can be guaranteed.
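One possible shape of this policy is shown in the following sketch; the threshold value and class names are assumptions. A bounded queue rejects new entries once full, so memory stays bounded at the cost of silently discarded metrics, which are counted here so the loss can at least be surfaced.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.atomic.AtomicLong;

public class BoundedReporterSketch {
    private static final int MAX_QUEUED_METRICS = 10_000; // assumption: a configurable threshold

    private final BlockingQueue<Object> queue = new ArrayBlockingQueue<>(MAX_QUEUED_METRICS);
    private final AtomicLong dropped = new AtomicLong();

    public void report(Object metric) {
        // offer() returns false instead of blocking when the queue is full, so memory
        // stays bounded; the rejected metric is lost, which is the drawback discussed above.
        if (!queue.offer(metric)) {
            dropped.incrementAndGet(); // count the loss so it can at least be surfaced
        }
    }

    public long droppedCount() {
        return dropped.get();
    }
}
```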
An alternative approach involves parallelizing the exporter using a multi-threaded architecture. In theory, this should improve throughput by distributing workload across multiple threads. In practice, however, such an approach introduces considerable implementation complexity and heightened susceptibility to concurrency-related defects. Moreover, the extent of performance gains is difficult to predict, as it depends on numerous factors, including JVM behavior, hardware characteristics, the number of available database connections, and I/O latency.
The challenge of determining an optimal thread count is non-trivial. A fixed-size thread pool must be appropriately dimensioned for the target environment, a task made difficult by the heterogeneity of deployment configurations. Delegating this decision to the framework’s users via configuration parameters may alleviate the problem, but it also places the burden of tuning and performance validation on the developers integrating the framework.
Dynamic thread management, such as that offered by cached thread pools, provides an appealing compromise. Cached pools dynamically adjust the number of active threads in response to load, offering increased flexibility compared to fixed pools. However, this approach introduces new risks: excessive thread creation under sustained load can lead to memory pressure and degraded performance due to increased context switching. These considerations are particularly relevant in I/O-bound workloads, where concurrency does not always translate into improved performance, especially when backend resources (e.g., database connections) are saturated.
A more modern alternative is the use of Java’s virtual threads, introduced in version 21. Virtual threads are lightweight and well-suited for I/O-intensive workloads. They can be managed effectively using Executors.newVirtualThreadPerTaskExecutor(), which runs each submitted task on its own virtual thread, scheduled onto a shared pool of platform threads [
65]. This model mitigates the cost of context switching and reduces the memory overhead associated with thread management. However, adoption of this feature is currently limited by the conservative upgrade cycles of the industry. Many production systems continue to rely on Java 11 or 17, both of which remain under long-term support. As a result, reliance on Java 21 features may restrict the applicability of the framework.
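A minimal sketch of a virtual-thread-based export path is shown below, assuming Java 21 and an export task supplied by the caller; it is not part of the framework as evaluated.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class VirtualThreadExportSketch {
    // Requires Java 21: one new virtual thread per submitted export task.
    private final ExecutorService executor = Executors.newVirtualThreadPerTaskExecutor();

    public void exportAsync(Runnable exportTask) {
        // Blocking I/O inside the task parks only the virtual thread,
        // not the underlying platform (carrier) thread.
        executor.submit(exportTask);
    }

    public void shutdown() {
        executor.close(); // waits for submitted tasks to complete
    }
}
```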
To empirically evaluate the performance implications of the previously mentioned threading models, a worst-case scenario was constructed. A thread pool comprising 50 threads was tasked with generating 100,000 metrics per thread, yielding a total of 5 million operations per test container. The SQL exporter was selected as the benchmark. Under the single-threaded model, CPU utilization remained modest, with occasional spikes likely attributable to garbage collection activity (
Figure 12). Memory usage surged from approximately 200 MB to 1.2 GB, as shown in
Figure 13, and remained elevated. Notably, garbage collection failed to reclaim memory for an extended duration. Total export duration was approximately 25 min.
Contrary to expectations, replacing the single-threaded exporter with a fixed thread pool (2 or 5 threads), a cached thread pool, or a virtual thread executor led to application crashes due to memory exhaustion. This result appears to stem from differences in queuing behavior. In the single-threaded model, lightweight tasks (e.g., list insertions) are queued within the metrics reporter. In contrast, the multi-threaded models queue heavier operations, such as those involving database connections, directly in the exporter. Consequently, memory usage scales more aggressively with load in the latter case.
An alternative mitigation strategy involves adaptive metrics merging. In overload scenarios, incoming metrics of the same type could be aggregated (e.g., by computing mean execution times) rather than enqueued individually. This would reduce queue size but at the cost of metric fidelity. Such an approach is best regarded as a fallback mechanism, suitable only when complete observability is infeasible. Moreover, defining a principled threshold for initiating this behavior remains an open challenge.
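One way such merging could look is sketched below; all names are assumptions, and only the count and a running mean are preserved per operation, which is precisely the loss of fidelity described above.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

public class MetricMergerSketch {
    // One running aggregate per operation name instead of one queue entry per measurement.
    record Aggregate(long count, double meanMs) {
        Aggregate merge(double durationMs) {
            long n = count + 1;
            return new Aggregate(n, meanMs + (durationMs - meanMs) / n); // incremental mean
        }
    }

    private final ConcurrentMap<String, Aggregate> aggregates = new ConcurrentHashMap<>();

    // Called instead of enqueueing when the exporter is overloaded.
    public void mergeUnderPressure(String operation, double durationMs) {
        aggregates.merge(operation, new Aggregate(1, durationMs),
                (current, ignored) -> current.merge(durationMs));
    }

    // Drained periodically by the exporter; metrics merged between the copy and the
    // clear() can be lost, which is acceptable for a sketch but not for production use.
    public Map<String, Aggregate> drainSnapshot() {
        Map<String, Aggregate> snapshot = new HashMap<>(aggregates);
        aggregates.clear();
        return snapshot;
    }
}
```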
Perhaps the most promising avenue for mitigating exporter overload is the introduction of adaptive sampling strategies. Unlike simple metric rejection or coarse aggregation, sampling provides a principled mechanism for reducing load while preserving statistical representativeness. Under sustained high-throughput conditions, the exporter could dynamically adjust the sampling rate based on internal pressure signals such as queue occupancy, export latency, or memory utilization. For example, probabilistic sampling (e.g., reservoir sampling or exponentially decaying sampling) would ensure that the collected subset remains unbiased, while tail-sampling techniques could prioritize high-latency or anomalous metrics that are most valuable for diagnostics. Sampling policies could also be stratified by metric type, allowing high-frequency, low-value metrics to be thinned more aggressively than rare or semantically critical ones. This approach offers a more controlled degradation mode compared to indiscriminate dropping, while avoiding the memory and queuing amplification effects observed in multi-threaded exporters. Defining optimal sampling triggers and quantifying the resulting accuracy–overhead trade-offs represent promising directions for future work and would provide operators with clearer guarantees under overload scenarios.
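As a concrete illustration, the sketch below implements a simple occupancy-driven sampling probability; the thresholds, the linear decay, and the 10% floor are arbitrary assumptions, and reservoir or tail-based sampling could be substituted for the probabilistic decision.

```java
import java.util.concurrent.ThreadLocalRandom;

public class AdaptiveSamplerSketch {
    private final int queueCapacity;

    public AdaptiveSamplerSketch(int queueCapacity) {
        this.queueCapacity = queueCapacity;
    }

    // Keep every metric below 50% queue occupancy, then thin the stream linearly
    // down to a 10% floor at full capacity. Thresholds are arbitrary assumptions.
    public boolean shouldSample(int queuedMetrics) {
        double occupancy = (double) queuedMetrics / queueCapacity;
        double probability = occupancy < 0.5
                ? 1.0
                : Math.max(0.1, 1.0 - (occupancy - 0.5) / 0.5 * 0.9);
        return ThreadLocalRandom.current().nextDouble() < probability;
    }
}
```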
Overall, the empirical evaluations demonstrate that the proposed framework meets all initially defined objectives. While certain edge cases (e.g., multi-threaded exporter scalability) revealed potential optimization areas, the core goals of interoperability, ease of integration, and efficiency have been fully realized.
9. Conclusions and Future Directions
The primary objective of this project was to develop a lightweight and non-intrusive runtime observability framework capable of providing accurate performance measurements with minimal configuration overhead and low system resource usage. This goal has been successfully achieved. The resulting framework offers a user-facing interface that is both intuitive and minimalistic, requiring very limited setup while ensuring ease of integration into existing systems. Furthermore, it maintains compatibility with a diverse range of backend systems, including various relational and non-relational databases, monitoring platforms, and metric aggregation tools. Despite its modest resource footprint, the framework consistently delivers reliable and precise metrics, fulfilling its central role in enabling effective application monitoring in production and testing environments.
A key insight gained during the development and evaluation phases of this work pertains to the importance of empirically validating theoretical assumptions—particularly those related to concurrency, memory management, and exporter performance. A particularly illustrative example emerged during the design and testing of multi-threaded exporters. While preliminary theoretical considerations suggested that multi-threading would enhance throughput and responsiveness, empirical testing revealed an unexpected consequence: the relocation of task queuing from the metric reporters to the exporters introduced a substantial memory burden, in some cases leading to out-of-memory (OOM) conditions. This finding highlights a broader principle in systems engineering: even seemingly straightforward architectural optimizations must be rigorously tested under realistic and stress-inducing conditions, as performance trade-offs are often nonlinear and context-dependent.
Revisiting the objectives defined in
Section 3, the developed framework has demonstrably achieved its intended goals. The interoperability objective was validated through multi-platform exporter implementations; ease of use was confirmed by successful integration in minimal Spring applications; and low overhead was verified through quantitative CPU and memory analysis. These results confirm that the proposed design effectively balances functionality and performance, fulfilling the research aims of this study.
While most of the identified trends originate from research on distributed tracing, several of the underlying challenges—cost, overhead, backend heterogeneity, and operational complexity—apply equally to fine-grained metrics collection. The framework proposed in this work directly addresses several of the key problems identified at the end of Section 4.1.
The results presented in
Section 7 demonstrate a strong focus on minimizing observability overhead, a key concern in both tracing and metrics systems. The framework introduces an asynchronous reporter that reduces execution-path cost to negligible levels, with measured differences falling within the noise margin of the baseline execution time (e.g., 100,625 ms vs. 101,355 ms). This directly responds to the “performance trade-offs” trend and provides a quantified alternative to the high instrumentation cost reported in prior literature.
The architecture supports multiplexed backends by design. Exporters for Kafka, SQL, Cassandra, RabbitMQ, Prometheus, and a logger backend are implemented as interchangeable modules (
Section 6). This mirrors the broader movement toward backend diversity seen in OpenTelemetry ecosystems and contributes by experimentally comparing these backends under identical workloads—an analysis largely missing from existing observability frameworks.
Cost and scalability concerns are addressed through explicit resource-profiling experiments. The paper evaluates performance on x86 cloud infrastructure (AWS EC2) under worst-case, production-like loads. Results show a bounded CPU overhead (below 2% above the baseline) and stable memory usage even under aggressive metric frequencies (40 ms intervals). This supports cost-aware observability decisions by quantifying the operational footprint of each backend.
The framework reduces the adoption gap by focusing on ease of integration. Unlike OpenTelemetry, which may require boilerplate or complex interceptors, the proposed system offers a minimal two-method API (startOperation/endOperation) with clear semantics for concurrent environments (
Section 5.4). This directly lowers the entry barrier for organizations that lack the capacity to deploy full tracing pipelines.
Looking ahead, there are several important directions for future improvement. The foremost is expanding the interoperability of the framework by increasing the number of supported exporters. Enhancing compatibility with a broader array of monitoring and storage systems will significantly improve the applicability of the framework across diverse software ecosystems and deployment scenarios. Another critical enhancement concerns the implementation of a robust backpressure mechanism to manage overload conditions in exporter components. As examined in Section 8, such a mechanism is non-trivial to design, especially in a system prioritizing low overhead and high reliability. It will require careful architectural consideration to ensure that the system remains resilient even under conditions of extreme metric generation.
A natural extension of this work involves evaluating the framework under fully distributed microservices deployments. While the present study focused on controlled, single-host containerized environments to isolate the framework’s intrinsic overhead, real-world systems introduce additional sources of variability such as inter-node network latency, remote exporter backends, and heterogeneous runtime conditions. Future experimentation across multi-node clusters—such as Kubernetes-based deployments—would provide a more comprehensive assessment of the framework’s behavior in operational microservice architectures and further substantiate its applicability in large-scale production systems.
Ultimately, the work presented in this project demonstrates that efficient and developer-friendly observability is attainable through careful design, empirical validation, and thoughtful trade-offs. The lessons learned and the architectural foundations laid herein serve as a solid basis for ongoing enhancements aimed at achieving greater scalability, extensibility, and robustness in real-world monitoring systems.