Software
  • Article
  • Open Access

13 August 2025

Enabling Progressive Server-Side Rendering for Traditional Web Template Engines with Java Virtual Threads

and
Instituto Superior de Engenharia de Lisboa (ISEL), Polytechnical Institute of Lisbon, 1959-007 Lisbon, Portugal
*
Author to whom correspondence should be addressed.
This article belongs to the Topic Software Engineering and Applications

Abstract

Modern web applications increasingly demand rendering techniques that optimize performance, responsiveness, and scalability. Progressive Server-Side Rendering (PSSR) bridges the gap between Server-Side Rendering and Client-Side Rendering by progressively streaming HTML content, improving perceived load times. Still, traditional HTML template engines often rely on blocking interfaces that hinder their use in asynchronous, non-blocking contexts required for PSSR. This paper analyzes how Java virtual threads, introduced in Java 21, enable non-blocking execution of blocking I/O operations, allowing the reuse of traditional template engines for PSSR without complex asynchronous programming models. We benchmark multiple engines across Spring WebFlux, Spring MVC, and Quarkus using reactive, suspendable, and virtual thread-based approaches. Results show that virtual threads allow blocking engines to scale comparably to those designed for non-blocking I/O, achieving high throughput and responsiveness under load. This demonstrates that virtual threads provide a compelling path to simplify the implementation of PSSR with familiar HTML templates, significantly lowering the barrier to entry while maintaining performance.

1. Introduction

Modern web applications rely on different rendering strategies to optimize performance, user experience, and scalability. The two dominant approaches are Server-Side Rendering (SSR) and Client-Side Rendering (CSR). SSR generates HTML content on the server before sending it to the client, resulting in a faster First Contentful Paint (FCP) [1] and better Search Engine Optimization (SEO). Nevertheless, SSR can increase server load and reduce throughput, measured in requests per second (RPS), since each request requires additional processing before responding. In contrast, CSR shifts the rendering workload to the browser: the server initially sends a minimal HTML document with JavaScript, which dynamically loads the page content. While CSR reduces the server’s burden, it can lead to a slower FCP, as users must wait for JavaScript execution before meaningful content appears.
Progressive Server-Side Rendering (PSSR) combines benefits from both SSR and CSR by streaming HTML content progressively. This technique enhances user-perceived performance by allowing progressive rendering as data becomes available, significantly reducing the time-to-first-byte (TTFB) and improving perceived load times compared to traditional SSR approaches [2]. In this respect, PSSR is similar to CSR in that the server initially sends a minimal HTML document to the client and subsequently streams additional HTML fragments. Even so, unlike CSR, PSSR retains all rendering responsibilities on the server side, thereby reducing the load on the client. Consequently, the client does not need to execute JavaScript or make additional requests to retrieve page content. The streaming nature of PSSR allows users to see content progressively as it becomes available, rather than waiting for the complete page to be rendered server-side, thus providing a more responsive user experience with measurably lower TTFB values [2].
Low-thread servers, also known as event-driven [3], have gained prominence in contemporary web applications due to their ability to efficiently manage a large number of concurrent I/O operations with minimal resources, thus promoting better scalability. By leveraging asynchronous I/O operations, such as database queries and API calls, servers can avoid blocking threads while waiting for data, thereby maximizing throughput and responsiveness. To support this non-blocking architecture, PSSR implementations require template engines that are compatible with asynchronous data models [4]. Some modern template engines, such as HtmlFlow [5] and Thymeleaf [6], have been designed with these capabilities in mind. However, many legacy template engines—particularly those using external domain-specific languages (DSLs) [7]—still depend on blocking interfaces like Iterable for data processing. This blocking behavior forces server threads to remain idle until the entire HTML output is ready, undermining the performance benefits of non-blocking I/O and limiting scalability in high-concurrency environments.
With the introduction of virtual threads (https://openjdk.org/jeps/444, accessed on 29 May 2025) in Java 21, it is now possible to execute blocking I/O operations in a scalable, lightweight manner. This capability allows legacy template engines, which often rely on blocking interfaces, to operate efficiently in high-concurrency, non-blocking environments without requiring complex asynchronous programming models. As a result, PSSR can now be implemented using familiar HTML templates, simplifying development and improving maintainability.
We investigate the current landscape of non-blocking PSSR, focusing on two primary paradigms: reactive programming and coroutines, both of which have been used to achieve asynchronous I/O in the Java ecosystem. As an alternative, we investigate whether Java’s virtual threads can offer comparable performance while preserving the simplicity of synchronous code. Section 2 reviews the state-of-the-art in PSSR and template engine design. Section 3 outlines the limitations of conventional engines in asynchronous settings and presents our proposed approach. Section 4 details the benchmark methodology, followed by the results and analysis in Section 5. Section 6 compares our results with those from other studies. The conclusions of this work are presented in Section 7.

3. Problem Statement

In this section, we examine the challenges of implementing Progressive Server-Side Rendering (PSSR) in modern web applications, with a focus on the limitations of current template engine designs. Our goal is to broaden the range of options available for PSSR, particularly within JVM-based frameworks. In PSSR, the server does not wait for the entire data model to be ready before beginning to render HTML. Instead, it processes and streams each piece of data to the client as soon as it arrives.
Reactive types like Observable<T> of ReactiveX [25] or Kotlin Flow<T> [29] facilitate this by representing data as a sequence of asynchronous events. For example, a reactive stream might emit a sequence of Presentation objects, each representing a talk in a conference schedule. As each Presentation is emitted by the Observable or Flow, the internal DSL-based engine—such as HtmlFlow—can render the corresponding HTML fragment and immediately flush it to the client. This approach is demonstrated in Listing 4, where each presentation is rendered asynchronously as it is emitted by an Observable. Note that the await builder receives an additional parameter, the onCompletion callback, which is used to signal HtmlFlow that it can proceed to render the next HTML element in the web template [4]. HtmlFlow pauses the rendering process until onCompletion is called, similar to how the resume function works in continuations and coroutines [41]. In Listing 5, we show an equivalent suspend-based implementation using Kotlin’s Flow [2]. Both examples highlight how internal DSLs can natively integrate with reactive types to enable non-blocking, progressive rendering on the server side.
Listing 4. HtmlFlow reactive presentation template in Kotlin with an Observable model.
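await { div, model, onCompletion ->
    model
        // Render each presentation as it is emitted and append the fragment to the div
        .doOnNext { presentation ->
            presentationFragmentAsync
                .renderAsync(presentation)
                .thenApply { frag -> div.raw(frag) }
        }
        // Signal HtmlFlow that it can proceed to the next HTML element
        .doOnComplete { onCompletion.finish() }
        .subscribe()
}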
Listing 5. HtmlFlow suspend presentation template in Kotlin with a Flow model.
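suspending { model ->
    model
        // Adapt the reactive model to a Kotlin Flow
        .toFlowable()
        .asFlow()
        // Render each presentation as it is collected from the Flow
        .collect { presentation ->
            presentationFragmentAsync
                .renderAsync(presentation)
                .thenApply { frag -> raw(frag) }
        }
}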
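The completion-callback mechanism used in Listing 4 can be illustrated without HtmlFlow. The following plain-Java sketch is a hypothetical stand-in, not HtmlFlow's API: the names AwaitSketch, emitAsync, and render are invented for illustration. It appends a fragment for every item that arrives asynchronously and writes the closing markup only after the completion callback fires, which mirrors the role the text attributes to onCompletion.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.function.Consumer;

/** Hypothetical illustration of await/onCompletion semantics; not HtmlFlow's API. */
final class AwaitSketch {

    /** Simulates an asynchronous source: emits items 1 ms apart, then signals completion. */
    static void emitAsync(List<String> items, Consumer<String> onNext, Runnable onComplete) {
        CompletableFuture.runAsync(() -> {
            for (String item : items) {
                try {
                    Thread.sleep(1);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    break;
                }
                onNext.accept(item);
            }
            onComplete.run();
        });
    }

    /** Renders fragments as items arrive; the closing tag is written only on completion. */
    static CompletableFuture<String> render(List<String> titles) {
        StringBuffer html = new StringBuffer("<main>");
        CompletableFuture<String> rendered = new CompletableFuture<>();
        emitAsync(
            titles,
            title -> html.append("<div>").append(title).append("</div>"),
            () -> rendered.complete(html.append("</main>").toString()));
        return rendered; // returned immediately; the calling thread is not blocked
    }

    public static void main(String[] args) {
        System.out.println(render(List.of("Loom", "Flow")).join());
    }
}
```

The caller receives a future right away, while the template only advances past the asynchronous section once the source reports completion.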
By contrast, template engines that use external DSLs (such as JStachio, Thymeleaf, or Handlebars) typically define templates within static HTML documents using custom markers, and rely on blocking interfaces like java.util.Iterable or java.util.stream.Stream. These interfaces require the entire data model to be materialized in memory before rendering can begin, which blocks server threads during template expansion and significantly limits scalability under high concurrency. Some reactive libraries, such as RxJava, provide bridging mechanisms like Observable.blockingIterable(), which allows asynchronous data sources to be exposed as Iterable by blocking the calling thread until each next item is available. While useful for compatibility with traditional APIs, this approach reintroduces blocking behavior and undermines the benefits of non-blocking I/O, especially under high concurrency. Listing 6 illustrates this model using a JStachio template, where the engine performs a blocking loop over presentationItems.
Listing 6. Presentation HTML template using JStachio.
{{#presentationItems}}
<div class="card mb-3 shadow-sm rounded">
        <div class="card-header">
                <h5 class="card-title">
                        {{title}} - {{speakerName}}
                </h5>
        </div>
        <div class="card-body">
                {{summary}}
        </div>
</div>
{{/presentationItems}}
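The blocking bridge discussed before Listing 6 can be approximated with JDK classes only. The sketch below is an assumption-laden stand-in for Observable.blockingIterable(), not RxJava code: BlockingBridgeSketch and its methods are hypothetical names, and a background thread with a 1 ms pause plays the part of the asynchronous source. The point it illustrates is that a plain for-each loop, like the one an external-DSL engine runs over presentationItems, parks the rendering thread on every pending item.

```java
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

/** Hypothetical JDK-only analogue of a blocking Iterable over an asynchronous source. */
final class BlockingBridgeSketch {
    private static final Object END = new Object(); // end-of-stream marker

    /** Returns a single-pass Iterable whose iterator blocks until the next item arrives. */
    static Iterable<String> blockingIterable(List<String> source) {
        BlockingQueue<Object> queue = new LinkedBlockingQueue<>();
        Thread producer = new Thread(() -> {
            try {
                for (String item : source) {
                    Thread.sleep(1); // simulated I/O latency between items
                    queue.put(item);
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } finally {
                queue.offer(END); // always terminate the stream
            }
        });
        producer.setDaemon(true);
        producer.start();
        return () -> new Iterator<String>() {
            private Object pending;

            @Override
            public boolean hasNext() {
                if (pending == null) {
                    try {
                        pending = queue.take(); // the calling thread blocks here
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                        pending = END;
                    }
                }
                return pending != END;
            }

            @Override
            public String next() {
                if (!hasNext()) throw new NoSuchElementException();
                Object item = pending;
                pending = null;
                return (String) item;
            }
        };
    }

    /** Mimics a template section loop: one fragment per item, in arrival order. */
    static String renderCards(Iterable<String> titles) {
        StringBuilder html = new StringBuilder();
        for (String title : titles) {
            html.append("<div class=\"card\">").append(title).append("</div>");
        }
        return html.toString();
    }

    public static void main(String[] args) {
        System.out.println(renderCards(blockingIterable(List.of("Loom", "Flow"))));
    }
}
```

On a platform thread, every call to take() holds an OS thread idle; the same loop on a virtual thread only parks the lightweight thread.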
Despite these performance limitations, external DSLs remain popular due to several advantages:
1.
Separation of Concerns: HTML templates are decoupled from application logic, enabling front-end developers to contribute without modifying back-end code.
2.
Cross-Language Compatibility: External DSLs are portable across languages and frameworks, easing integration in multi-language environments.
3.
Familiarity: Many developers are comfortable with HTML syntax, lowering the barrier to entry and improving maintainability.
These strengths make external DSLs appealing—even when they come at the cost of blocking synchronous rendering. However, this trade-off becomes critical under high concurrency, where blocking threads severely degrades throughput [2]. Emerging features in the Java ecosystem, particularly virtual threads introduced in Java 21 as part of Project Loom [42], offer a promising solution to this challenge. Virtual threads drastically reduce the overhead of blocking operations by decoupling thread execution from OS-level threads. For this reason, engines that rely on blocking interfaces—like those used in external DSLs—can potentially achieve scalability levels that approach those of non-blocking, asynchronous engines.
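To make the decoupling from OS-level threads concrete, the following minimal sketch (assuming Java 21 or later) submits many blocking tasks to a virtual-thread-per-task executor. The class and method names are hypothetical, and Thread.sleep merely stands in for blocking I/O; it is not taken from the benchmark code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical sketch: many blocking tasks, each on its own virtual thread (Java 21+). */
final class VirtualThreadSketch {

    /** Runs the given number of blocking tasks and returns how many ran on a virtual thread. */
    static int runBlockingTasks(int tasks, long blockMillis) {
        List<Future<Boolean>> results = new ArrayList<>();
        try (ExecutorService executor = Executors.newVirtualThreadPerTaskExecutor()) {
            for (int i = 0; i < tasks; i++) {
                results.add(executor.submit(() -> {
                    Thread.sleep(blockMillis); // blocking call parks only the virtual thread
                    return Thread.currentThread().isVirtual();
                }));
            }
        } // close() waits for all submitted tasks to finish
        int onVirtual = 0;
        for (Future<Boolean> result : results) {
            if (result.resultNow()) onVirtual++;
        }
        return onVirtual;
    }

    public static void main(String[] args) {
        System.out.println(runBlockingTasks(1000, 10) + " tasks ran on virtual threads");
    }
}
```

A thousand concurrently sleeping tasks would need a thousand platform threads under a thread-per-request model, whereas here they share a small pool of carrier threads.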

4. Benchmark Implementation

This benchmark (https://github.com/xmlet/comparing-non-blocking-progressive-ssr, accessed on 29 May 2025) is designed with a modular architecture, separating the view and model layers from the controller layer [43], which allows for easy extension and integration of new template engines and frameworks. It also includes a set of tests to ensure the correctness of implementations and to validate the HTML output. It uses two data models, Stock and Presentation, shown in Listings 7 and 8, respectively, following the proposal of the benchmarks [44,45]. The Presentation class represents a presentation with an identifier, title, speaker name, and summary, while the Stock class represents a stock with two name fields, a URL, symbol, price, change, and ratio.
Listing 7. Stock class.
data class Stock(
        val name: String,
        val name2: String,
        val url: String,
        val symbol: String,
        val price: Double,
        val change: Double,
        val ratio: Double
)
Listing 8. Presentation class.
data class Presentation(
        val id: Long,
        val title: String,
        val speakerName: String,
        val summary: String
)
The application’s repository contains a list of 10 instances of the Presentation class and 20 instances of the Stock class. Each list is used to generate a respective HTML view. Although the instances are kept in memory, the repository uses the Observable class from the RxJava library to interleave list items with a delay of 1 millisecond. This delay promotes context switching and frees up the calling thread to handle other requests in non-blocking scenarios, mimicking actual I/O operations.
By using the blockingIterable method of the Observable class, we provide a blocking interface for template engines that do not support asynchronous data models, while still simulating the asynchronous nature of the data source to enable PSSR. Template engines that do not support non-blocking I/O for PSSR include KotlinX, Rocker, JStachio, Pebble, Freemarker, Trimou, and Velocity. HtmlFlow supports non-blocking I/O through suspendable templates and asynchronous rendering, while Thymeleaf enables it using the ReactiveDataDriverContextVariable in conjunction with a non-blocking Spring ViewResolver.
The aforementioned blocking template engines are used in the context of virtual threads or alternative coroutine dispatchers, allowing the handler thread to be released and reused for other requests.
The Spring WebFlux core implementation uses Project Reactor to support a reactive programming model: each method returns a Flux<String> as the response body, which acts as a publisher that progressively streams the HTML content to the client. The implementation includes four main approaches to PSSR:
  • Reactive: The template engine is used in a reactive context, where the HTML content is rendered using the reactive programming model. An example of this approach is the Thymeleaf template engine when using the ReactiveDataDriverContextVariable in conjunction with a non-blocking Spring ViewResolver.
  • Suspendable: The template engine is used in a suspendable context, where the HTML content is rendered within the context of a suspending function. An example of this approach is the HtmlFlow template engine, which supports suspendable templates with the use of the Flow class from the Kotlin standard library.
  • Virtual: The template engine is used in a non-blocking context, where the HTML content is rendered within the context of virtual threads. This method is used for template engines that do not traditionally support non-blocking I/O, either because they use external DSLs and consequently only support the blocking Iterable interface, or because they do not support asynchronous rendering of HTML content.
  • Blocking: The template engine is used in a blocking context, where the HTML content is rendered using the blocking interface of the Observable class, within the context of an OS-thread.
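The Virtual approach in the list above can be sketched as a hand-off: the request handler starts a virtual thread that runs the blocking render loop and pushes each fragment to a sink, then returns immediately. This is an illustrative sketch assuming Java 21 or later; VirtualHandlerSketch, handle, and the Consumer-based sink are hypothetical and stand in for the Flux<String> publisher used by the WebFlux implementation.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.function.Consumer;

/** Hypothetical sketch of handing a blocking render loop to a virtual thread (Java 21+). */
final class VirtualHandlerSketch {

    /** Starts rendering on a virtual thread; fragments reach the sink as they are produced. */
    static Thread handle(Iterable<String> model, Consumer<String> sink) {
        return Thread.ofVirtual().name("pssr-render").start(() -> {
            sink.accept("<ul>");
            for (String item : model) { // may block per item; only the virtual thread parks
                sink.accept("<li>" + item + "</li>");
            }
            sink.accept("</ul>");
        });
    }

    /** Convenience for demonstration: waits for the renderer and returns the emitted fragments. */
    static List<String> handleAndCollect(Iterable<String> model) {
        List<String> fragments = Collections.synchronizedList(new ArrayList<>());
        Thread renderer = handle(model, fragments::add);
        try {
            renderer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return fragments;
    }

    public static void main(String[] args) {
        System.out.println(handleAndCollect(List.of("HtmlFlow", "JStachio")));
    }
}
```

Because handle returns as soon as the virtual thread is started, the thread that accepted the request is free to serve other clients while fragments continue to flow to the sink.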
The Spring MVC implementation uses handlers based solely on the blocking interface of the Observable class. To enable PSSR in this context, we utilize the StreamingResponseBody interface, which allows the application to write directly to the response OutputStream without blocking the servlet container thread. According to the Spring documentation (https://docs.spring.io/spring-framework/docs/current/javadoc-api/org/springframework/web/servlet/mvc/method/annotation/StreamingResponseBody.html, accessed on 29 May 2025), this class is a controller method return value type for asynchronous request processing where the application can write directly to the response OutputStream without holding up the Servlet container thread.
In Spring MVC, StreamingResponseBody enables asynchronous writing relative to the request-handling thread, but the underlying I/O remains blocking—specifically the writes to the OutputStream. When using virtual threads, the I/O operations are more efficient when compared to platform threads, as they are executed in the context of a lightweight thread. Most of the computation is performed in a separate thread from the one that receives each request; we use a thread pool TaskExecutor to process requests, allowing the application to scale and handle multiple clients more efficiently as opposed to the default TaskExecutor implementation, which tries to create a thread for each request.
However, the Spring MVC implementation does not effectively support PSSR for these templates, as HTML content is not streamed progressively to the client. This limitation occurs because the response is only flushed to the client once the content written to the OutputStream exceeds the configured output buffer size, as explained in Section 2.2. Consequently, the client receives the complete response only after the entire HTML content has been rendered and buffered, which negates the primary benefits of PSSR—namely, reduced time-to-first-byte and progressive content delivery. Furthermore, Spring MVC does not provide configuration options for the response buffer size, preventing developers from reducing it to smaller values that would enable more frequent flushing and achieve true progressive streaming of HTML content. This architectural constraint makes Spring MVC unsuitable for implementing effective PSSR compared to reactive frameworks like Spring WebFlux.
This implementation includes two main approaches to PSSR:
  • Blocking: The template engine is used in a blocking context, where the HTML content is rendered using the blocking interface of the Observable class.
  • Virtual: The template engine is used in a non-blocking context, where the HTML content is rendered within the context of virtual threads.
The Quarkus implementation also uses handlers based on the blocking interface of the Observable class. It implements the StreamingOutput interface from the JAX-RS specification to enable PSSR, allowing HTML content to be streamed to the client. While StreamingOutput also uses blocking I/O, it operates on Vert.x worker threads, which prevents blocking of the event loop. When virtual threads are used, the I/O operations are handled efficiently, as they are executed in lightweight threads.
The Quarkus implementation supports PSSR for these templates by configuring the response buffer size in the application.properties file. The default buffer size is 8 KB, but we reduced it to 512 bytes, which allows the response to be sent to the client progressively as the HTML content is rendered.
This implementation includes three main approaches to PSSR:
  • Blocking: The template engine is used in a blocking context, where the HTML content is rendered using the blocking interface of the Observable class.
  • Virtual: The template engine is used in a non-blocking context, where the HTML content is rendered within the context of virtual threads.
  • Reactive: The template engine is used in a reactive context, where the HTML content is rendered using the reactive programming model. In this case, we use the HtmlFlow template engine, which supports asynchronous rendering through the writeAsync method.
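The effect of the output buffer size on progressive delivery, described above for Spring MVC and Quarkus, can be reproduced in isolation. The sketch below is only an analogy: it uses java.io.BufferedOutputStream rather than either framework's response buffer, and the 200-byte fragment size and the class names are assumptions chosen for illustration. It counts how many times data reaches the underlying stream before the response is completed.

```java
import java.io.BufferedOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

/** Hypothetical analogy for response buffering; not framework code. */
final class BufferFlushSketch {

    /** Stands in for the client connection and counts deliveries of data. */
    static final class CountingStream extends OutputStream {
        int deliveries;

        @Override
        public void write(int b) {
            deliveries++;
        }

        @Override
        public void write(byte[] b, int off, int len) {
            deliveries++;
        }
    }

    /** Writes 200-byte fragments through a buffer and reports deliveries before any final flush. */
    static int deliveriesBeforeCompletion(int bufferSize, int fragments) {
        CountingStream client = new CountingStream();
        BufferedOutputStream response = new BufferedOutputStream(client, bufferSize);
        byte[] fragment = "x".repeat(200).getBytes(StandardCharsets.US_ASCII);
        try {
            for (int i = 0; i < fragments; i++) {
                response.write(fragment); // forwarded only when the buffer cannot hold it
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return client.deliveries;
    }

    public static void main(String[] args) {
        System.out.println("8 KB buffer: " + deliveriesBeforeCompletion(8192, 10)); // prints 0
        System.out.println("512 B buffer: " + deliveriesBeforeCompletion(512, 10)); // prints 4
    }
}
```

With an 8 KB buffer, ten fragments totalling 2000 bytes never leave the buffer before completion, so nothing is streamed progressively; with a 512-byte buffer, data is pushed out after every second fragment.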

5. Results

This section presents the evaluation results organized into four main parts: Section 5.1 details the testing environment and hardware specifications; Section 5.2 presents scalability results for the Presentations data model; Section 5.3 analyzes performance with the more complex Stocks data model; and Section 5.4 provides detailed memory consumption and resource utilization analysis.

5.1. Environment Specifications

All benchmarks were conducted on a GitHub-hosted virtual machine under GitHub Actions (GitHub, Inc., San Francisco, CA, USA), running Ubuntu 24.04.2 LTS with an AMD EPYC 7763 64-Core Processor (Advanced Micro Devices, Inc., Santa Clara, CA, USA) operating at 3.24 GHz, configured with two CPU cores and two threads per core, and 7.8 GB of available RAM. The system utilizes a 75 GB SSD with ext4 file system, achieving 1.5 GB/s write throughput in basic I/O tests. Network connectivity is provided through a 1500 MTU Ethernet interface. All tests were conducted on OpenJDK 21 (Corretto build) with the G1 garbage collector enabled by default. The JVM arguments used were: -Xms512m -Xmx16g. Each test was executed five times per configuration to ensure result reliability.
All template engines were configured to use UTF-8 encoding and disable automatic HTML escaping to ensure fair comparison and consistent output across all template engines. Caching behavior was left as the default for each engine, allowing them to optimize template loading and rendering as per their design. These configurations ensure that all engines operate under equivalent conditions, with template loading, encoding, and escaping behavior normalized across implementations.
For both the Apache Bench and JMeter tests, we simulate a 1000-request warm-up period for each route with a concurrent user load of 32 users. The warm-up period is followed by the actual test period, during which we simulate 256 requests per user, scaling in increments up to 128 concurrent users.
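As a worked example of the load plan, the sketch below computes the number of requests issued at each concurrency level. It assumes the user steps 1, 2, 4, 8, 16, 32, 64, and 128 listed in the figure captions and that every user issues exactly 256 requests per step; the class and method names are hypothetical.

```java
/** Hypothetical arithmetic sketch of the load plan; step values assumed from the figure captions. */
final class LoadPlanSketch {
    static final int REQUESTS_PER_USER = 256;
    static final int[] USER_STEPS = {1, 2, 4, 8, 16, 32, 64, 128};

    /** Requests issued in one step with the given number of concurrent users. */
    static int totalRequests(int concurrentUsers) {
        return REQUESTS_PER_USER * concurrentUsers;
    }

    /** Requests issued across all steps for a single route, excluding warm-up. */
    static long totalAcrossAllSteps() {
        long total = 0;
        for (int users : USER_STEPS) {
            total += totalRequests(users);
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(totalRequests(128));    // prints 32768
        System.out.println(totalAcrossAllSteps()); // prints 65280
    }
}
```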
The results are presented in the form of throughput (number of requests per second) for each template engine, with the x-axis representing the number of concurrent users and the y-axis representing the throughput in requests per second.
Both Quarkus and Spring MVC implementations were configured with an 8 KB output buffer size to ensure consistency, despite Quarkus not enabling PSSR at this size and Spring MVC not supporting PSSR for the tested templates. Testing with a reduced 512 B buffer size showed only a 0.046% performance difference with Rocker, indicating negligible impact. Both frameworks use unlimited platform thread pools, enabling on-demand thread creation up to system limits for maximum throughput and scalability observation under high concurrent load.
Since the results obtained with JMeter and Apache Bench show no significant differences, only the JMeter results are presented. Statistical analysis of the test configurations revealed an average absolute percentage difference of 2.83% between the two load testing tools. While the differences for individual approaches ranged from −16.53% to +14.66% at the extremes, the consistent directional bias and small magnitude of differences indicate that both tools provide comparable performance measurements, so the JMeter results are representative of the performance characteristics across all tested configurations.
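The text does not spell out how the percentage difference between the two tools is computed. One plausible definition, given here purely as an assumption, takes the signed difference of the Apache Bench throughput relative to the JMeter throughput and averages the absolute values over all configurations. The sketch below uses hypothetical input values, not measurements from the benchmark.

```java
/** Hypothetical sketch of a mean absolute percentage difference; the definition is an assumption. */
final class ToolComparisonSketch {

    /** Signed percentage difference of the Apache Bench value relative to the JMeter value. */
    static double percentDiff(double jmeter, double apacheBench) {
        return (apacheBench - jmeter) / jmeter * 100.0;
    }

    /** Average of the absolute percentage differences over paired configurations. */
    static double meanAbsolutePercentDiff(double[] jmeter, double[] apacheBench) {
        if (jmeter.length != apacheBench.length || jmeter.length == 0) {
            throw new IllegalArgumentException("arrays must be non-empty and of equal length");
        }
        double sum = 0.0;
        for (int i = 0; i < jmeter.length; i++) {
            sum += Math.abs(percentDiff(jmeter[i], apacheBench[i]));
        }
        return sum / jmeter.length;
    }

    public static void main(String[] args) {
        double[] jmeter = {1000.0, 2000.0};      // illustrative values only
        double[] apacheBench = {1100.0, 1900.0}; // illustrative values only
        System.out.println(meanAbsolutePercentDiff(jmeter, apacheBench));
    }
}
```

Under this definition, individual configurations can differ by large amounts in either direction while the average absolute difference stays small, which is consistent with the ranges reported above.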

5.2. Scalability Results for the Presentations Class

The results in Figure 2 depict the throughput (number of requests per second) for each template engine, with concurrent users ranging from 1 to 128, from left to right. The benchmarks include HtmlFlow using suspendable web templates (HtmlFlow-Susp, equivalent to the approach shown in Listing 5), JStachio using virtual threads with the Iterable interface (JStachio-Virtual), and Thymeleaf using the reactive View Resolver driver (Thymeleaf-Rx). Blocking and Virtual represent the average throughput of the blocking approaches (i.e., KotlinX, Rocker, JStachio, Pebble, Freemarker, Trimou, HtmlFlow, and Thymeleaf) when run in the context of a separate coroutine dispatcher or virtual threads, respectively.
Figure 2. Throughput in requests per second (req/sec) for Spring WebFlux using the Presentation class, showing scalability results as the number of concurrent users increases (1, 2, 4, 8, 16, 32, 64, and 128) across different rendering approaches.
We show the HtmlFlow-Susp, JStachio-Virtual, and Thymeleaf-Rx engines separately to observe the performance of the non-blocking engines when using the Suspending, Virtual Thread, and Reactive approaches. The Blocking and Virtual results are aggregated because the different engines perform similarly under those approaches.
The results in Figure 2 show that when using blocking template engines with a separate coroutine dispatcher, the engines are unable to scale effectively beyond four concurrent users. In contrast, HtmlFlow-Susp scales up to 128 concurrent users, achieving 4487 requests per second. When blocking approaches are executed in the context of virtual threads—thus enabling non-blocking I/O—the engines scale up to 64 concurrent users, with JStachio using virtual threads reaching 3514 requests per second. The Thymeleaf implementation using the reactive View Resolver driver scales up to 32 concurrent users, achieving a lower maximum throughput of 2559 requests per second.
It is important to note that the differences in scalability and throughput between the HtmlFlow-Susp, JStachio-Virtual, and Thymeleaf-Rx approaches may be influenced by the specific template engines used, rather than the approach itself. When comparing the Reactive, Suspendable, and Virtual approaches specifically with HtmlFlow, we found that all three achieve similar performance: HtmlFlow using a blocking approach with virtual threads reaches 4691 requests per second, while HtmlFlow using a reactive approach achieves 4792 requests per second.
The results for the Spring MVC implementation, shown in Figure 3, compare two synchronous approaches: Blocking, which uses platform threads with StreamingResponseBody, and Virtual, which uses virtual threads. Since Spring MVC follows a thread-per-request architecture, the asynchronous approaches—Reactive and Suspendable—described in Section 4 are not applicable. Both the Blocking and Virtual strategies scale effectively up to 32 concurrent users, with the virtual threads approach achieving a slightly higher maximum throughput of 2797 requests per second, compared to 2498 requests per second for the blocking approach. These results indicate that while Spring MVC can handle a moderate level of concurrency, it does not reach the scalability of the reactive or suspendable approaches available in Spring WebFlux. Furthermore, Spring MVC does not enable Progressive Server-Side Rendering (PSSR) for the tested templates, as previously discussed in Section 4.
Figure 3. Throughput in requests per second (req/sec) for Spring MVC using the Presentation class, showing scalability results as the number of concurrent users increases (1, 2, 4, 8, 16, 32, 64, and 128) across different rendering approaches.
The results for the Quarkus implementation, shown in Figure 4, indicate that Quarkus handles synchronous approaches more efficiently than Spring WebFlux. The blocking engines scale up to 64 concurrent users, achieving up to 3744 requests per second. When using virtual threads, the throughput increases even further, reaching 4856 requests per second, allowing scalability up to 128 users. This demonstrates that Quarkus’s implementation of virtual threads is effective for enabling PSSR, and comparable to the Suspendable and Reactive approaches used in Spring WebFlux in terms of scalability and throughput.
Figure 4. Throughput in requests per second (req/sec) for Quarkus using the Presentation class, showing scalability results as the number of concurrent users increases (1, 2, 4, 8, 16, 32, 64, and 128) across different rendering approaches.
Additionally, HtmlFlow-Rx, a reactive implementation of the HtmlFlow template engine (equivalent to the approach shown in Listing 4) that utilizes Quarkus’s reactive programming model, achieved a lower throughput than the Blocking and Virtual approaches: 3088 requests per second. This demonstrates that Quarkus’s reactive programming model is effective for enabling PSSR, although it does not achieve the same level of performance or scalability as the same approach in Spring WebFlux, stagnating after 32 concurrent users.

5.3. Scalability Results for the Stocks Class

The results in Figure 5 use the same template engines and approaches as the previous benchmark, but with a more complex data model: the Stock class, which includes 20 instances and approximately two times as many data bindings. With this data model, the scalability of the engines remains largely unchanged; however, throughput is reduced across all engines. Compared to the Presentation benchmark, JStachio using virtual threads experienced a more pronounced decrease in performance relative to the Reactive and Suspending approaches, with JStachio using virtual threads now achieving 1509 requests per second, compared to 1750 requests per second achieved by the Thymeleaf implementation using the reactive View Resolver driver.
Figure 5. Throughput in requests per second (req/sec) for Spring WebFlux using the Stock class, showing scalability results as the number of concurrent users increases (1, 2, 4, 8, 16, 32, 64, and 128) across different rendering approaches.
It is again important to note that the differences in scalability and throughput between the HtmlFlow-Susp, JStachio-Virtual, and Thymeleaf-Rx approaches may be influenced by the specific template engines used, rather than the approach itself. When comparing the Reactive, Suspendable, and Virtual approaches specifically with HtmlFlow, we found that all three achieve similar performance: HtmlFlow using a blocking approach with virtual threads reaches 3090 requests per second, while HtmlFlow using a reactive approach achieves 3026 requests per second. This indicates that the more pronounced decrease in performance for JStachio using virtual threads is likely due to the specific implementation of the template engine, rather than the use of virtual threads itself.
The overall throughput reduction across all engines is expected, as the Stock class contains more data properties than the Presentation class, adding overhead related to the data binding process of each template engine.
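To put the "similar performance" observation in numbers, the sketch below computes the relative gap between the HtmlFlow throughput values reported in Sections 5.2 and 5.3 for the virtual-thread and reactive approaches. The helper name and the choice to normalize by the larger value are assumptions made for illustration.

```java
/** Hypothetical arithmetic sketch over throughput values quoted in the text. */
final class ThroughputGapSketch {

    /** Absolute gap between two throughput values as a percentage of the larger one. */
    static double relativeGapPercent(double a, double b) {
        return Math.abs(a - b) / Math.max(a, b) * 100.0;
    }

    public static void main(String[] args) {
        // Presentation model: 4691 req/s (virtual threads) vs. 4792 req/s (reactive)
        System.out.printf("Presentation gap: %.2f%%%n", relativeGapPercent(4691, 4792));
        // Stock model: 3090 req/s (virtual threads) vs. 3026 req/s (reactive)
        System.out.printf("Stock gap: %.2f%%%n", relativeGapPercent(3090, 3026));
    }
}
```

Both gaps come out at roughly 2%, and the sign flips between the two data models, which supports reading the difference between the approaches as noise rather than a systematic advantage.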
The results shown in Figure 6 indicate that the Spring MVC implementation using the blocking approach with StreamingResponseBody achieves a throughput of up to 1916 requests per second, with no significant improvement observed when using virtual threads. Both approaches scale effectively up to 64 concurrent users. Although these approaches achieve higher throughput in Spring MVC than in Spring WebFlux, their overall performance remains lower than that of the reactive and suspendable approaches.
Figure 6. Throughput in requests per second (req/sec) for Spring MVC using the Stock class, showing scalability results as the number of concurrent users increases (1, 2, 4, 8, 16, 32, 64, and 128) across different rendering approaches.
The results depicted in Figure 7 show that the Quarkus synchronous approaches scale effectively up to 128 concurrent users, achieving performance comparable to the Spring WebFlux implementation. The blocking approach reaches a throughput of 3019 requests per second, while the virtual threads approach achieves a throughput of 3357 requests per second. In addition to the synchronous engines, the HtmlFlow-Rx approach achieves a throughput of 1760 requests per second, indicating lower performance in Quarkus than in Spring WebFlux, where it reached 3026 requests per second, as previously mentioned.
Figure 7. Throughput in requests per second (req/sec) for Quarkus using the Stock class, showing scalability results as the number of concurrent users increases (1, 2, 4, 8, 16, 32, 64, and 128) across different rendering approaches.
The benchmark results show that non-blocking engines, whether based on reactive programming, Kotlin coroutines, or Java virtual threads, scale effectively, supporting between 32 and 128 concurrent users depending on the approach and framework. Of the tested frameworks, Spring WebFlux proved the most effective at enabling PSSR, mainly due to its native support for publish–subscribe interfaces, which allows content to be streamed as data becomes available rather than only when the response buffer is flushed. Quarkus also enabled PSSR effectively, but it required additional configuration of the OutputBuffer size to achieve the same results as Spring WebFlux. The Spring MVC implementation, on the other hand, did not enable PSSR for the tested templates.
However, it is important to acknowledge that our chosen data models limit the generalizability of these results. The tested templates used relatively simple data structures with limited nesting and straightforward property bindings. Real-world applications often involve deeply nested data structures, complex conditional logic, and iterative rendering over thousands of items. Under such conditions, we anticipate significantly different performance characteristics: increased memory consumption due to object traversal overhead, higher CPU utilization for complex evaluations, and altered scalability patterns where non-blocking approaches may become more advantageous. The performance degradation observed with our Stock class benchmark, which merely doubled the number of properties, suggests that higher template complexity would result in substantially greater degradation. Future work should investigate these scenarios with more realistic data models to better understand performance boundaries and optimization strategies for complex PSSR implementations.

5.4. Memory Consumption and Resource Utilization Analysis

This section evaluates the memory and CPU resource usage characteristics of structured concurrency approaches, specifically comparing virtual threads and suspendable coroutines.
Resource utilization data was collected using VisualVM during 30 s benchmark runs, capturing detailed CPU and memory usage patterns under sustained load conditions. Each profiling session monitored system performance during the 1 to 128 concurrent user load test, providing comprehensive insights into runtime behavior.
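For readers who want comparable heap readings without a profiler, the JDK's management beans expose similar data programmatically. The following is a minimal sketch and not the procedure used in this study; the class name `HeapSamplerSketch`, the sample count, and the sampling interval are assumptions chosen for illustration.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: samples used heap at a fixed interval.
final class HeapSamplerSketch {

    /** Returns one used-heap reading (in bytes) per sample, taken intervalMillis apart. */
    static List<Long> sampleUsedHeapBytes(int samples, long intervalMillis) {
        MemoryMXBean memory = ManagementFactory.getMemoryMXBean();
        List<Long> readings = new ArrayList<>();
        for (int i = 0; i < samples; i++) {
            readings.add(memory.getHeapMemoryUsage().getUsed());
            try {
                Thread.sleep(intervalMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt and stop sampling
                break;
            }
        }
        return readings;
    }

    public static void main(String[] args) {
        for (long bytes : sampleUsedHeapBytes(5, 200)) {
            System.out.println("used heap: " + bytes / (1024 * 1024) + " MB");
        }
    }
}
```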

Virtual Threads vs. Structured Concurrency

Figure 8 and Figure 9 show CPU utilization profiles for the HtmlFlow-Susp and HtmlFlow-Virtual implementations, respectively. The gradual increase in CPU utilization at the start of each profile reflects the ramp-up of concurrent users during the benchmark. HtmlFlow-Virtual exhibits a distinctive bell-shaped curve, climbing rapidly from near zero to a peak CPU usage of around 50% during the sustained load phase, maintaining moderate utilization during the test period, and then dropping sharply back to baseline. HtmlFlow-Susp displays a similar pattern but with a slightly lower peak CPU utilization of approximately 42%. Both profiles demonstrate comparable resource consumption patterns, with the virtual thread implementation showing only slightly higher CPU demands. The GC activity indicators in both profiles remain relatively low throughout the test duration, confirming that garbage collection overhead does not significantly impact CPU utilization in either approach.
Figure 8. CPU utilization profiling with HtmlFlow-Susp in Spring WebFlux during 30 s load test.
Figure 9. CPU utilization profiling with HtmlFlow-Virtual in Spring WebFlux during 30 s load test.
Figure 10 and Figure 11 show similar memory usage for both approaches, with peaks around 300 MB during load. The initial spike to 450 MB in both graphs corresponds to the applications’ bootstrapping phase. Overall, these results indicate that under the tested conditions, virtual threads and suspendable coroutines exhibit comparable memory consumption.
Figure 10. Memory utilization and garbage collection behavior with HtmlFlow-Susp in Spring WebFlux during 30 s load test.
Figure 11. Memory utilization and garbage collection behavior with HtmlFlow-Virtual in Spring WebFlux during 30 s load test.
Virtual threads allocate a full call stack per thread, which is unmounted and stored on the heap when suspended (e.g., during blocking I/O). In contrast, Kotlin coroutines compile into state machines that capture only the minimal execution state needed to resume computation. This typically allows coroutines to scale more efficiently in terms of memory usage, especially under high concurrency.
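The following minimal sketch, assuming Java 21, shows the programming model this paragraph refers to: each task is ordinary blocking code, and the runtime unmounts the virtual thread from its carrier while it is blocked. The class `VirtualThreadSketch` and the use of `Thread.sleep` as a stand-in for blocking I/O are assumptions for illustration; the sketch does not measure memory.

```java
import java.time.Duration;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical illustration of one virtual thread per blocking task.
final class VirtualThreadSketch {

    /** Runs the given number of blocking tasks and returns how many ran on a virtual thread. */
    static int runBlockingTasks(int tasks, Duration blockFor) {
        try (ExecutorService executor = Executors.newVirtualThreadPerTaskExecutor()) {
            List<Future<Boolean>> results = new ArrayList<>();
            for (int i = 0; i < tasks; i++) {
                results.add(executor.submit(() -> {
                    Thread.sleep(blockFor); // stand-in for a blocking I/O call
                    return Thread.currentThread().isVirtual();
                }));
            }
            int virtualCount = 0;
            for (Future<Boolean> result : results) {
                if (result.get()) {
                    virtualCount++;
                }
            }
            return virtualCount;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        }
    }

    public static void main(String[] args) {
        int completed = runBlockingTasks(1_000, Duration.ofMillis(50));
        System.out.println(completed + " blocking tasks completed on virtual threads");
    }
}
```

A Kotlin coroutine version of the same workload would instead mark the task as a `suspend` function, which the compiler turns into a state machine, so the comparison in this section is between a runtime-managed stack and compiler-generated continuation state.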

6. Discussion

Beronić et al. [46] compared different structured concurrency constructs in Java and Kotlin in the context of a multi-threaded HTTP server. Their benchmark scenario differs from ours: their server awaited incoming requests and, upon receiving one, executed a task involving object creation, writing the object’s information to a file, and returning the data as a response. Consistent with our findings, they concluded that both Kotlin’s coroutines and Java’s virtual threads offer significant performance improvements over traditional JVM-based threads in concurrent applications.
Our results diverge from theirs regarding memory usage. They reported lower heap usage for Java virtual threads (16–64 MB) compared to Kotlin coroutines (52–99 MB), which contrasts with our measurements. This discrepancy may be attributed to the different workloads: while their benchmark includes a single I/O read–write operation per request, our experiments focus exclusively on read-only I/O tasks.
The work of Navarro et al. [47] focused on the performance of Quarkus, which relies on Eclipse Vert.x [48], itself built on top of Netty [49]. This represents a similar environment to the one evaluated in our experiments with Quarkus. Their study also emphasized template rendering, using a blocking HTML template engine (Qute). The benchmark they employed is the Fortunes test (https://www.techempower.com/benchmarks, accessed on 29 May 2025), which involves rendering a simple HTML table with only two data bindings per row.
Their results show that Quarkus with virtual threads outperformed the traditional thread-per-request model under increased concurrency. Moreover, compared to the reactive model, virtual threads demonstrated competitive performance, particularly when executing blocking operations within template engines. In contrast, our experiments revealed that the reactive approach in Spring WebFlux performs better than virtual threads in the same experiment conducted with Quarkus, suggesting that WebFlux provides a more efficient infrastructure for managing reactive processing flows.
The work of Šimatović et al. [50] presents an evaluation of Java virtual threads under different garbage collectors. They explored three types of workloads, only one of which was based on a web application scenario. This scenario replicated large-scale web scraping by executing 1000 parallel tasks, each consisting of an I/O-bound operation followed by CPU-bound string processing.
In this workload, garbage collection activity remained low across all collectors, indicating reduced memory pressure. These findings suggest that virtual threads improve memory allocation efficiency and reduce the frequency of garbage collection cycles—particularly when used with concurrent garbage collectors.
Our observations suggest that the choice between virtual threads and suspendable coroutines should be guided by both runtime behavior and development constraints. Virtual threads offer the compelling advantage of preserving synchronous programming semantics while achieving competitive performance across moderate to high-load scenarios—a significant benefit for teams migrating existing codebases or working with legacy template engines. Suspendable coroutines, while requiring developers to adopt asynchronous programming paradigms, may provide greater efficiency in scenarios involving high concurrency, deeper call stacks, or highly dynamic workloads.
The framework-specific variations observed in our results—particularly the performance differences between Spring WebFlux and Quarkus implementations—suggest that the underlying reactive runtime architecture significantly influences the effectiveness of each approach. This finding underscores the importance of considering not only the concurrency model but also the specific framework ecosystem when making architectural decisions for PSSR implementations.

7. Conclusions

In recent decades, non-blocking I/O has become the standard approach for building highly responsive and scalable web servers. However, traditional synchronous programming models are not compatible with non-blocking APIs, which typically rely on callback-based conventions such as continuation-passing style (CPS) [20] or promises [21]. These approaches not only hinder sequential readability but also increase code verbosity, making them more error-prone. Alternatives like the async/await idiom [22] or suspend functions [23] simplify asynchronous programming by mimicking a sequential style without blocking threads. Recent proposals [2,4] have applied contemporary asynchronous idioms to SSR web templates, demonstrating how they can overcome the scalability bottlenecks present in traditional web template engines.
As an alternative, Java virtual threads can be applied to any blocking I/O call, leveraging the Java runtime to transparently intercept and convert it into non-blocking I/O, without requiring any changes to the calling code or exposing its internal complexities. In this work, we explored how this technique can be applied to traditional web template engines and whether it can achieve performance competitive with reactive approaches provided by frameworks such as Thymeleaf [51] and HtmlFlow [5]. Our benchmarks across Spring WebFlux, Spring MVC, and Quarkus show that synchronous non-blocking execution using virtual threads consistently delivers performance comparable to asynchronous non-blocking approaches under high concurrency. These findings highlight virtual threads as a promising alternative to complex asynchronous programming models, offering a simpler development experience without compromising scalability or responsiveness.
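As a minimal sketch of the idea summarized above, assuming Java 21, the template below is written in a plain synchronous style and only the way it is launched changes. The names `BlockingTemplateSketch`, `fetchBlocking`, and `renderList` are hypothetical and do not correspond to the API of any benchmarked engine; `Thread.sleep` stands in for a blocking data access, and HTML escaping is omitted for brevity.

```java
import java.io.StringWriter;
import java.util.List;

// Hypothetical illustration: a blocking template executed on a virtual thread.
final class BlockingTemplateSketch {

    /** Stand-in for a blocking data source; each item takes a moment to arrive. */
    static String fetchBlocking(String item) {
        try {
            Thread.sleep(5);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return item;
    }

    /** Plain synchronous template: no callbacks, no reactive types. */
    static void renderList(List<String> items, StringWriter out) {
        out.write("<ul>");
        for (String item : items) {
            // Each fragment is written as soon as its data is available.
            out.write("<li>" + fetchBlocking(item) + "</li>");
        }
        out.write("</ul>");
    }

    /** Runs the unchanged template on a virtual thread and waits for the result. */
    static String renderOnVirtualThread(List<String> items) {
        StringWriter out = new StringWriter();
        Thread worker = Thread.startVirtualThread(() -> renderList(items, out));
        try {
            worker.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(renderOnVirtualThread(List.of("AAPL", "MSFT")));
    }
}
```

In a real server the `StringWriter` would be replaced by the response writer, so each fragment could be flushed to the client while the virtual thread is parked on the next blocking call.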

Author Contributions

Conceptualization, B.P. and F.M.C.; methodology, B.P. and F.M.C.; software, B.P. and F.M.C.; validation, B.P. and F.M.C.; formal analysis, B.P. and F.M.C.; investigation, B.P. and F.M.C.; resources, B.P. and F.M.C.; data curation, B.P. and F.M.C.; writing—original draft preparation, B.P. and F.M.C.; writing—review and editing, B.P. and F.M.C.; supervision, F.M.C.; project administration, F.M.C. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

This study did not involve human subjects or personal data; therefore, Institutional Review Board approval was not required.

Data Availability Statement

The data presented in this study are available on GitHub at https://github.com/xmlet/comparing-non-blocking-progressive-ssr, accessed on 29 May 2025.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Edgar, M. First Contentful Paint. In Speed Metrics Guide: Choosing the Right Metrics to Use When Evaluating Websites; Apress: Berkeley, CA, USA, 2024; pp. 73–91. [Google Scholar] [CrossRef]
  2. Carvalho, F.M. Progressive Server-Side Rendering with Suspendable Web Templates. In Web Information Systems Engineering—WISE 2024; Barhamgi, M., Wang, H., Wang, X., Eds.; Springer: Singapore, 2025; pp. 458–473. [Google Scholar]
  3. Elmeleegy, K.; Chanda, A.; Cox, A.L.; Zwaenepoel, W. Lazy Asynchronous I/O for Event-Driven Servers. In Proceedings of the Annual Conference on USENIX Annual Technical Conference, ATEC ’04, Boston, MA, USA, 27 June–2 July 2004; p. 21. [Google Scholar]
  4. Carvalho, F.M.; Fialho, P. Enhancing SSR in Low-Thread Web Servers: A Comprehensive Approach for Progressive Server-Side Rendering with Any Asynchronous API and Multiple Data Models. In Proceedings of the 19th International Conference on Web Information Systems and Technologies, WEBIST ’23, Rome, Italy, 15–17 November 2023. [Google Scholar]
  5. Carvalho, F.M. HtmlFlow Java DSL to Write Typesafe HTML. Technical Report. 2017. Available online: https://htmlflow.org/ (accessed on 29 May 2025).
  6. Fernández, D. Thymeleaf. Technical Report. 2011. Available online: https://www.thymeleaf.org/ (accessed on 29 May 2025).
  7. Fowler, M. Domain Specific Languages; Addison-Wesley Professional: Boston, MA, USA, 2010. [Google Scholar]
  8. Fowler, M. Patterns of Enterprise Application Architecture; Addison-Wesley Longman Publishing Co., Inc.: Boston, MA, USA, 2002. [Google Scholar]
  9. Alur, D.; Malks, D.; Crupi, J. Core J2EE Patterns: Best Practices and Design Strategies; Prentice Hall PTR: Upper Saddle River, NJ, USA, 2001. [Google Scholar]
  10. Parr, T.J. Enforcing Strict Model-View Separation in Template Engines. In Proceedings of the 13th International Conference on World Wide Web, WWW ’04, New York, NY, USA, 17–20 May 2004; pp. 224–233. [Google Scholar] [CrossRef]
  11. Krasner, G.E.; Pope, S. A Description of the Model-View-Controller User Interface Paradigm in the Smalltalk-80 System. J. Object-Oriented Program. 1988, 1, 26–49. [Google Scholar]
  12. Netflix; Pivotal; Red Hat; Oracle; Twitter; Lightbend. Reactive Streams Specification. Technical Report. 2015. Available online: https://www.reactive-streams.org/ (accessed on 29 May 2025).
  13. Landin, P.J. The next 700 programming languages. Commun. ACM 1966, 9, 157–166. [Google Scholar] [CrossRef]
  14. Evans, E.; Fowler, M. Domain-Driven Design: Tackling Complexity in the Heart of Software; Addison-Wesley: Boston, MA, USA, 2004. [Google Scholar]
  15. Thompson, K. Programming Techniques: Regular Expression Search Algorithm. Commun. ACM 1968, 11, 419–422. [Google Scholar] [CrossRef]
  16. Resig, J. Pro JavaScript Techniques; Apress: New York, NY, USA, 2007. [Google Scholar]
  17. Hors, A.L.; Hégaret, P.L.; Wood, L.; Nicol, G.; Robie, J.; Champion, M.; Arbortext; Byrne, S. Document Object Model (DOM) Level 3 Core Specification. Technical Report. 2004. Available online: https://www.w3.org/TR/2004/REC-DOM-Level-3-Core-20040407/ (accessed on 29 May 2025).
  18. Carvalho, F.M.; Duarte, L.; Gouesse, J. Text Web Templates Considered Harmful. In Lecture Notes in Business Information Processing; Springer: Cham, Switzerland, 2020; pp. 69–95. [Google Scholar]
  19. Landin, P.J. Correspondence Between ALGOL 60 and Church’s Lambda-notation: Part I. Commun. ACM 1965, 8, 89–101. [Google Scholar] [CrossRef]
  20. Sussman, G.; Steele, G. Scheme: An Interpreter for Extended Lambda Calculus; AI Memo No. 349; MIT Artificial Intelligence Laboratory: Cambridge, MA, USA, 1975. [Google Scholar]
  21. Friedman; Wise. Aspects of Applicative Programming for Parallel Processing. IEEE Trans. Comput. 1978, C-27, 289–296. [Google Scholar] [CrossRef]
  22. Syme, D.; Petricek, T.; Lomov, D. The F# Asynchronous Programming Model. In Practical Aspects of Declarative Languages; Rocha, R., Launchbury, J., Eds.; Springer: Berlin/Heidelberg, Germany, 2011; pp. 175–189. [Google Scholar]
  23. Elizarov, R.; Belyaev, M.; Akhin, M.; Usmanov, I. Kotlin coroutines: Design and implementation. In Proceedings of the 2021 ACM SIGPLAN International Symposium on New Ideas, New Paradigms, and Reflections on Programming and Software, Chicago, IL, USA, 20–22 October 2021; pp. 68–84. [Google Scholar]
  24. Meijer, E. Democratizing the Cloud with the .NET Reactive Framework Rx. In Proceedings of the International Software Development Conference, Vancouver, BC, Canada, 16–24 May 2009; Available online: https://qconsf.com/sf2009/sf2009/speaker/Erik+Meijer.html (accessed on 29 May 2025).
  25. RxJava Contributors. RX Java. Technical Report. 2025. Available online: https://github.com/ReactiveX/RxJava (accessed on 29 May 2025).
  26. VMWare; Contributors. Project Reactor. Technical Report. 2025. Available online: https://projectreactor.io/ (accessed on 29 May 2025).
  27. Davis, A.L. Akka HTTP and Streams. In Reactive Streams in Java: Concurrency with RxJava, Reactor, and Akka Streams; Apress: Berkeley, CA, USA, 2019; pp. 105–128. [Google Scholar]
  28. Ponge, J.; Navarro, A.; Escoffier, C.; Le Mouël, F. Analysing the performance and costs of reactive programming libraries in Java. In Proceedings of the 8th ACM SIGPLAN International Workshop on Reactive and Event-Based Languages and Systems, REBLS 2021, New York, NY, USA, 18 October 2021; pp. 51–60. [Google Scholar] [CrossRef]
  29. Breslav, A. Kotlin Language Documentation. Technical Report. 2016. Available online: https://kotlinlang.org/docs/kotlin-docs.pdf (accessed on 29 May 2025).
  30. Vogel, L.; Springer, T. User Acceptance of Modified Web Page Loading Based on Progressive Streaming. In Proceedings of the International Conference on Web Engineering, Bari, Italy, 5–8 July 2022; Springer: Berlin/Heidelberg, Germany, 2022; pp. 391–405. [Google Scholar]
  31. Atwood, J. The Lost Art of Progressive HTML Rendering. Technical Report. 2005. Available online: https://blog.codinghorror.com/the-lost-art-of-progressive-html-rendering/ (accessed on 29 May 2025).
  32. Farago, J.; Williams, H.; Walsh, J.; Whyte, N.; Goel, K.; Fung, P. Object Search UI and Dragging Object Results. US Patent Applications 11/353,787, 14 February 2007. [Google Scholar]
  33. Schiller, S. Progressive Loading. US Patent Applications 11/364,992, 26 February 2007. [Google Scholar]
  34. Von Behren, R.; Condit, J.; Brewer, E. Why Events Are a Bad Idea (for High-Concurrency Servers). In Proceedings of the 9th Workshop on Hot Topics in Operating Systems (HotOS IX), Lihue, HI, USA, 18–21 May 2003. [Google Scholar]
  35. Kambona, K.; Boix, E.G.; De Meuter, W. An Evaluation of Reactive Programming and Promises for Structuring Collaborative Web Applications. In Proceedings of the 7th Workshop on Dynamic Languages and Applications, DYLA ’13, New York, NY, USA, 1–5 July 2013. [Google Scholar] [CrossRef]
  36. Kant, K.; Mohapatra, P. Scalable Internet servers: Issues and challenges. ACM SIGMETRICS Perform. Eval. Rev. 2000, 28, 5–8. [Google Scholar] [CrossRef]
  37. Meijer, E. Your Mouse is a Database. Queue 2012, 10, 20.20–20.33. [Google Scholar] [CrossRef]
  38. Jin, X.; Wah, B.W.; Cheng, X.; Wang, Y. Significance and Challenges of Big Data Research. Big Data Res. 2015, 2, 59–64. [Google Scholar] [CrossRef]
  39. Karsten, M.; Barghi, S. User-Level Threading: Have Your Cake and Eat It Too. Proc. ACM Meas. Anal. Comput. Syst. 2020, 4, 1–30. [Google Scholar] [CrossRef]
  40. Burke, B. RESTful Java with JAX-RS 2.0: Designing and Developing Distributed Web Services; O’Reilly Media, Inc.: Sebastopol, CA, USA, 2013. [Google Scholar]
  41. Haynes, C.T.; Friedman, D.P.; Wand, M. Continuations and Coroutines. In Proceedings of the 1984 ACM Symposium on LISP and Functional Programming, LFP ’84, Austin, TX, USA, 5–8 August 1984; pp. 293–298. [Google Scholar] [CrossRef]
  42. Veen, R.; Vlijmincx, D. Scoped Values. In Virtual Threads, Structured Concurrency, and Scoped Values: Explore Java’s New Threading Model; Apress: Berkeley, CA, USA, 2024. [Google Scholar] [CrossRef]
  43. Model-View-Controller Pattern. In Learn Objective-C for Java Developers; Apress: Berkeley, CA, USA, 2009; pp. 353–402. [CrossRef]
  44. Bösecke, M. JMH Benchmark of the Most Popular Java Template Engines. Technical Report. 2015. Available online: https://github.com/mbosecke/template-benchmark (accessed on 29 May 2025).
  45. Reijn, J. Comparing Template Engines for Spring MVC. Technical Report. 2015. Available online: https://github.com/jreijn/spring-comparing-template-engines (accessed on 29 May 2025).
  46. Beronić, D.; Modrić, L.; Mihaljević, B.; Radovan, A. Comparison of Structured Concurrency Constructs in Java and Kotlin—Virtual Threads and Coroutines. In Proceedings of the 2022 45th Jubilee International Convention on Information, Communication and Electronic Technology (MIPRO), Opatija, Croatia, 23–27 May 2022; pp. 1466–1471. [Google Scholar] [CrossRef]
  47. Navarro, A.; Ponge, J.; Le Mouël, F.; Escoffier, C. Considerations for integrating virtual threads in a Java framework: A Quarkus example in a resource-constrained environment. In Proceedings of the 17th ACM International Conference on Distributed and Event-Based Systems, Neuchatel, Switzerland, 27–30 June 2023; pp. 103–114. [Google Scholar]
  48. Fox, T. Eclipse Vert.x™ Reactive Applications on the JVM. Technical Report. 2014. Available online: https://vertx.io/ (accessed on 29 May 2025).
  49. Maurer, N.; Wolfthal, M. Netty in Action; Manning Publications: Shelter Island, NY, USA, 2015. [Google Scholar]
  50. Šimatović, M.; Markulin, H.; Beronić, D.; Mihaljević, B. Evaluating Memory Management and Garbage Collection Algorithms with Virtual Threads in High-Concurrency Java Applications. In Proceedings of the 48th ICT and Electronics Convention MIPRO 2025. Rijeka: Croatian Society for Information, Communication and Electronic Technology–MIPRO, Opatija, Croatia, 2–6 June 2025. [Google Scholar]
  51. Deinum, M.; Cosmina, I. Building Reactive Applications with Spring WebFlux. In Pro Spring MVC with WebFlux: Web Development in Spring Framework 5 and Spring Boot 2; Apress: Berkeley, CA, USA, 2021; pp. 369–420. [Google Scholar]
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
