A Functional Yield-Based Traversal Pattern for Concise, Composable, and Efficient Stream Pipelines

Carvalho, Fernando Miguel

doi:10.3390/software5010007

by

Fernando Miguel Carvalho

Instituto Superior de Engenharia de Lisboa, Instituto Politécnico de Lisboa, Rua Conselheiro Emídio Navarro 1, 1959-007 Lisboa, Portugal

Software 2026, 5(1), 7; https://doi.org/10.3390/software5010007

Submission received: 5 December 2025 / Revised: 20 January 2026 / Accepted: 2 February 2026 / Published: 10 February 2026

Abstract

The stream pipeline idiom provides a fluent and composable way to express computations over collections. It gained widespread popularity after its introduction in .NET in 2005, later influencing many platforms, including Java in 2014 with the introduction of Java Streams, and continues to be adopted in contemporary languages such as Kotlin. However, the set of operations available in standard libraries is limited, and developers often need to introduce operations that are not provided out of the box. Two options typically arise: implementing custom operations using the standard API or adopting a third-party collections library that offers a richer suite of operations. In this article, we show that both approaches may incur performance overhead, and that the former can also suffer from verbosity and reduced readability. We propose an alternative approach that remains faithful to the stream-pipeline pattern: developers implement the unit operations of the pipeline from scratch using a functional yield-based traversal pattern. We demonstrate that this approach requires low programming effort, eliminates the performance overheads of existing alternatives, and preserves the key qualities of a stream pipeline. Our experimental results show up to a 3× speedup over the use of native yield in custom extensions.

Keywords:

generators; streams; lazy sequences; iterators; extensions

1. Introduction

Collections often need to be traversed, transformed, and queried to extract the desired data. Chaining operations over streams is a well-known pattern described as a collection or stream pipeline [1,2]. A pipeline denotes a sequence of operations (e.g., Listing 1), performed one after another, where the output of one operation (upstream) becomes the input for the next (downstream).

To reduce the amount of required memory, the operations that form a stream pipeline may follow a lazy processing approach. This is a well-known technique, first introduced with lazy lists in Lisp in 1976 [3], and widely adopted across programming environments. This methodology avoids processing items in advance and requires less memory than an eager approach, because it does not store items from intermediate stages in auxiliary collections. Eliminating the need for intermediate lists is also known as deforestation [4]. The appealing idiom provided by stream pipelines makes them an attractive target for research aimed at optimizing their inherent processing overhead [5,6,7,8,9].

Yet standard stream libraries provide a limited set of operations, and programmers may turn to third-party libraries that offer a richer set. For example, in the Java ecosystem, we may find a diversity of collection libraries providing this programming idiom, such as Vavr [10], jOOλ [11], Eclipse Collections [12], Guava [13], Protonpack [14], StreamEx [15], and many others. Another option is to extend existing standard libraries with new custom operations that comply with the provided stream API. However, extending streams with new custom operations lazily is not as straightforward as implementing them eagerly. This is especially true in object-oriented languages and when following the iterator design pattern [16], which substantially increases the complexity and verbosity of these implementations, as observed by Henry Baker [17]. The use of the generator operator (i.e., yield) [18] to implement stream operations addresses the aforementioned problem and has been widely adopted by mainstream programming languages, including both Scala [19] and Kotlin [20], which are compatible with the JVM runtime environment [21]. The yield operator allows programmers to develop user-defined operations on streams concisely while preserving their laziness property.

Despite the resulting conciseness of extensions that implement new stream operations with yield, we observed in benchmarks—including the same-fringe algorithm [22], also analyzed by Henry Baker [17]—that a yield-based implementation may be between two and five times slower than an ad hoc implementation in several tree-traversal scenarios. Moreover, when testing different third-party collection libraries for the JVM, we found that none presented consistent performance across all scenarios, and all of them degraded performance in at least some benchmarks when compared to an ad hoc implementation. On the other hand, an ad hoc implementation using for loops (e.g., Listing 2) has drawbacks in readability and maintainability, making it difficult to decompose and reuse.

Our goal is to achieve a solution that is as close as possible to the performance of an ad hoc implementation, while still preserving the three main attributes of a collection pipeline: composability, laziness, and readability. Thus, instead of using third-party libraries or creating extensions with a yield generator, we propose that programmers develop the building-block operations of a collection pipeline from scratch and tailor them to their needs. To that end, we propose using a functional, yield-based traversal design pattern and show that we can implement each of the core operations map, filter, limit, and reduce in two to five lines of source code. Moreover, we can build more complex operations, such as zip, which is one of the building blocks for collection pipelines in certain tree-traversal scenarios, with the same simplicity as the yield primitive but without incurring the costs of post-processing code or the execution overhead of the resulting state machine. Finally, we show, across various tree-traversal benchmarks, that our proposal achieves competitive performance compared with an ad hoc implementation. Our benchmark implementation follows state-of-the-art practices using JMH [23] and is available on GitHub at https://github.com/tinyield/treebench-jvm (accessed on 1 February 2026).

We build on earlier work, and we do not claim novelty for those features. However, to the best of our knowledge, neither the specific design pattern we introduce nor the experimental observations we report have been described in the literature. The remainder of this paper is organized as follows. Section 2 describes the main properties of the stream pipeline pattern. Section 3 establishes the yield programming model, including terminology and formal yield properties. Section 4 describes the generalized design of the functional yield-based traversal pattern. Section 5 presents related work along with existing alternative libraries in the Java ecosystem. Section 6 explains the tests devised to analyze the sequence alternatives and discusses the benchmark results. Section 7 compares our results with those from other studies. The conclusions of this work are presented in Section 8.

2. Stream Pipeline Pattern

Stream pipelines let programmers compose transformations over data, with the result of each computation serving as the input to the next transformation in the pipeline. We use the terms stream and sequence interchangeably. Streams are distinguished by three key attributes: composability, which enables the construction of complex processing flows from simple operations; readability, in which each operation clearly expresses its purpose; and laziness, which ensures computations are performed only when needed. For example, given a variable pastWeather that refers to a sequence of Weather objects (i.e., Stream<Weather> in Java), each representing daily meteorological information, we can filter for sunny days, create a stream of temperatures, and finally select the first five values. This is achieved through a chain of filter, map, and limit operations, producing the pipeline illustrated in Listing 1 in Java. In this case, some operations (namely, filter and map) require functions that describe how each Weather object should be processed (e.g., Weather::isSunny and Weather::getCelsius).

Listing 1: Java example of a stream pipeline that extracts the first five temperatures from sunny days.

Stream<Integer> top5temps = pastWeather
  .filter(Weather::isSunny)
  .map(Weather::getCelsius)
  .limit(5);

On the other hand, the example in Listing 2 demonstrates an alternative implementation that avoids using auxiliary functions like filter, map, or limit. This imperative approach manually processes items from the data source, checking their properties (i.e., isSunny and getCelsius) to validate and transform items into a new sequence (i.e., top5temps).

Listing 2: Java imperative example that extracts the first five temperatures from sunny days.

var top5temps = new ArrayList<Integer>();
for (var w : pastWeather){
   if (w.isSunny()) {             // ~ filter
      top5temps.add(w.getCelsius());     // ~ map
      if (top5temps.size() >= 5) {    // ~ limit
         break;
      }
   }
}

Setting performance and efficiency aside for now and focusing solely on readability, Listing 2 forces the reader to mentally parse the control flow to understand what is happening. In contrast, Listing 1 is not only less verbose but also clearly conveys the purpose of each line, with the name of each operation corresponding directly to the action performed on the data. Beyond the idiomatic differences, there is another significant difference regarding processing efficiency: the resulting Stream<Integer> top5temps of Listing 1 is lazy, which means that the items from pastWeather are not obtained until a terminal operation (such as forEach, reduce, first, etc.) processes top5temps. On the other hand, the implementation of Listing 2 uses an eager approach, consuming elements from pastWeather up front and immediately populating an in-memory ArrayList<Integer>.

Stream APIs may vary in different characteristics depending on the technological environment, such as:

Operation names.
Composability: method chaining versus nested functions.
Eager versus lazy evaluation.
Extensibility.
Access approach: pull versus push.

In the next subsections, we describe each of these characteristics.

2.1. Operation Names

Although operation names such as filter and map are fairly consistent across sequence operations, different environments may use different terminology. For example, in C#, operations similar to filter and map are called Where and Select, respectively, while flatMap is known as SelectMany in C# and expand in Dart. The equivalent example to Listing 1 in C# is shown in Listing 3. Consequently, when moving to another technological environment, developers may need to look up the name under which a given operation is provided.

Listing 3: C# example of a stream pipeline that extracts the first five temperatures from sunny days.

var top5temps = pastWeather
  .Where(weather => weather.IsSunny)
  .Select(weather => weather.Celsius)
  .Take(5);

2.2. Composability: Method Chaining Versus Nested Functions

The technique of composing a pipeline, as shown in Listings 1 and 3, is known as method chaining. In this approach, the receiver object (the object on which the method is called) is implicitly passed as an argument to each method call, allowing subsequent methods to be invoked on the result of the previous method. While this idiom may seem the most natural way to chain operations into a pipeline, it may not be intuitive to all developers, particularly those unfamiliar with object-oriented programming.

For example, Scheme or Clojure developers might prefer the nested function idiom, in which functions are combined by making function calls the arguments of higher-level function calls. In this approach, the sequence of nested functions is evaluated from the innermost function outward, meaning that arguments are evaluated before the function is called. Consequently, operations in a pipeline need to be composed in the reverse order of their execution. However, this is typically not an issue for functional programmers, who are accustomed to writing such pipelines in functional programming languages like Clojure, as demonstrated in Listing 4.

Listing 4: Clojure example of a stream pipeline that extracts the first five temperatures from sunny days.

(def top5temps
  (take 5
    (map :celsius
      (filter :isSunny pastWeather))))

2.3. Eager Versus Lazy Evaluation

Another design difference between the technologies and idioms presented in the previous listings relates to evaluation time. The operations used in the approach of Listing 3 (e.g., Where, Select, and Take) and in Listings 1 and 4 (e.g., map, filter, and limit/take) are characterized by lazy evaluation. This means they do not process the source elements (e.g., pastWeather) immediately when called. Instead, calling these methods merely adds another step to the stream pipeline. It is important to note that streams are immutable, so each query method returns a new stream that results from composing the previous stream with an additional operation.
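The deferred evaluation described above can be observed directly. The following is a minimal sketch, assuming a hypothetical LazyPipelineDemo class and an integer source standing in for pastWeather; the inspected list is an illustrative device that records which elements the filter stage actually touches.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class LazyPipelineDemo {
    // Builds a filter-map-limit pipeline; 'inspected' logs every element
    // that reaches the filter predicate (hypothetical helper for illustration).
    static Stream<Integer> buildPipeline(List<Integer> source, List<Integer> inspected) {
        return source.stream()
            .filter(n -> { inspected.add(n); return n % 2 == 0; })
            .map(n -> n * 10)
            .limit(2);
    }

    public static void main(String[] args) {
        List<Integer> inspected = new ArrayList<>();
        Stream<Integer> pipeline = buildPipeline(List.of(1, 2, 3, 4, 5, 6), inspected);
        // Composing the pipeline has not touched the source yet.
        System.out.println(inspected);   // []
        // The terminal operation drives the traversal.
        List<Integer> result = pipeline.collect(Collectors.toList());
        System.out.println(result);      // [20, 40]
        // limit(2) short-circuits: elements 5 and 6 are never inspected.
        System.out.println(inspected);   // [1, 2, 3, 4]
    }
}
```

Only the terminal collect call pulls elements through the stages, and the traversal stops as soon as limit has received its two values.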

This contrasts with collections, which are in-memory data structures that store all values they contain. For example, in the imperative approach shown in Listing 2, we eagerly instantiate the resulting list top5temps, which holds the outcome of the eager pipeline processing. Some APIs, such as the JavaScript Array API and the Kotlin Collection API, evaluate operations eagerly rather than lazily. As a result, the equivalent pipeline for Listing 1, when written in JavaScript or Kotlin for a top5temps array or collection, would be processed immediately rather than deferring computation until the results are consumed. It is worth noting that Kotlin also provides the Sequence<T> API, which offers a lazy alternative to the eager Collection operations.

Implementing new custom operations for a stream API requires adherence to the interface of its internal traversal mechanism. Each stream technology has an interface that specifies how elements are traversed and accessed. Despite slight differences, many iterator protocols, such as Iterator on the JVM or IEnumerator in C#, provide at least one method to advance to the next element and one property to access the current element.

In Java’s standard library Iterator:

The next() method serves both roles of advancing and accessing the next element.
The hasNext() method indicates whether there are more items to iterate over.

We will illustrate how to implement a new zip method that combines elements from two sequences. The zip (also known as convolution) is an operation that takes a tuple of sequences and transforms them into a sequence of tuples. For Java Stream objects, zip can be implemented by providing a custom implementation of the Iterable and Iterator interfaces, as shown in Listing 5. Later in Section 3, we present a more compact implementation using generators.

Listing 5: Implementation of a Java zip method for combining two streams.

The ZipIterable class of Listing 5 provides a simple way to combine two streams by iterating over them together and applying a user-supplied combining function to each pair of elements. Its static zip method constructs a ZipIterable from two input streams and a BiFunction that merges corresponding elements, and then exposes the result as a standard Java Stream via StreamSupport. Internally, the class stores iterators for both input streams along with the zipper function. The iterator it returns advances both underlying iterators simultaneously, reporting that it has more elements only when both inputs do (line 20), and producing each next value by applying the zipper function to the next elements retrieved from each stream (line 23).
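A minimal sketch of an iterator-based zip matching this description is given below. The field names and layout are assumptions, so the line numbers cited above do not apply to this sketch; it relies only on the standard Iterable, Iterator, and StreamSupport APIs.

```java
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.function.BiFunction;
import java.util.stream.Collectors;
import java.util.stream.Stream;
import java.util.stream.StreamSupport;

class ZipIterable<T, U, R> implements Iterable<R> {
    private final Iterator<T> left;
    private final Iterator<U> right;
    private final BiFunction<T, U, R> zipper;

    private ZipIterable(Stream<T> left, Stream<U> right, BiFunction<T, U, R> zipper) {
        this.left = left.iterator();
        this.right = right.iterator();
        this.zipper = zipper;
    }

    // Exposes the zipped elements as a sequential Java Stream.
    static <T, U, R> Stream<R> zip(Stream<T> left, Stream<U> right, BiFunction<T, U, R> zipper) {
        return StreamSupport.stream(new ZipIterable<>(left, right, zipper).spliterator(), false);
    }

    @Override
    public Iterator<R> iterator() {
        return new Iterator<R>() {
            // There is a next pair only while both inputs have elements.
            @Override public boolean hasNext() {
                return left.hasNext() && right.hasNext();
            }
            // Advances both inputs and merges the pair with the zipper.
            @Override public R next() {
                if (!hasNext()) throw new NoSuchElementException();
                return zipper.apply(left.next(), right.next());
            }
        };
    }

    public static void main(String[] args) {
        List<String> pairs = zip(Stream.of("a", "b", "c"), Stream.of(1, 2), (s, n) -> s + n)
            .collect(Collectors.toList());
        System.out.println(pairs); // [a1, b2]
    }
}
```

Even in this compact form, the traversal logic is spread over a class, a constructor, and an anonymous iterator, which is the verbosity the text attributes to the iterator pattern.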

2.4. Access Approach: Pull Versus Push

The iteration protocol described in the previous section, and used in the implementation of the zip method in Listing 5, follows a pull-based model in which elements are obtained from the sequence on demand and then processed. Developers can check whether the sequence has more elements by calling the hasNext method, and they can obtain the next element using the next method. Once an element is accessed, it can be processed according to the developer’s code. The IEnumerator in C# and the Iterator in Dart or JavaScript both use a similar pull-based approach.

Java streams use a different access approach known as push-based. Instead of pulling items from the sequence (i.e., requesting elements), the developer specifies what to do with the items (i.e., providing instructions). In this push-based model, the method analogous to next(): T, namely tryAdvance(Consumer<T>), accepts a function (i.e., a Consumer) that defines what to do with the next element, rather than returning the element itself. The tryAdvance() method returns true if an element remained and was passed to the consumer, and false if there are no remaining elements.
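The push-based protocol can be exercised directly on any Spliterator. The sketch below, with a hypothetical TryAdvanceDemo class and drain helper, shows that a single tryAdvance call both tests for and delivers the next element.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Spliterator;
import java.util.stream.Stream;

class TryAdvanceDemo {
    // Pushes every remaining element of 'source' into a list.
    // The loop body is empty: tryAdvance both tests for and delivers an element.
    static <T> List<T> drain(Spliterator<T> source) {
        List<T> out = new ArrayList<>();
        while (source.tryAdvance(out::add)) { /* empty */ }
        return out;
    }

    public static void main(String[] args) {
        Spliterator<String> letters = Stream.of("a", "b").spliterator();
        System.out.println(letters.tryAdvance(System.out::println)); // prints a, then true
        System.out.println(drain(letters));                          // [b]
        System.out.println(letters.tryAdvance(System.out::println)); // false
    }
}
```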

The tryAdvance method is part of the Spliterator interface in Java streams. Although Spliterator provides additional functionality for parallel processing, we focus here on sequential processing only. Using this approach, we can implement a custom zip operation for Java streams by explicitly extending AbstractSpliterator, thereby following a push-based processing model, as illustrated in Listing 6. It defines a custom AbstractSpliterator in which the tryAdvance method is implemented to provide elements from both spliterators to a consumer in sequence. For each call to tryAdvance, the method passes the next element from the first spliterator and, if available, the corresponding element from the second spliterator to the provided lambda, applies the zipper function to combine them, and delivers the result to the consumer. This push-based approach stops producing elements as soon as either input stream is exhausted.

Listing 6: Implementation of a Java zip method for combining two streams.
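As a hedged illustration of the push-based zip described above, the following sketch extends AbstractSpliterator. The class name ZipSpliterator, the size estimate, and the paired flag are assumptions made for this example rather than details taken from the paper's listing.

```java
import java.util.List;
import java.util.Spliterator;
import java.util.Spliterators.AbstractSpliterator;
import java.util.function.BiFunction;
import java.util.function.Consumer;
import java.util.stream.Collectors;
import java.util.stream.Stream;
import java.util.stream.StreamSupport;

class ZipSpliterator<T, U, R> extends AbstractSpliterator<R> {
    private final Spliterator<T> left;
    private final Spliterator<U> right;
    private final BiFunction<T, U, R> zipper;

    private ZipSpliterator(Spliterator<T> left, Spliterator<U> right, BiFunction<T, U, R> zipper) {
        // The zipped sequence is at most as long as the shorter input.
        super(Math.min(left.estimateSize(), right.estimateSize()), 0);
        this.left = left;
        this.right = right;
        this.zipper = zipper;
    }

    static <T, U, R> Stream<R> zip(Stream<T> left, Stream<U> right, BiFunction<T, U, R> zipper) {
        return StreamSupport.stream(
            new ZipSpliterator<>(left.spliterator(), right.spliterator(), zipper), false);
    }

    @Override
    public boolean tryAdvance(Consumer<? super R> action) {
        boolean[] paired = { false }; // set when both inputs supplied an element
        // The left element is pushed into a lambda that in turn asks the right
        // spliterator to push its element; only then is the pair delivered.
        left.tryAdvance(l ->
            paired[0] = right.tryAdvance(r -> action.accept(zipper.apply(l, r))));
        return paired[0];
    }

    public static void main(String[] args) {
        List<Integer> sums = zip(Stream.of(1, 2, 3), Stream.of(10, 20), Integer::sum)
            .collect(Collectors.toList());
        System.out.println(sums); // [11, 22]
    }
}
```

One consequence of this nesting is that, when the right input is exhausted first, one extra element of the left input is consumed before tryAdvance reports false.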

3. Yield Programming Model

The variants of the yield operator are beyond the scope of this paper; we only establish a common terminology according to its formal model [18]. We use the yield operator of Kotlin [20] as the lingua franca, focusing on the properties shared across programming languages. We begin by describing the yield programming model in the next subsection. Then, in Section 3.2, we show how it simplifies the same-fringe algorithm. Finally, in Section 3.3, we present the fundamental properties of generators.

3.1. Generator Operator Yield

Simply put, a generator is like a function that generates a sequence of values. However, instead of building a sequence at once (e.g., array or vector), a generator yields values one at a time, i.e., it returns a “new” value every time it is called. In Kotlin, a generator produces an instance of Sequence<T>, a lazily evaluated stream of elements of type T, conceptually similar to a Java Iterable<T>. In contrast, Kotlin’s own Iterable<T> represents an eagerly processed sequence of items.

The generator operator yield is inspired by the coroutine primitive yield. In coroutines, yield provides a means of suspending a computation so that execution can be resumed later [24]. In the same way, the term generator refers to a computation that (1) yields values to the caller and (2) is resumed after the yielded value has been consumed by the caller. Like a coroutine, the caller must interact with the generator by reading the yielded values and resuming. This idea was first introduced in the CLU programming language [25] in 1975 and was the key to expression evaluation in the Icon programming language in 1977 [26]. But its widespread popularity may be attributed to its first use in C# 2.0 [27] and later in Ruby 1.9 [28]. In CLU and C#, generators are known as iterators, and in Ruby, enumerators. Python, PHP, JavaScript, Scala, and Dart also provide variants of the yield operator.

Summary.

A generator acts like a subroutine encompassing a special computation that is restricted to communicate with its caller through the yield primitive. During its computation, a generator can yield many values.

This characteristic makes generators useful for simplifying the implementation of lazy iterators. With generators, programming languages provide a way to implement operations concisely, similar to eager approaches, without incurring their drawbacks while maintaining lazy behavior. The compiler then translates that implementation into a lazy form by implementing the required traversal interfaces. In Kotlin, generators and the yield primitive are implemented using function literals with receiver (i.e., lambdas that operate in the context of an implicit receiver). The Kotlin builder sequence accepts such a lambda, giving it implicit access to the receiver that provides the yield operation.

Thus, a generator may be written as follows: sequence {… yield(item) …}. Note that the braces following sequence introduce a lambda expression whose parameter list—and therefore the -> arrow—can be omitted. Both the sequence’s lambda and the yield function are suspending functions. A suspending function differs from a normal (non-suspending) function by potentially containing one or more suspension points. Whenever a suspending function invokes another suspending function, execution may pause at that invocation, forming a suspension point that can later be resumed. Consequently, every call to yield inside a sequence builder may suspend the generator’s execution.

To exemplify the yield semantics in the context of generators, we will start with a generator that produces a sequence of Cullen numbers defined by

C_{n} = n \cdot 2^{n} + 1

and implemented in Kotlin according to Listing 7.

Listing 7: Generator of a sequence of Cullen numbers in Kotlin.

fun cullen() = sequence {
   var i = 1
   while (true) {
       val nr = (1 shl i) * i + 1
       yield(nr) // suspension point
       i++
   }
}

The function cullen() constructs a lazy, potentially unbounded sequence by invoking Kotlin’s sequence builder, which internally creates a generator. Within this generator, the yield() function produces one Cullen number at a time and suspends the computation until the next element is requested by the consumer of the sequence. Each call to yield() emits the current value of $C_{n}$ and preserves the generator’s state, including the current value of i, so that execution resumes immediately after the yield() call when the next element is demanded. In Kotlin, yield is a suspending function, and each invocation creates a suspension point.

Listing 8 prints the first 10 values produced by the cullen generator. Like many high-level programming languages, Kotlin provides a for statement for iterating over any traversable instance. This is equivalent to the for...of construct in JavaScript or the enhanced for statement (for (:)) in Java. Arrays, collections, or any kind of sequence, including those resulting from generators, are traversable, which enables the use of for over the result of cullen().

Listing 8: Consuming the cullen generator in Kotlin.

val cullen: Sequence<Int> = cullen()
var limit = 10
for (nr in cullen) {
   if (--limit < 0) break
   println(nr)
}

When a generator yields, it pauses execution, and its caller is not required to resume it, even if it has pending computations. For instance, the loop in Listing 8 breaks after ten values, leaving the cullen generator suspended for good.

Terminology.

We use the term generator to refer to computations that yield values. Only generator functions can use the yield keyword. A free yield results in a compiler error. Finally, the argument to the yield operator becomes an output of the generator. We refer to these outputs as yielded values.

3.2. Same-Fringe Use Case

The same-fringe algorithm [22] decides whether two finite trees have the same enumeration of leaves in left-to-right order, and can be easily solved if we construct at least one of the fringes as an explicit data structure. Yet this incurs needless overhead when the fringe is very large and the fringes do not match, because a great deal of work will have been performed for nothing.

By leveraging the stream pipeline pattern, we can simplify the implementation of the same-fringe algorithm by decomposing it into a chain of operations and avoiding unnecessary processing when the fringes do not match. This algorithm can be expressed with a pipeline of leaves-zip-all, as shown in Listing 9.

Listing 9: Chain of operations implementing the same-fringe algorithm.

tree.leaves().zip(other.leaves(), Any::equals).all { item -> item }

The use of zip in Listing 9 produces a sequence of Boolean values indicating whether each pair of corresponding leaves from tree.leaves() and other.leaves() matches. The subsequent all operation then returns true only if every upstream element satisfies the given predicate. Since the upstream sequence already contains Boolean values, the predicate is simply the identity function (i.e., item -> item).
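For comparison, the same leaves-zip-all idea can be sketched in Java without yield, using a hand-written pull-based iterator. The Node class, the leaf and fork helpers, and the stack-based traversal are all assumptions made for this example. Unlike the pipeline of Listing 9, the sketch also reports a mismatch when one fringe is a strict prefix of the other.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Iterator;
import java.util.Objects;

class SameFringe {
    /** Hypothetical binary-tree node; a node without children is a leaf. */
    static final class Node<T> {
        final T value;
        final Node<T> left, right;
        Node(T value, Node<T> left, Node<T> right) {
            this.value = value; this.left = left; this.right = right;
        }
    }

    static <T> Node<T> leaf(T value) { return new Node<>(value, null, null); }
    static <T> Node<T> fork(Node<T> left, Node<T> right) { return new Node<>(null, left, right); }

    /** Pull-based iterator over the leaf values of a non-null tree, left to right. */
    static <T> Iterator<T> leaves(Node<T> root) {
        Deque<Node<T>> pending = new ArrayDeque<>();
        pending.push(root);
        return new Iterator<T>() {
            // Every pending node of a finite tree still has at least one leaf below it.
            @Override public boolean hasNext() { return !pending.isEmpty(); }
            @Override public T next() {
                while (true) {
                    Node<T> node = pending.pop(); // NoSuchElementException when exhausted
                    if (node.left == null && node.right == null) return node.value;
                    if (node.right != null) pending.push(node.right);
                    if (node.left != null) pending.push(node.left);
                }
            }
        };
    }

    /** Compares the fringes pairwise and stops at the first mismatch. */
    static <T> boolean sameFringe(Node<T> a, Node<T> b) {
        Iterator<T> x = leaves(a), y = leaves(b);
        while (x.hasNext() && y.hasNext()) {
            if (!Objects.equals(x.next(), y.next())) return false;
        }
        return !x.hasNext() && !y.hasNext(); // both fringes must end together
    }

    public static void main(String[] args) {
        Node<Integer> t1 = fork(fork(leaf(1), leaf(2)), leaf(3));
        Node<Integer> t2 = fork(leaf(1), fork(leaf(2), leaf(3)));
        Node<Integer> t3 = fork(leaf(1), leaf(4));
        System.out.println(sameFringe(t1, t2)); // true
        System.out.println(sameFringe(t1, t3)); // false
    }
}
```

Because neither fringe is materialized, the comparison of t1 and t3 stops at the second leaf; the price is the explicit stack management that a yield-based generator hides.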

Both the leaves and zip operations can be implemented in a straightforward and fully lazy manner with the support of the Kotlin sequence builder and its yield suspending function. Listing 10 presents a Kotlin yield-based implementation of a leaves generator that produces a lazy sequence of the leaves of a binary tree, where each Node<T> contains three properties: value, left, and right. The leaves method uses recursion to traverse the tree, delegating to left.leaves() and right.leaves() as needed.

Listing 10: yield-based generator of the Node<T> leaves.

fun <T> Node<T>.leaves() = sequence {
   if (left == null && right == null) {
       yield(value)
   } else {
       for (n in left?.leaves() ?: emptySequence()) {
           yield(n)
       }
       for (n in right?.leaves() ?: emptySequence()) {
           yield(n)
       }
   }
}

Notice that without the yield operator, the implementation of the leaves method would easily exceed forty lines of code, as illustrated by the Java solution referenced in the Stack Overflow question “In-order iterator for binary tree” (https://stackoverflow.com/a/12851421/1140754, accessed on 1 February 2026). Such a verbose iterator-based idiom is one of the weaknesses of object-oriented languages highlighted in [17]. A similar issue appears in the zip iterator implementation shown in Listing 5, which can be expressed far more succinctly when yield is available. Although Kotlin already provides zip in its standard library, we present a yield-based version in Listing 11 to demonstrate both the conciseness and the clarity of this style of construction.

Listing 11: yield-based implementation of a zip operation in Kotlin.

fun <T, U, R> zip(
  self: Sequence<T>,
  other: Sequence<U>,
  zipper: (T, U) -> R
) = sequence {
   val rightIter = other.iterator()
   for (left in self) {
      if (!rightIter.hasNext()) break // stop as soon as the other input is exhausted
      yield(zipper(left, rightIter.next()))
   }
}

3.3. Generator Properties

Based on the terminology introduced in previous sections, we unify the generator properties into a traversal design that reflects the behavior of the yield primitive and introduce types where appropriate.

Summary.

The traversal model is statically typed, stackful, and delimited, and supports first-class generators. Static typing improves program safety, stackfulness allows better composition, and delimited means that the yield primitive can only occur within a scope lexically enclosed by a traversal.

3.3.1. First-Class Generators

Generators represent first-class objects and behave like first-class functions. Generators are invoked, and the caller must interact with them by reading the yielded values and resuming. Generator instances may be passed by value to other generators.

3.3.2. Stackful

Traversals enable the composition of independently written generators. It must be possible for one generator to invoke another generator while preserving the same yielding context. For example, consider a numbers array containing integer values. We may want a generator digits(Array<Int>) that traverses the numbers array and produces a lazy sequence of all their digits. Given the separately implemented digits(Int) generator in Listing 12, which yields the digits of a single integer (nr), the digits(Array<Int>) generator in Listing 13 can conveniently reuse this functionality by passing each integer to digits(Int) and yielding its results.

Listing 12: yield-based implementation of digits(Int) generator.

fun digits(nr: Int) = sequence {
  var count = nr
  while (count > 0) {
     yield(count % 10)
     count = count / 10
  }
}

Listing 13: yield-based implementation of digits(Array<Int>) generator that combines the use of digits(Int).

The for statement of Line 3 of Listing 13 is used to delegate to another generator and iterate over the yielded values. This requirement is equivalent to that stated by a monad combinator, where given a type constructor $M$ that builds up a monadic type $M\,T$ and a monadic function such as $T \to M\,U$, we have the following:

(M\,T,\; T \to M\,U) \to M\,U

Given the generator digits(Int) of type $M\,Integer$, then each entry of the Array<Int> is unwrapped in $M\,Integer$. In truth, the implementation of digits(Array<Int>) is equivalent to the use of the flatMap operation (i.e., SelectMany in C#) that can be denoted according to Listing 14:

Listing 14: flatMap-based implementation of digits(Array<Int>) generator using digits(Int).

// asSequence() keeps the result lazy; Array.flatMap alone would build an eager List
fun digits(numbers: Array<Int>) = numbers.asSequence().flatMap { nr -> digits(nr) }
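The same composition can be sketched with Java streams. This is an illustrative analogue of Listings 12 and 14, assuming a hypothetical Digits class and using IntStream.iterate in place of a generator; as in Listing 12, digits are produced least significant first and nothing is produced for values that are not positive.

```java
import java.util.Arrays;
import java.util.stream.IntStream;

class Digits {
    // Lazily produces the decimal digits of nr, least significant first.
    static IntStream digits(int nr) {
        return IntStream.iterate(nr, n -> n > 0, n -> n / 10).map(n -> n % 10);
    }

    // Composes digits(int) over every entry of the array via flatMap.
    static IntStream digits(int[] numbers) {
        return Arrays.stream(numbers).flatMap(nr -> digits(nr));
    }

    public static void main(String[] args) {
        int[] result = digits(new int[] { 12, 345 }).toArray();
        System.out.println(Arrays.toString(result)); // [2, 1, 5, 4, 3]
    }
}
```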

3.3.3. Typed Generators

A generator type encloses the types $P_{i}$ of its parameters and the yield type $Y$ of the yielded values, also called the output type [18], that is:

(P_{1}, \dots, P_{N}) ⇝ (Y)

For digits(Int) and digits(Array<Int>) generators we have the following:

Int ⇝ Int

Array<Int> ⇝ Int

4. Functional Yield-Based Traversal Pattern

We choose the Kotlin type system [20] to describe the types of the yield-based traversal pattern because it supports first-class function types. For example, in Java, interface instances cannot be invoked as functions and must use the dot operator (i.e., .) to call an instance method. In contrast, in Kotlin, an instance of an interface (e.g., itf) with an invoke method marked with the operator keyword can be called directly as a function (e.g., itf()).

Next, we introduce the types and design that support the functional yield-based traversal pattern, along with a simple example of implementing and using a Cullen generator. After that, in Section 4.2, we demonstrate the composition of generators complying with the stackful property stated in Section 3.3. We finish in Section 4.3 with the implementation and composition of generators to solve the same-fringe problem.

4.1. Design

The functional yield-based traversal pattern is supported by the Advancer type, which defines how the elements of a sequence are traversed, and by the Yield type, which specifies how values are emitted back to the caller. In Figure 1, we depict the Advancer and Yield functional interfaces. The generator parameters are not shown in Figure 1; they are captured from the lexical scope of the generator function (closure).

The functional interfaces of Figure 1 allow expressing a generator as a lambda in the form: Advancer {yield -> … yield(item) …}. An Advancer function is expected to yield the next element of the sequence, if available, and returns true or false depending on whether an element was processed. The yielded element is delivered to the associated Yield instance.

In essence, an Advancer combines the behaviors of next() and hasNext() from the Iterator design pattern [16] into a single subroutine. For example, to traverse all elements of an Advancer, one can use a loop such as for or while, as shown in the following statement, which traverses a hypothetical Advancer<T> adv and prints all its elements: while (adv(::println)) { /* empty */ }. Here, the expression adv(::println) invokes the Advancer with a Yield function implemented by the Kotlin function reference ::println, which receives each element produced by the Advancer and prints it to the standard output. The loop terminates when the Advancer has no more elements to produce, returning false.
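The two interfaces and the traversal loop can be sketched as follows. This is a minimal illustration, assuming that Advancer and Yield are declared as Kotlin functional interfaces with an operator invoke method, as the description above suggests (the authoritative declarations are those of Figure 1). The rangeAdvancer and collect helpers are hypothetical names introduced only for this example.

```kotlin
// Assumed shape of the two functional interfaces described in the text.
fun interface Yield<T> {
    operator fun invoke(item: T)
}

fun interface Advancer<T> {
    // Yields at most one element and reports whether an element was produced.
    operator fun invoke(yield: Yield<T>): Boolean
}

// Hypothetical generator producing from, from + 1, ..., until - 1.
fun rangeAdvancer(from: Int, until: Int): Advancer<Int> {
    var next = from // generator state captured by the closure
    return Advancer { yield ->
        if (next < until) {
            yield(next++)
            true
        } else {
            false
        }
    }
}

// Hypothetical helper that drains an Advancer into a list,
// using the same while-loop shape as the statement in the text.
fun <T> collect(adv: Advancer<T>): List<T> {
    val out = ArrayList<T>()
    while (adv(Yield { item -> out.add(item) })) { /* empty */ }
    return out
}
```

Each call to the Advancer plays the role of a combined hasNext() and next(): the Boolean result drives the loop, and the element itself arrives through the Yield callback.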

In Listing 15, we show how to implement a Cullen generator using the Advancer idiom, and in Listing 16 we provide an example that prints the first 10 elements produced by this generator. The lambda expression in lines 3 to 7 of Listing 15 defines an instance of Advancer that receives a Yield instance as a parameter. Each time the Advancer produces an element, it invokes yield(...) to pass the value to the consumer.

Listing 15: Advancer of Cullen numbers.

Listing 16: Consuming the Cullen advancer.

val cullen: Advancer<Int> = cullen()
var limit = 10
while (cullen(::println)) {
   if (--limit < 1) break
}
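A Cullen generator in this idiom could look roughly like the sketch below. It is an illustrative reconstruction and not the code of Listing 15: the interface declarations, the choice of starting at n = 0 with the formula n · 2^n + 1, and the firstCullen helper are all assumptions made for this example.

```kotlin
// Assumed shape of the functional interfaces (see Figure 1 for the originals).
fun interface Yield<T> {
    operator fun invoke(item: T)
}

fun interface Advancer<T> {
    operator fun invoke(yield: Yield<T>): Boolean
}

// Sketch of an infinite generator of Cullen numbers C(n) = n * 2^n + 1,
// assuming the sequence starts at n = 0. Int arithmetic overflows for
// large n, so this is only meant for the first few elements.
fun cullen(): Advancer<Int> {
    var n = 0 // state captured by the closure
    return Advancer { yield ->
        yield(n * (1 shl n) + 1)
        n++
        true // an infinite sequence always has a next element
    }
}

// Hypothetical helper: collects the first `count` elements, mirroring the
// bounded loop used to consume the generator in Listing 16.
fun firstCullen(count: Int): List<Int> {
    val out = ArrayList<Int>()
    val adv = cullen()
    var remaining = count
    while (remaining-- > 0 && adv(Yield { item -> out.add(item) })) { /* empty */ }
    return out
}
```

Because the generator never returns false, the consumer is responsible for bounding the traversal, which is what the limit counter in Listing 16 does.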

Notice that these implementations do not require any compiler instrumentation or support, unlike the native Kotlin yield, to handle suspension points. Our approach relies solely on higher-order functions and the ability to define local functions (i.e., lambdas), which capture their free lexical variables as closures [29].

An Advancer represents a delimited subroutine that defines the boundary of a generator and scopes the action of yield, in accordance with the Delimited property described in Section 3.3. The argument of the Advancer function is an opaque computation capable of yielding values. This naturally suggests a monadic encapsulation of the effectful generator computations, with yield as the sole effect operator of the monad. Since Advancer marks the boundary of this effect, it can serve as the operation that escapes the monad.

4.2. Composition

Given the yield-based traversal definition of Advancer, we may easily implement extensions for Advancer that perform some of the basic operations of any streams library, such as filter, map, and limit that are used, for example, to compose the pipeline presented in Listing 1. Those operations are implemented in Listing 17 as extension methods of Advancer<T>. Nevertheless, depending on the programming environment, the same extensions could be implemented as instance methods of the Advancer. For example, in Java, the Advancer can be defined as a functional interface with these intermediate operations as default methods.

Listing 17: Advancer extensions for filter, map, and limit.

Each Advancer returned by each operation filter, map, and limit of Listing 17 encloses the generator boundary that captures the generator parameters (i.e., pred, mapper, and size). To advance on the upstream, we may find a call to invoke(…) on lines 3, 13, and 20. For the limit operation, we simply pass the yield to the upstream because there is no transformation of the stream elements. On the other hand, the map needs to pass the element through the mapper before yielding it. For the filter operation, we create a new Yield instance that sets the found flag (line 6) whenever an element satisfies the predicate. Then, the filter generator repeatedly advances the upstream until an element is found or the upstream is exhausted.
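The behavior described above can be sketched as follows. This is a hedged reconstruction under the assumption that Advancer and Yield are Kotlin functional interfaces. It is not the code of Listing 17, so the line numbers cited in the text do not apply to it, and advancerOf and toList are helpers added only to make the example self-contained.

```kotlin
fun interface Yield<T> {
    operator fun invoke(item: T)
}

fun interface Advancer<T> {
    operator fun invoke(yield: Yield<T>): Boolean
}

// filter: keeps advancing the upstream until an element satisfies pred
// or the upstream is exhausted.
fun <T> Advancer<T>.filter(pred: (T) -> Boolean): Advancer<T> = Advancer { yield ->
    var found = false
    val probe = Yield<T> { item ->
        if (pred(item)) {
            yield(item)
            found = true
        }
    }
    while (!found && invoke(probe)) { /* skip non-matching elements */ }
    found
}

// map: passes each upstream element through mapper before yielding it.
fun <T, R> Advancer<T>.map(mapper: (T) -> R): Advancer<R> = Advancer { yield ->
    invoke(Yield { item -> yield(mapper(item)) })
}

// limit: forwards the downstream yield unchanged, at most size times.
fun <T> Advancer<T>.limit(size: Int): Advancer<T> {
    var remaining = size
    return Advancer { yield ->
        if (remaining <= 0) {
            false
        } else {
            remaining--
            invoke(yield)
        }
    }
}

// Hypothetical source and terminal helpers used only for this example.
fun <T> advancerOf(vararg items: T): Advancer<T> {
    var index = 0
    return Advancer { yield ->
        if (index < items.size) {
            yield(items[index++])
            true
        } else {
            false
        }
    }
}

fun <T> Advancer<T>.toList(): List<T> {
    val data = ArrayList<T>()
    while (invoke(Yield { item -> data.add(item) })) { /* no op */ }
    return data
}
```

With these definitions, advancerOf(1, 2, 3, 4, 5, 6).filter { it % 2 == 0 }.map { it * 10 }.limit(2).toList() evaluates lazily and stops pulling from the source as soon as two elements have been produced.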

We may also define the flatMap operation (i.e., flat morphism) that is useful to compose generators, such as the example of digits(int) and digits(int[]) of Listings 12 and 14. The implementation in Listing 18 defines a new Advancer that produces elements by flattening a sequence of sequences generated from an upstream source. Initially, src is set to an empty Advancer that always returns false (line 2). The returned Advancer repeatedly calls src(yield) (line 4), which attempts to yield the next element from the current inner sequence. This process continues while elements are successfully yielded from src, at which point the Advancer returns true (line 9). If src is exhausted, it advances the upstream by invoking invoke { item -> src = mapper(item) } (line 5), which applies the mapping function to produce a new inner sequence and assigns it to src. If the upstream itself is exhausted, the Advancer returns false, signaling that no more elements are available (line 6).

Listing 18: Advancer extension for flatMap.
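A flatMap extension following the description above might be sketched as shown next. The interface declarations are assumed, the code is not the author's Listing 18 (so the cited line numbers do not map onto it), and advancerOf and toList are hypothetical helpers for the example.

```kotlin
fun interface Yield<T> {
    operator fun invoke(item: T)
}

fun interface Advancer<T> {
    operator fun invoke(yield: Yield<T>): Boolean
}

// flatMap: flattens the inner sequences produced by mapper.
fun <T, R> Advancer<T>.flatMap(mapper: (T) -> Advancer<R>): Advancer<R> {
    // Current inner sequence; starts as an empty Advancer that yields nothing.
    var src: Advancer<R> = Advancer { false }
    return Advancer { yield ->
        var upstreamAdvanced = true
        // While the inner sequence has nothing to yield, pull the next
        // upstream element and replace src with its mapped inner sequence.
        while (upstreamAdvanced && !src(yield)) {
            upstreamAdvanced = invoke(Yield { item -> src = mapper(item) })
        }
        // true if src yielded an element, false if the upstream is exhausted.
        upstreamAdvanced
    }
}

// Hypothetical source and terminal helpers used only for this example.
fun <T> advancerOf(vararg items: T): Advancer<T> {
    var index = 0
    return Advancer { yield ->
        if (index < items.size) {
            yield(items[index++])
            true
        } else {
            false
        }
    }
}

fun <T> Advancer<T>.toList(): List<T> {
    val data = ArrayList<T>()
    while (invoke(Yield { item -> data.add(item) })) { /* no op */ }
    return data
}
```

The downstream yield is handed directly to the inner Advancer, so elements flow from the inner generator to the consumer without any intermediate buffering.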

Given the definition of flatMap() and an auxiliary toAdvancer() builder that constructs an Advancer from an array, we can define digits(Int) and digits(Array<Int>) following the implementations shown in Listings 19 and 20.

Listing 19: Advancer implementation of digits(int) generator.

fun digits(nr: Int): Advancer<Int> {
   var count = nr
   return Advancer { yield ->
      count > 0 && run {
         yield(count % 10)
         count /= 10
         true
      }
   }
}

Listing 20: Advancer implementation of the digits(Array<Int>) generator, which builds on digits(Int).

The generator produced by the digits(Int) call (line 3 of Listing 20) becomes the Advancer assigned to the auxiliary src in flatMap (line 5 of Listing 18), which receives the yield parameter of the outer Advancer on line 4 (i.e., src(yield)). This Yield<T> instance captures the context that is preserved across calls to different generators, thereby fulfilling the stackful property described in Section 3.3.
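The composition can be illustrated with the sketch below. The digits(Int) function follows Listing 19, while the interface declarations, the flatMap and toAdvancer implementations, and the toList helper are assumptions written for this example rather than the code of Listings 18 and 20.

```kotlin
fun interface Yield<T> {
    operator fun invoke(item: T)
}

fun interface Advancer<T> {
    operator fun invoke(yield: Yield<T>): Boolean
}

// Assumed flatMap, following the textual description of Listing 18.
fun <T, R> Advancer<T>.flatMap(mapper: (T) -> Advancer<R>): Advancer<R> {
    var src: Advancer<R> = Advancer { false }
    return Advancer { yield ->
        var upstreamAdvanced = true
        while (upstreamAdvanced && !src(yield)) {
            upstreamAdvanced = invoke(Yield { item -> src = mapper(item) })
        }
        upstreamAdvanced
    }
}

// Assumed builder that traverses an array one element per call.
fun <T> Array<T>.toAdvancer(): Advancer<T> {
    var index = 0
    return Advancer { yield ->
        if (index < size) {
            yield(this[index++])
            true
        } else {
            false
        }
    }
}

// digits(Int) as in Listing 19: yields the decimal digits of nr,
// least significant digit first.
fun digits(nr: Int): Advancer<Int> {
    var count = nr
    return Advancer { yield ->
        count > 0 && run {
            yield(count % 10)
            count /= 10
            true
        }
    }
}

// Sketch of digits(Array<Int>): each number is expanded by digits(Int),
// and the outer yield is passed through to the inner generator.
fun digits(nrs: Array<Int>): Advancer<Int> =
    nrs.toAdvancer().flatMap { nr -> digits(nr) }

// Hypothetical terminal helper.
fun <T> Advancer<T>.toList(): List<T> {
    val data = ArrayList<T>()
    while (invoke(Yield { item -> data.add(item) })) { /* no op */ }
    return data
}
```

For the input arrayOf(123, 45), the digits of each number are emitted from the least significant one, giving 3, 2, 1 followed by 5, 4.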

4.3. Tree Traversal

The same-fringe algorithm is a tree traversal use case that checks whether two binary trees have exactly the same leaves. Recall from Listing 9 that the same-fringe subroutine can be expressed as a leaves-zip-all pipeline.

Starting with leaves, we present in Listing 21 our equivalent implementation of an Advancer-based generator. This implementation follows Henry Baker’s generator-composition style by constructing the leaf-traversal Advancer entirely through nested closures, where each generator yields its own contribution and then delegates to the next stage. The function leaves initiates the process by calling gen on the root and supplying an empty Advancer as the final continuation (line 2). The helper gen recursively builds the traversal: For internal nodes, it composes generators so that the left subtree’s generator is produced first and, once exhausted, seamlessly resumes with the generator for the right subtree (line 9), and only then with genRest, thus creating a left-to-right pipeline of generator stages. For leaf nodes, gen returns an Advancer that yields the leaf value on the first invocation (line 18) and, on subsequent calls, delegates to the continuation stored in rest (line 21), effectively chaining generators through closure-captured state.

Listing 21: Advancer implementation of the Node<T> leaves.
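The closure-chaining idea can be sketched as follows. This is a simplified illustration under several assumptions: the Node, Leaf, and Branch types are hypothetical (the paper's Node<T> may be shaped differently), the interface declarations are assumed, and the code is not the author's Listing 21.

```kotlin
fun interface Yield<T> {
    operator fun invoke(item: T)
}

fun interface Advancer<T> {
    operator fun invoke(yield: Yield<T>): Boolean
}

// Hypothetical binary leaf-tree: values are stored only in the leaves.
sealed class Node<T>
class Leaf<T>(val value: T) : Node<T>()
class Branch<T>(val left: Node<T>, val right: Node<T>) : Node<T>()

// Starts the traversal at the root, with an empty Advancer as the
// final continuation.
fun <T> Node<T>.leaves(): Advancer<T> = gen(this) { Advancer { false } }

// genRest lazily builds the generator for everything that follows `node`
// in left-to-right order.
private fun <T> gen(node: Node<T>, genRest: () -> Advancer<T>): Advancer<T> =
    when (node) {
        // Left subtree first, then the right subtree, and only then genRest.
        is Branch<T> -> gen(node.left) { gen(node.right, genRest) }
        is Leaf<T> -> {
            var rest: Advancer<T>? = null // continuation, created on first use
            Advancer { yield ->
                val r = rest
                if (r == null) {
                    // First invocation: yield this leaf and build the continuation.
                    yield(node.value)
                    rest = genRest()
                    true
                } else {
                    // Subsequent invocations delegate to the continuation.
                    r(yield)
                }
            }
        }
    }

// Hypothetical terminal helper.
fun <T> Advancer<T>.toList(): List<T> {
    val data = ArrayList<T>()
    while (invoke(Yield { item -> data.add(item) })) { /* no op */ }
    return data
}
```

In this simplified form every consumed leaf adds one delegation hop to later calls, so the sketch is meant to show how generators are chained through closure-captured state rather than to handle very large trees.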

In Listing 22, we present an Advancer-based implementation of the zip operation, which applies the provided zipper function to corresponding elements from the upstream and the other sequence, producing a new sequence of combined results. The invocation of other on line 5 returns a Boolean value, as prescribed by the Advancer protocol, indicating whether the right-hand stream successfully yielded an element. This Boolean is used to update the yielded variable, allowing the resulting Advancer to correctly report whether it has produced an output value (line 9). When the upstream has no further elements, the inner lambda is never executed, and yielded remains false (line 3), ensuring that no value is produced. Consequently, a result is emitted only when both the upstream and the other Advancer advance successfully, in which case the zip operator applies the zipper function and sets yielded to true.

Listing 22: Zip operation for Advancer based sequences.
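A zip extension matching this description might look like the sketch below. The interface declarations are assumed, the code is a reconstruction rather than Listing 22 itself, and advancerOf and toList are hypothetical helpers.

```kotlin
fun interface Yield<T> {
    operator fun invoke(item: T)
}

fun interface Advancer<T> {
    operator fun invoke(yield: Yield<T>): Boolean
}

// zip: emits zipper(a, b) only when both the upstream and `other`
// advance successfully.
fun <T, U, R> Advancer<T>.zip(other: Advancer<U>, zipper: (T, U) -> R): Advancer<R> =
    Advancer { yield ->
        var yielded = false // stays false if the upstream has no element
        invoke(Yield { a ->
            // The Boolean returned by `other` tells whether it produced b.
            yielded = other(Yield { b -> yield(zipper(a, b)) })
        })
        yielded
    }

// Hypothetical source and terminal helpers used only for this example.
fun <T> advancerOf(vararg items: T): Advancer<T> {
    var index = 0
    return Advancer { yield ->
        if (index < items.size) {
            yield(items[index++])
            true
        } else {
            false
        }
    }
}

fun <T> Advancer<T>.toList(): List<T> {
    val data = ArrayList<T>()
    while (invoke(Yield { item -> data.add(item) })) { /* no op */ }
    return data
}
```

The resulting sequence ends as soon as either side is exhausted, so zipping three numbers with two strings produces two pairs.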

Again, our Advancer-based implementation is much more concise than its Java counterpart. For example, the accepted answer to the question “Zipping streams using JDK8 with lambda” (https://stackoverflow.com/a/23529010/1140754, accessed on 1 February 2026) gives an implementation with more than 30 lines of code.

5. Related Work

5.1. Background

Lazy traversal is inspired by the concept of lazy lists, also known as streams, first described in 1965 by Landin [30]. It was Landin who proposed the use of delayed evaluation to avoid an “item-by-item” representation of collections. Friedman and Wise [3] introduced lazy lists in Lisp in 1976, and the idea was later adopted in other languages, in some cases as a fundamental data structure, as in Haskell [31].

Alphard, developed at CMU in the late 1970s, was the first programming language to introduce the generator operator [32]. That construct inspired iterators in CLU [25], where an iterator is a procedure that returns a sequence of elements, allowing them to be consumed one at a time.

The idea of a single iteration method was introduced in Python 2.2, where iterators provide a single method next that returns the next element in a sequence, or raises an exception when no more elements are available [33]. This feature is described in the proposal PEP 234 (Python Enhancement Proposal 234), Iterators [33]. The advantages of a single traversal subroutine were highlighted in [17], where H. Baker shows how higher-order functions, taking as arguments functions which are closed over their free lexical variables (closures), can be used to provide iteration capabilities.

In previous work [34], Prokopec showed that a coroutine-based approach is 2× slower than an iterator when solving the same-fringe problem. This is consistent with our observations of yield-based approaches, where we measured a performance degradation of between two and three times, as presented in the results of Section 6. For comparison, Prokopec [34] also observed that lazy functional lists are 12–17× slower. We observed the same behavior in Java using the Vavr library [10], which provides purely functional, immutable data structures, and StreamEx [15]. The Vavr and StreamEx approach to user-defined operations uses a cons [3] in conjunction with the head method and a supplier to produce the new tail of the sequence, applied recursively.

In our previous work [35], we investigated the internal mechanics of the yield operator to improve the performance and expressiveness of stream processing. We deconstructed the conventional yield primitive and analyzed how it can be leveraged to implement custom stream operations more efficiently. By exposing the underlying control flow and laziness semantics, we showed that yield-aware implementations can reduce overhead and enable more flexible composition of stream pipelines.

Møller and Veileborg [6] investigate how ahead-of-time optimizations can remove much of the dynamic machinery that ordinarily accompanies Stream execution. Their technique relies on specializing pipelines based on the operations they contain, allowing many intermediate objects and dispatch steps to be avoided entirely. The study shows that when the pipeline structure is known in advance, substantial speedups are attainable without changing the Stream API itself.

Complementing this line of work, several empirical studies have focused on understanding how Streams are used in practice. Rosales et al. [7] performed a large-scale analysis of open-source projects, identifying common usage patterns, typical pipeline lengths, and the frequency of specific higher-order operations. Their findings highlight that Streams are often used in relatively simple scenarios, suggesting that general-purpose optimizations could exploit recurring structural regularities.

Another strand of research examines parallel Stream processing. Basso et al. [8] revisit the design of parallel pipelines and show that existing implementations do not always make efficient use of available cores, particularly when operations have small computational weight. Their proposed improvements focus on reducing synchronization costs and better balancing workloads across workers.

Rosales et al. [9] explore profiling techniques tailored to Stream programs. They demonstrate that fine-grained visibility into the execution of pipeline stages can guide optimizations that are difficult to discover otherwise. Their work emphasizes the importance of tooling support, not only for optimization but also for helping developers understand the performance consequences of their design decisions.

5.2. Alternative Stream Libraries on the JVM

The backbone of Java Streams is the Sink interface, which extends Consumer and represents the intermediate operations (“stages”) that form a stream pipeline. Each stage implements its processing logic in the accept() method inherited from Consumer. The class ReferencePipeline provides the concrete implementation of Stream and is responsible for assembling these stages. Each stream operation is implemented either directly in ReferencePipeline or indirectly through specialised helper classes such as DistinctOps or ReduceOps. When a terminal operation is invoked, the pipeline is materialised by wrapping the Sink chain from the terminal stage back to the source; once the chain is composed, the source elements are traversed and handed sequentially to the wrapped Sink.

Spliterator plays the role of an iteration mechanism over a data source, analogous to Iterator but with built-in support for efficient parallel traversal. Its ability to split the input enables Java Streams to decompose work for parallel execution while still supporting sequential processing.

The design of jOOλ [11] closely mirrors that of Java Streams, as its Seq type wraps a standard Stream. Consequently, it inherits both the strengths and limitations of Java’s design, with the added drawback of requiring an external dependency. StreamEx [15] is likewise compatible with Java Streams but adopts a more functional style for defining custom operations, relying on the headTail pattern to recursively compose new heads and tails of the sequence.

Vavr [10], in contrast, does not interoperate with Java Streams and instead provides its own collection hierarchy based on purely functional data structures. Its construction of user-defined operations is conceptually similar to StreamEx but uses the cons operation together with the current head and a supplier for the recursively computed tail.

Eclipse Collections [12] relies on Iterable-based traversal, where each operation produces a new Iterable that wraps the previous one. The library includes numerous optimisations that exploit the nature of the underlying data source; for example, when an array is the source, iteration achieves performance comparable to a hand-written for loop.

Kotlin’s approach is centred on the Sequence interface, which resembles Iterable but enables lazy evaluation. A Sequence must provide an iterator, typically implemented using a standard Iterator, but the language also offers a yield-based generator syntax that allows custom sequence operations to be expressed in a concise and declarative manner.

6. Performance Evaluation

The source code of our tests, named treebench, is available on GitHub at https://github.com/tinyield/treebench-jvm (accessed on 1 February 2026). To obtain the most unbiased and precise results, we based our benchmark on a state-of-the-art performance analysis platform, namely JMH [23] in Java. The runtime used was the OpenJDK Runtime Environment Corretto-21.0.2.13.1 (build 21.0.2+13-LTS). Tests were performed using the default JVM configuration and the following JMH arguments, which our experiments showed to yield stable and consistent results:

-i 4—Four measurement iterations performed by JMH after warmup.
-wi 4—Four warmup iterations prior to measurement.
-f 1—A single forked JVM instance for benchmark execution.
-r 2—Two seconds of measurement time per iteration.
-w 2—Two seconds of warmup time per iteration.

All results presented in this section were obtained from experiments executed on a local machine with the following specifications: an Apple M1 Pro with 8 cores (six performance cores and two efficiency cores) and 16 GB of memory. Nevertheless, the treebench-jvm repository provides a GitHub Action workflow configured with three different GitHub-hosted runners: ubuntu-latest, macos-latest, and windows-latest. The macOS runner uses the arm64 architecture, while the others use x64; all are provisioned with 4 CPUs and 16 GB of memory. Running the benchmarks on these virtual machines yields results consistent with those reported in this paper. The use of standard GitHub-hosted runners is free for public repositories, and the treebench-jvm repository can be forked to reproduce the experiments using any GitHub account.

To evaluate different approaches in tree traversal scenarios, we selected two types of binary tree implementations: a binary leaf-tree [36] and an AVL tree [37]. The former allows the construction of trees with identical values arranged in different topologies while preserving the same fringe. The latter is a self-balancing binary search tree that enables fair partitioning for traversal parallelization.

For all tests, we implemented an ad hoc version of the algorithm without relying on any external library, using only the native programming language features such as loops and recursion. This approach provided the highest performance and serves as the baseline for comparing the relative performance of each alternative.

Our performance tests were conducted on trees containing 10 million elements, a threshold above which performance differences became more apparent.

JMH reports results as average execution time together with an error term that corresponds to a 99.9% confidence interval around the mean. The relative error can therefore be interpreted as a measure of variance and result stability. Across the evaluated workloads, most implementations exhibit low relative error, typically below 3% of the reported mean. For example, in the Same Fringe benchmark, Advancer reports $(45.6 \pm 0.31)$ ms (≈0.7%), Java Streams $(63.9 \pm 1.0)$ ms (≈1.6%), and Guava $(73.2 \pm 0.8)$ ms (≈1.1%). Similar behavior is observed in Distinct and AVL Equality, where relative errors for most libraries remain between 1% and 5%.
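The percentages quoted for the Same Fringe benchmark follow from dividing the error term by the mean. As a small worked check, the helper below (a hypothetical function written for this illustration, not part of the benchmark suite) reproduces them from the reported values.

```kotlin
import kotlin.math.round

// Relative error in percent, rounded to one decimal place:
// 100 * error / mean.
fun relativeErrorPct(meanMs: Double, errorMs: Double): Double =
    round(errorMs / meanMs * 1000.0) / 10.0

// Reported Same Fringe values (mean and error, in ms):
//   Advancer      45.6 and 0.31
//   Java Streams  63.9 and 1.0
//   Guava         73.2 and 0.8
val sameFringeRelativeErrors = listOf(
    relativeErrorPct(45.6, 0.31),
    relativeErrorPct(63.9, 1.0),
    relativeErrorPct(73.2, 0.8),
)
```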

Higher variance is observed only for Vavr, where confidence intervals can exceed 20–30% of the mean (and even higher in the SUM workload), indicating unstable measurements rather than systematic performance advantages. Excluding these outliers, the average relative error across benchmarks is approximately 2–4%, suggesting that the observed performance differences are robust and not attributable to measurement noise. Due to its unusually high variance, Vavr was excluded from the summary table analysis presented in Section 7. All subsequent results should therefore be interpreted as statistically stable within the reported confidence intervals.

6.1. Same Fringe

To populate the binary leaf-trees, we first generated an array with 10 million integers, and then we shuffled the array before inserting its elements into each tree. Hence, we obtain two distinct binary leaf-trees with different topologies, while keeping the same fringe.

Each evaluated approach uses a data source based on a leaves() implementation compatible with the corresponding streams API. Thus, for Advancer we used a leaves-based implementation according to the generator of Listing 21; for Kotlin we used the generator of Listing 10; and for the rest of the Java libraries, including standard Java streams, we used an Iterable implementation of a tree traversal similar to the Java solution illustrated in the StackOverflow question “In-order iterator for binary tree” (https://stackoverflow.com/a/12851421/1140754, accessed on 1 February 2026).

The performance results presented in Figure 2 display relative throughput as a percentage compared to the ad hoc baseline implementation, which represents 100% throughput. Each bar shows how efficiently a library performs the Same Fringe operation—higher percentages indicate better performance closer to the baseline. For example, a library at 50% means it takes twice as long as ad hoc to complete the same task. All Java-based stream alternatives exhibit broadly similar performance, with the notable exception of Vavr, which consistently performs the slowest.

Kotlin is the only JVM-targeting language that provides a native yield primitive, and the associated overhead becomes visible in our measurements. This cost arises from the way suspending functions are compiled: the Kotlin compiler transforms them into state machines [20], where each state corresponds to a continuation representing the code to be resumed after a suspension point. In the case of a Kotlin sequence built with the yield suspending function, the lambda passed to sequence becomes an anonymous function whose body is compiled into such a state machine, and this transformation accounts for the observed performance degradation.

The Vavr performance degradation is due to the internal manipulation of lazy functional lists to implement intermediate operations such as zip.

6.2. AVL Tree

We have used the implementation of an AVL tree provided by the Apache Commons Math library. This implementation maintains the elements in sorted order while allowing duplicates, and the tree is rebalanced whenever an element is inserted or removed. Since our experiments allow duplicate values, each AVL tree is populated with 10 million randomly generated integers in the range $[0, 20)$. As a consequence, collecting the distinct values stored in the tree yields a sequence containing exactly 20 elements.

We have implemented three different pipelines to test the performance of each approach in pre-order traversal of the AVL trees:

distinct—implemented with a pipeline of preOrder-distinct-toList, where preOrder produces a sequence that traverses the tree nodes in pre-order and the toList collects the elements of the resulting stream into a list.
sum—chains the pre-order traverser with a reduce operation (also known as fold) to sum the values of all nodes.
equality—checks whether two AVL trees have the same values of corresponding nodes in pre-order traversal. This is similar to the pipeline for solving the same-fringe algorithm, but replacing leaves with the pre-order traverser.

Again, the implementation of the auxiliary extensions reduce and toList is quite simple using the Advancer idiom and is expressed according to Listing 23.

Listing 23: Advancer Kotlin extensions for reduce and toList.

fun <T> Advancer<T>.reduce(seed: T, accumulator: (T, T) -> T): T {
   var prev: T = seed
   while (invoke { curr -> prev = accumulator(prev, curr) }) { /* no op */ }
   return prev
}
fun <T> Advancer<T>.toList(): List<T> {
   val data: MutableList<T> = ArrayList()
   while (invoke { item -> data.add(item) }) { /* no op */ }
   return data
}
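To illustrate how the extensions of Listing 23 could be chained into the distinct and sum pipelines described above, the following sketch adds a possible distinct operation. The interface declarations and the distinct implementation are assumptions for this example, and advancerOf is a hypothetical stand-in for the pre-order traverser of the AVL tree.

```kotlin
fun interface Yield<T> {
    operator fun invoke(item: T)
}

fun interface Advancer<T> {
    operator fun invoke(yield: Yield<T>): Boolean
}

// Sketch of distinct: remembers the elements seen so far and keeps
// advancing the upstream until a new element shows up.
fun <T> Advancer<T>.distinct(): Advancer<T> {
    val seen = HashSet<T>()
    return Advancer { yield ->
        var found = false
        val probe = Yield<T> { item ->
            if (seen.add(item)) { // add returns true only for unseen elements
                yield(item)
                found = true
            }
        }
        while (!found && invoke(probe)) { /* skip duplicates */ }
        found
    }
}

// reduce and toList, equivalent to Listing 23 but with explicit Yield instances.
fun <T> Advancer<T>.reduce(seed: T, accumulator: (T, T) -> T): T {
    var prev: T = seed
    while (invoke(Yield { curr -> prev = accumulator(prev, curr) })) { /* no op */ }
    return prev
}

fun <T> Advancer<T>.toList(): List<T> {
    val data: MutableList<T> = ArrayList()
    while (invoke(Yield { item -> data.add(item) })) { /* no op */ }
    return data
}

// Hypothetical source standing in for the preOrder traverser.
fun <T> advancerOf(vararg items: T): Advancer<T> {
    var index = 0
    return Advancer { yield ->
        if (index < items.size) {
            yield(items[index++])
            true
        } else {
            false
        }
    }
}
```

With a real pre-order source in place of advancerOf, the distinct pipeline would read preOrder().distinct().toList() and the sum pipeline preOrder().reduce(0) { a, b -> a + b }.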

The three benchmarks reveal distinct performance characteristics driven by their computational complexity, with each benchmark placing different demands on the traversal implementations. Nevertheless, the results shown in Figure 3 exhibit performance differences between the various approaches that are consistent with those observed in the corresponding fringe benchmark.

The sum benchmark, being the simplest operation involving pure aggregation, achieves the fastest absolute baseline and also exhibits the largest performance gaps between the ad hoc implementation and alternatives. In contrast, the distinct benchmark shows the most competitive landscape across libraries, while the equality benchmark demonstrates moderate performance gaps.

In the equality and distinct benchmarks, the Advancer approach consistently achieves the best performance among alternatives, maintaining between 54% and 75% of ad hoc throughput depending on the operation. This consistency suggests that the Advancer pattern provides a robust foundation for a functional yield-based traversal that scales well across different computational patterns. The equality benchmark represents the Advancer’s strongest showing, with 75% relative throughput, likely because the dual-traversal pattern with early termination aligns well with the Advancer’s control flow design.

The relationship between benchmark complexity and performance dispersion reveals important insights into iterator implementation costs. The sum benchmark, despite being computationally the simplest, shows the largest relative performance gaps, with alternatives ranging from 17% down to just 1% of ad hoc throughput. This counterintuitive result suggests that iterator overhead becomes most visible in tight reduction loops, where the ad hoc implementation can leverage cache locality and minimal abstraction layers. Moreover, the arithmetic operations in the sum benchmark require boxing and unboxing on every iteration of the stream pipeline, which strongly penalizes its performance. Using a primitive-specialized stream implementation would significantly reduce the gap to the ad hoc approach. Nevertheless, the differences across the various approaches remain consistent, with the Advancer implementation significantly outperforming the alternatives.

The Vavr library exhibits critical performance issues that vary dramatically by benchmark type. In the equality benchmark, Vavr achieves only 5% of ad hoc throughput, while in the sum benchmark, it degrades to just 1% of baseline performance. This dramatic variation suggests that Vavr’s functional abstractions and lazy evaluation strategy impose particularly high overhead on operations that require early termination or tight reduction loops.

7. Discussion

Most benchmarks [5,6,35,38] evaluating stream pipeline performance are based on sequential traversal of in-memory data structures, which typically require straightforward iterator implementations. While these benchmarks are useful for measuring the overhead of stream abstractions in simple linear traversals, they do not capture the challenges of more complex data structures, such as trees, where traversal requires visiting nodes along branches. For a balanced binary tree, common traversals such as pre-order or in-order use recursion and branching, leading to irregular memory access patterns.

Moreover, the tree traversal use cases explored in our tests required implementing custom source operations that are not provided out of the box, playing the role that stream() and asSequence() play for standard collections in Java and Kotlin. The need for custom tree traversals motivated exploring different ways to extend the available iterator protocols. Note, for example, that the Streamliner tool [6], which performs ahead-of-time optimizations of Java stream pipelines, is limited to concrete in-memory data sources and does not provide optimization benefits for data sources built from custom Spliterator implementations, such as the tree-traversal sources evaluated in this work.

Since trees are widely used to represent hierarchical data—such as databases, file systems, DOMs, and syntax or decision trees—and there is a lack of experimental benchmarks in this context, this work provides strong motivation to analyze these workloads.

The Java Streams package was the only library studied that provided built-in parallel processing capabilities. Most of the more recent technologies, such as Kotlin Sequences, do not offer this feature. Although there is extensive discussion online—e.g., on Stack Overflow—about the potential benefits of sequence parallelization, very little of it is backed by concrete benchmark data.

However, to exploit parallel stream pipelines with user-defined operations—such as leaves or preOrder—it is necessary to implement custom Spliterators, which can be non-trivial to code, as illustrated by Holger Brands (https://stackoverflow.com/a/48150478/1140754, accessed on 1 February 2026). Consequently, the additional programming effort may not translate into a clear performance gain.

We implemented a parallel pre-order traversal based on Holger Brands’ Java Spliterator proposal, enabling the evaluation of AVL tree workloads using Java Streams. The results in Table 1 show that the parallel implementation outperforms sequential approaches on some workloads, but it does not even come close to doubling the performance of the ad hoc implementation on the same 8-core runtime, which is disappointing given the available processing capacity. Moreover, parallel streams remain nearly twice as slow as ad hoc for the sum workload, where synchronization, splitting, and boxing overhead dominate.

For the Same Fringe benchmark, we did not implement a parallel leaves(), as it is not feasible to create a Spliterator that guarantees effective splitting for unbalanced or irregular trees.

The results in Table 1 highlight the performance spectrum of the studied sequential implementations. Across all workloads, the Advancer consistently delivers the best sequential performance, achieving roughly 50–75% of the ad hoc baseline for traversal-heavy pipelines, such as Same Fringe and AVL Equality, while the sum workload represents a notable exception due to tight reduction loops, where Advancer reaches only about 17% of ad hoc performance. The gap between Advancer and the worst mainstream alternatives—typically Kotlin sequences—ranges from 2× to 3.5×, illustrating the substantial overhead introduced by state-machine-based coroutines or iterator materialization. These results confirm that the yield-based Advancer approach provides a robust, high-performance foundation for implementing custom stream-pipeline operations without incurring the penalties common in other sequential libraries.

8. Conclusions

Kotlin, a JVM-targeted programming language that provides the yield operator, shows performance degradation in tree-traversal tests when using yield-based approaches. However, avoiding the yield primitive leads to significant verbosity, as demonstrated by many iterator-based solutions, which are often several times longer than the equivalent yield-based implementations, such as those for the zip and in-order iterators.

In this work, we show that extending stream pipelines with user-defined operations can introduce either performance or verbosity overheads. To mitigate both issues, we propose a functional yield-based traversal pattern that enables concise, extensible, and efficient stream processing. Using this pattern, we implement core stream operations—such as map, filter, limit, flatMap, reduce, and toList—as well as user-defined operations like zip and leaves, all with minimal code and in a style similar to Kotlin’s yield operator. Importantly, our approach requires no compiler or instrumentation support and achieves superior performance compared to existing alternatives across multiple tree-traversal tests.

In the traversal-heavy benchmarks, our implementation reaches roughly 50–75% of the throughput of the ad hoc baseline, which represents the most efficient sequential processing approach. These results complement our previous work [35], providing detailed performance insights on workloads with different tree structures and morphologies.

Overall, this work presents a viable alternative to third-party utility libraries for stream processing, showing that high performance can be achieved with minimal implementation effort and without sacrificing code clarity.

Funding

This research received no external funding.

Institutional Review Board Statement

This study did not involve human subjects or personal data; therefore, Institutional Review Board approval was not required.

Data Availability Statement

The data presented in this study are available on GitHub at https://github.com/tinyield/treebench-jvm (accessed on 1 February 2026).

Conflicts of Interest

The author declares no conflicts of interest.

References

1. Fowler, M. Collection Pipeline. 2015. Available online: https://martinfowler.com/articles/collection-pipeline (accessed on 14 November 2025).
2. Goetz, B. Java Streams—“Under the Hood”. IBM Dev. 2014. Available online: https://developer.ibm.com/articles/j-java-streams-3-brian-goetz/ (accessed on 14 November 2025).
3. Friedman, D.P.; Wise, D.S. CONS Should not Evaluate its Arguments. In Automata, Languages and Programming; Michaelson, S., Milner, R., Eds.; Edinburgh U. Press: Edinburgh, Scotland, 1976; pp. 257–284.
4. Wadler, P. Deforestation: Transforming programs to eliminate trees. In European Symposium on Programming; Springer: Berlin/Heidelberg, Germany, 1988; pp. 344–358.
5. Kowalski, T.M.; Adamus, R. Optimisation of language-integrated queries by query unnesting. Comput. Lang. Syst. Struct. 2017, 47, 131–150.
6. Møller, A.; Veileborg, O.H. Eliminating abstraction overhead of Java stream pipelines using ahead-of-time program optimization. Proc. ACM Program. Lang. 2020, 4, 1–29.
7. Rosales, E.; Rosà, A.; Basso, M.; Villazón, A.; Orellana, A.; Zenteno, Á.; Rivero, J.; Binder, W. Characterizing Java Streams in the Wild. In Proceedings of the 2022 26th International Conference on Engineering of Complex Computer Systems (ICECCS), Hiroshima, Japan, 26–30 March 2022; pp. 143–152.
8. Basso, M.; Schiavio, F.; Rosà, A.; Binder, W. Optimizing Parallel Java Streams. In Proceedings of the 2022 26th International Conference on Engineering of Complex Computer Systems (ICECCS), Hiroshima, Japan, 26–30 March 2022; pp. 23–32.
9. Rosales, E.; Basso, M.; Rosà, A.; Binder, W. Profiling and optimizing Java streams. arXiv 2023, arXiv:2302.10006.
10. Dietrich, D. Vavr—An Object-Functional Language Extension to Java 8. Technical Report, vavr.io. 2014. Available online: https://github.com/vavr-io/vavr (accessed on 14 November 2025).
11. Eder, L. JOOL—The Missing Parts in Java 8. Technical Report, jOOQ. 2014. Available online: https://github.com/jOOQ/jOOL (accessed on 14 November 2025).
12. Eclipse Foundation. Eclipse Collections. Technical Report. 2025. Available online: https://eclipse.dev/collections/ (accessed on 14 November 2025).
13. Bourrillion, K.; Levy, J. Guava—Google Core Libraries for Java. Technical Report, Google. 2009. Available online: http://github.com/google/guava (accessed on 14 November 2025).
14. Fox, D. protonpack—Stream Utilities for Java 8. Technical Report, Codepoetics.com/. 2014. Available online: https://github.com/poetix/protonpack (accessed on 14 November 2025).
15. Valeev, T. StreamEx—Enhancing Java 8 Streams. Technical Report. 2015. Available online: https://github.com/amaembo/streamex (accessed on 14 November 2025).
16. Gamma, E.; Helm, R.; Johnson, R.; Vlissides, J. Design Patterns: Elements of Reusable Object-Oriented Software; Addison-Wesley Professional Computing Series; Pearson Deutschland GmbH: Munich, Germany, 1994.
17. Baker, H.G. Iterators: Signs of Weakness in Object-oriented Languages. SIGPLAN OOPS Mess. 1993, 4, 18–25.
18. James, R.; Sabry, A. Yield: Mainstream Delimited Continuations. In Workshop on the Theory and Practice of Delimited Continuations; Indiana University: Bloomington, IN, USA, 2011.
19. Odersky, M.; Spoon, L.; Venners, B.; Sommers, F. Programming in Scala, Fifth Edition: Updated for Scala 3; Artima Incorporation: New York, NY, USA, 2021.
20. Akhin, M.; Belyaev, M. Kotlin Language Specification. 2021. Available online: https://kotlinlang.org/spec/pdf/kotlin-spec.pdf (accessed on 1 February 2026).
21. Gamboa, M. The Managed Runtime Environment: Diving into the JVM with Kotlin; Leanpub: Victoria, BC, Canada, 2025; ISBN 978-989-33-6322-5.
22. Gabriel, R.P. The Design of Parallel Programming Languages. In Artificial Intelligence and Mathematical Theory of Computation: Papers in Honor of John McCarthy; Academic Press Professional, Inc.: Williston, VT, USA, 1991; pp. 91–108.
23. Shipilev, A. Java Microbenchmark Harness (The Lesser of Two Evils). 2013. Available online: https://shipilev.net/blog/2014/nanotrusting-nanotime/ (accessed on 1 February 2026).
24. Conway, M.E. Design of a Separable Transition-Diagram Compiler. Commun. ACM 1963, 6, 396–408.
25. Liskov, B. CLU Reference Manual; Springer, Inc.: New York, NY, USA, 1983.
26. Griswold, R.E.; Griswold, M.T. History of the Icon programming language. In History of Programming Languages—II; Association for Computing Machinery: New York, NY, USA, 1996; pp. 599–624.
27. Borins, M.; Braun, A.R.; Palmer, R.; Terlson, B. ECMA-334 C# Language Specification, 5th ed.; ECMA: Geneva, Switzerland, 2006.
28. Thomas, D.; Hunt, A. Programming Ruby: The Pragmatic Programmer’s Guide; Addison-Wesley: Boston, MA, USA, 2007.
29. Landin, P.J. The Mechanical Evaluation of Expressions. Comput. J. 1964, 6, 308–320.
30. Landin, P.J. Correspondence Between ALGOL 60 and Church’s Lambda-notation: Part I. Commun. ACM 1965, 8, 89–101.
31. Jones, S. Haskell 98 Language and Libraries: The Revised Report; Cambridge University Press: Cambridge, UK, 2003.
32. Shaw, M.; Wulf, W.A.; London, R.L. Abstraction and Verification in Alphard: Defining and Specifying Iteration and Generators. Commun. ACM 1977, 20, 553–564.
33. Yee, K.P.; van Rossum, G. PEP 234–Iterators. 2001. Available online: https://www.python.org/dev/peps/pep-0234/ (accessed on 1 February 2026).
34. Prokopec, A.; Liu, F. Theory and Practice of Coroutines with Snapshots. In Proceedings of the European Conference on Object-Oriented Programming, Amsterdam, The Netherlands, 16–21 July 2018.
35. Poeira, D.; Carvalho, F.M. Deconstructing yield operator to enhance streams processing. In Proceedings of the ICSOFT 2021: 16th International Conference on Software Technologies, Paris, France, 6–8 July 2021.
36. Wong, C.K.; Nievergelt, J. Upper Bounds for the Total Path Length of Binary Trees. J. ACM 1973, 20, 1–6.
37. Adelson Velskii, M.; Landis, E.M. An algorithm for organization of information. Dokl. Akad. Nauk SSSR 1962, 146, 263–266.
38. Kiselyov, O.; Biboudis, A.; Palladinos, N.; Smaragdakis, Y. Stream fusion, to completeness. In Proceedings of the 44th ACM SIGPLAN Symposium on Principles of Programming Languages, Paris, France, 15–21 January 2017; pp. 285–299.

Figure 1. Class diagram of Advancer and Yield types.

Figure 2. Relative performance to the ad hoc implementation of the same fringe.

Figure 3. Relative performance to the ad hoc implementation of the AVL tree traversal for distinct, sum and equality pipelines.

Table 1. Performance comparison across workloads, excluding Vavr, which exhibits out-of-band values, and including parallelization using Java Streams. The relative performance of the worst, ad hoc, and parallel implementations is shown against the Advancer.

Workload	Parallel (ms)	Ad Hoc (ms)	Advancer, Best (ms)	Worst (ms)	Worst Implementation	Worst/Adv	Ad Hoc/Adv	Parallel/Adv
Same Fringe	–	30.7	45.6	89.7	Kotlin	2.0×	67%	–
AVL Equality	92.1	130.1	172.7	599.4	Eclipse	3.5×	75%	53%
Distinct	41.4	76.2	142.0	375.2	Kotlin	2.6×	54%	29%
Sum	30.5	15.6	90.9	304.2	Kotlin	3.3×	17%	34%
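The ratio columns in Table 1 appear to be each implementation's time divided by the Advancer time for the same workload, so a Worst/Adv value of 2.0× means the worst implementation took about twice as long, and an Ad Hoc/Adv value of 67% means the ad hoc baseline needed about two thirds of the Advancer's time. The short Java sketch below recomputes the Same Fringe row under that reading; the class and method names are hypothetical.

```java
import java.util.Locale;

// Hypothetical helper that recomputes the ratio columns of Table 1.
final class RelativePerformance {
    // Ratio of an implementation's time to the Advancer's time.
    static double ratio(double millis, double advancerMillis) {
        return millis / advancerMillis;
    }

    public static void main(String[] args) {
        // Same Fringe row of Table 1, in milliseconds.
        double advancer = 45.6, adHoc = 30.7, worst = 89.7;
        System.out.printf(Locale.ROOT, "Worst/Adv  = %.1fx%n", ratio(worst, advancer));
        System.out.printf(Locale.ROOT, "Ad Hoc/Adv = %.0f%%%n", 100 * ratio(adHoc, advancer));
    }
}
```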

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

