A Semantic Framework to Debug Parallel Lazy Functional Languages

It is not easy to debug lazy functional programs. The reason is that laziness and higher-order functions complicate basic debugging strategies. Although there exist several debuggers for sequential lazy languages, dealing with parallel languages is much harder. In this case, it is important to implement debugging platforms for parallel extensions, but it is also important to provide theoretical foundations that simplify the task of understanding the debugging process. In this work, we deal with the debugging process in two parallel languages that extend the lazy language Haskell. In particular, we provide an operational semantics that allows us to reason about our parallel extension of the sequential debugger Hood. In addition, we show how this semantics can be used to analyze the amount of speculative work done by the processes, so that their use of resources can be optimized.


Introduction
Pure functional languages provide advantages such as polymorphism, higher-order functions, and the absence of side-effects. In the case of parallel functional languages, the use of higher-order functions and function composition simplifies the clear separation between coordination and computation, facilitating also the definition of skeletons [1][2][3][4][5][6][7][8][9][10]. Moreover, the absence of state avoids side-effects, simplifying the coordination between processes, as it is only necessary to specify the arguments to be communicated among processes. These characteristics allow for defining the coordination in a simple way.
The use of the functional programming paradigm makes the implementation of parallel programs easier. However, such languages do not usually provide debuggers.
The rest of the paper is structured as follows: in the next section, we comment on the basic aspects of Hood. Then, in Section 3, we present the parallel languages under consideration. Next, in Section 4, we introduce a common core language to deal with the basic aspects of both GpH and Eden. Afterwards, the semantics of Hood in GpH and Eden are introduced in Sections 5 and 6. Then, in Section 7, we prove that pHood does not modify the evaluation of the observed expressions. After that, Section 8 presents a method where pHood is used to analyze speculative work. Next, in Section 9, we discuss the results we have obtained. Finally, Section 10 contains our conclusions and lines of future work.

An Introduction to Hood
First of all, we review the main characteristics of Hood. More details about it can be found in [53]. As mentioned above, when a programmer has to debug an imperative program, it is possible to explore the evolution of any variable, showing not only its final value, but also all its intermediate values at each moment during the execution. That is, we can track the value of each variable over time.
Unfortunately, it is difficult to obtain similar tracking facilities in lazy functional languages. This problem has been studied over the years [38][39][40]. There are three reasons that justify this difficulty. First, there are no variables whose values change during the execution of the program. Second, lazy evaluation should be preserved even under tracing observations; that is, tracing observations should not modify the evaluation order of any expression. Third, tracers must deal with higher-order functions. Fortunately, Hood provides observations that are similar to those provided in classic imperative languages. By using Hood, any intermediate expression appearing in a program can be observed. In fact, if we also use GHood [54], not only can we observe its final value, but we can also observe the evolution in time of its evaluation degree.
Let us consider an example (originally introduced in [53]) to show how to use Hood. It is simple enough to be easily understood, yet complex enough to illustrate the main ideas underlying Hood. The function to be analyzed computes the list of digits of a given natural number:

    digits :: Int -> [Int]
    digits = reverse . map (`mod` 10) . takeWhile (/= 0) . iterate (`div` 10)

In the first line, we provide the type of the function. That is, it receives an integer as input, and it outputs a list of integers. The rest of the definition is a sequence of functions that are composed to obtain the final output, where the last function to be applied is reverse. For instance, in case we evaluate digits 3408, we will obtain the list 3:4:0:8:[] after producing the following intermediate lists:

    3408 : 340 : 34 : 3 : 0 : _    (after iterate)
    3408 : 340 : 34 : 3 : []       (after takeWhile)
    8 : 0 : 4 : 3 : []             (after map)

Let us remark that the first list produced is infinite because iterate generates an infinite list, where (`div` 10) is applied repeatedly to the number calculated in the previous application of (`div` 10). However, even though the list is infinite, only five elements are actually demanded. The rest of the elements are not needed to obtain the overall result. Thus, they are not computed. Hence, the underscore character is used to represent the rest of the list.
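The pipeline above can be explored without any debugger by naming each stage. The following is a minimal sketch in plain Haskell; the names afterIterate, afterTakeWhile, and afterMap are hypothetical helpers introduced only for illustration.

```haskell
-- Each stage of the digits pipeline, named so that it can be inspected.
afterIterate :: Int -> [Int]
afterIterate = iterate (`div` 10)            -- infinite list

afterTakeWhile :: Int -> [Int]
afterTakeWhile = takeWhile (/= 0) . afterIterate

afterMap :: Int -> [Int]
afterMap = map (`mod` 10) . afterTakeWhile

digits :: Int -> [Int]
digits = reverse . afterMap
```

Since afterIterate yields an infinite list, it can only be inspected through a finite prefix, for instance with take 5 (afterIterate 3408). Unlike Hood, this approach requires the programmer to decide in advance how much of each structure to force.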
In case we want to use Hood to obtain as output the intermediate lists shown before, we should use observe (the basic combinator of Hood) to annotate the intermediate lists. The type of observe is the following:

    observe :: String -> a -> a

The output returned by observe is its second input parameter. Thus, observe s a = a. However, in addition to computing such a result, it also creates a side effect, so that the value of a (together with the tag s) is written to a file. By doing so, this file can be post-processed after the program execution is finished, so that we can show the intermediate results to the user. Let us remark that observe uses lazy evaluation. That is, introducing observe does not modify the evaluation degree of a. In this sense, the computational effect of observe s is exactly the same as that of the identity function id. Due to the fact that observe does not modify the evaluation degree, Hood can handle infinite lists. In particular, it can handle the infinite list generated in the previous example after the application of function iterate.
Let us suppose that we want to observe the three intermediate lists appearing in the digits example. In this case, we only need to annotate the source code by including the observe function in the appropriate places:

    digits :: Int -> [Int]
    digits = reverse
           . observe "after map" . map (`mod` 10)
           . observe "after takeWhile" . takeWhile (/= 0)
           . observe "after iterate" . iterate (`div` 10)

The execution of digits 3408 will provide the expected output. Notice that iterate (`div` 10) is the first function to be applied to the input value 3408. Then, observe "after iterate" is applied, and so on. As we have introduced three observations, the side effect will record three intermediate lists. For instance, observe "after iterate" will write to the log file the output of applying iterate (`div` 10) to 3408.
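To experiment with the shape of the annotated program without installing Hood, one can use a stand-in for observe. The sketch below is an assumption and not Hood's implementation: the hypothetical observeTag only logs its tag (through Debug.Trace) when the annotated value is first demanded, and otherwise behaves as the identity.

```haskell
import Debug.Trace (trace)

-- Stand-in for Hood's observe (NOT Hood's implementation): it reports the
-- tag when the value is first demanded and returns the value unchanged.
observeTag :: String -> a -> a
observeTag tag x = trace ("demanded: " ++ tag) x

digits :: Int -> [Int]
digits = reverse
       . observeTag "after map" . map (`mod` 10)
       . observeTag "after takeWhile" . takeWhile (/= 0)
       . observeTag "after iterate" . iterate (`div` 10)
```

Because observeTag never forces its argument beyond what the consumer demands, an expression such as take 3 (observeTag "nats" [1 ..]) still terminates, which mirrors the laziness requirement discussed above.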
As can be expected in a higher-order language like Haskell, Hood can be used not only with simple structures (as shown in the previous example), but also to observe functions. For instance, observe "sum" sum (7:3:6:[]) allows for observing the function sum. (Notice that in Haskell function application associates to the left. Thus, observe "sum" sum (7:3:6:[]) is equivalent to (observe "sum" sum) (7:3:6:[]). That is, the piece of information being observed is function sum itself, not only the output of the function application.) In particular, it will show the output of each of its applications. In this case, it is applied a single time (that is, to the list 7:3:6:[]). Thus, it returns

    -- sum
    { \ (7 : 3 : 6 : []) -> 16
    }

The previous output can be interpreted as follows: sum is a function that outputs the value 16 when it receives as input 7:3:6:[]. In this case, values 7, 3, and 6 appear explicitly. The reason is that they were actually demanded to compute the final result. However, in case we perform the following observation:

    observe "length" length (7:3:6:[])

then the output will be as follows:

    -- length
    { \ (_ : _ : _ : []) -> 3
    }

Let us remark that function length does not need to demand the concrete values of the input list, only the number of elements. Thus, Hood observes a function that produces as output the number 3 when it receives as input a list containing three elements (without actually demanding the concrete values of the list).
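The difference in demand between sum and length can be checked directly in plain Haskell. In the following sketch (the names untouchable, spineOnly, and total are hypothetical), every element of the list raises an error if it is ever evaluated, so length succeeding shows that it only traverses the spine.

```haskell
-- A list whose elements raise an error if they are ever demanded.
untouchable :: [Int]
untouchable =
  [ error "element 0 demanded"
  , error "element 1 demanded"
  , error "element 2 demanded"
  ]

-- length only walks the spine of the list, so no element is evaluated.
spineOnly :: Int
spineOnly = length untouchable

-- sum needs the value of every element.
total :: Int
total = sum (7 : 3 : 6 : [])
```

Evaluating sum untouchable would instead fail with one of the error messages, since sum demands the elements themselves.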
As expected, it is also possible to observe higher-order functions. In this case, we only need to introduce the observation associated with the corresponding higher-order function. As an example, function iterate can be observed as follows:

    digits :: Int -> [Int]
    digits = reverse . map (`mod` 10) . takeWhile (/= 0) . observe "iterate" iterate (`div` 10)

Recall that iterate is a higher-order function that takes two arguments and returns an infinite list, obtained by repeatedly applying its first argument (a function) to its second argument. For instance, the expression iterate (+5) 1 generates the infinite list 1:6:11:16:21:.... Notice that in this case observe is only applied to function iterate. Thus, we will observe its behavior each time it is applied. For instance, digits 3408 will now produce the following observation:

    -- iterate
    { \ { \ 3 -> 0
        , \ 34 -> 3
        , \ 340 -> 34
        , \ 3408 -> 340
        } 3408 -> 3408 : 340 : 34 : 3 : 0 : _
    }

Thus, the observation shows that iterate is a function that outputs 3408:340:34:3:0:_ when the second input parameter is 3408 and the first input is another function (`div` 10), where this input function was observed in four different cases: 3408, 340, 34, and 3.
Let us point out that not only is it necessary to analyze whether an expression was evaluated or not, but it is also necessary to know who was responsible for such evaluation. For instance, in case a structure is being observed in a certain environment, but the same structure can also be demanded from a different environment, we are only interested in recording the demand due to the environment under observation. As an example, consider an observation of function length applied to a list that is also consumed by sum. As expected, all the elements were demanded to evaluate sum, but the observation records that length did not demand any of them.

Implementation Details
Next, we comment on some implementation details of Hood that are relevant to understand how we will define appropriate semantic rules in the following sections. Hood's implementation produces annotations of the form (portId, parent, change). The first component (portId) points to the place where the annotation is made. The second component (parent) is needed to know the context where an expression was evaluated. For instance, when a function is evaluated, its arguments need to know the place where they were invoked. That is, it is necessary to access the parent of those arguments. More precisely, parent is a tuple (observeParent, observePort), where observeParent is the portId of the parent and observePort is the position of the argument. The third component (change) describes the kind of observation that is being done. It has the following possibilities:

Observe String: is created when entering a binding that is an observation. This new observation has no parent. More precisely, its parent is the predefined general parent, denoted as (0, 0). As expected, when we start the evaluation of an annotated variable, this is the initial annotation that is created.

Enter: is created when starting the evaluation of a binding.

Cons Int String: is created when we evaluate a constructor. The first parameter represents the arity of the constructor, while the second one represents its name. The children of the constructor will receive annotations of the form (parentPortId, 1), . . . , (parentPortId, arity), where parentPortId is the pointer to the Cons annotation. By doing so, it is possible to reconstruct the constructor application.

Fun: is created when the binding evaluation arrives at a lambda expression. Only curried functions are considered, so lambda expressions have a single input value and a single result. When a lambda is applied to an input parameter, the argument is annotated with parent (parentPortId, 0), while the annotation given to the result is (parentPortId, 1), parentPortId being a pointer to the Fun annotation.
Let us remark that not only does Hood observe normal forms, but it also records when an evaluation has been started. Thus, it is possible to know which bindings have been demanded by other ones. When the computation finishes, the corresponding annotations are post-processed to provide the appropriate output to the user.
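The annotation format described in this section can be written down as a small data type. The following is only a sketch under the assumption that the components are represented as plain integers and strings; the type and function names (Parent, Change, Annotation, constructorChildren, lambdaParents) are hypothetical and are not taken from Hood's source code.

```haskell
type PortId = Int

-- (observeParent, observePort) as described in the text.
data Parent = Parent { observeParent :: PortId, observePort :: Int }
  deriving (Eq, Show)

-- The four kinds of change listed above.
data Change
  = Observe String   -- start of an observation with the given tag
  | Enter            -- evaluation of a binding has started
  | Cons Int String  -- constructor: arity and name
  | Fun              -- a lambda expression has been reached
  deriving (Eq, Show)

data Annotation = Annotation
  { portId :: PortId, parent :: Parent, change :: Change }
  deriving (Eq, Show)

-- The predefined general parent used by Observe annotations.
rootParent :: Parent
rootParent = Parent 0 0

-- Parents given to the children of a constructor of the given arity.
constructorChildren :: PortId -> Int -> [Parent]
constructorChildren consPort arity = [Parent consPort i | i <- [1 .. arity]]

-- Parents given to the argument (port 0) and the result (port 1) of a lambda.
lambdaParents :: PortId -> (Parent, Parent)
lambdaParents funPort = (Parent funPort 0, Parent funPort 1)
```

Keeping the argument and the result of a lambda on different ports is what later allows the post-processing phase to print an observed function as a list of argument/result pairs.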

Introduction to GpH and Eden
Next, we present the basic ideas underlying the two parallel extensions of Haskell that we will use in the rest of the paper.

Glasgow Parallel Haskell
GpH [14,15,65,66] extends Haskell with simple annotations to indicate expressions that could be evaluated in parallel. It follows a thread-based approach. That is, programmers have some control to decide the parallel threads that are to be created, but they lack mechanisms to control the threads once they have been created. Threads are only handled by the underlying runtime system. The use of higher-order functions, combined with simple thread primitives, allows the programmer to create high-level abstractions. In particular, the use of evaluation strategies [65] has proven to be very useful in GpH.
The language provides two primitives for parallel (par) and sequential (seq) composition. From a denotational point of view, both primitives only return their second argument. However, from an operational point of view, e1 `seq` e2 first forces its first parameter (e1) to be reduced to weak head normal form, and then it starts the computation of e2. By contrast, e1 `par` e2 first creates an annotation indicating that e1 could be evaluated using an independent parallel thread, and then it starts the computation of the second parameter. This process of annotating expressions that could be executed in parallel appears in many parallel languages, and is called the sparking of parallelism. Then, the runtime system can decide whether or not to ignore some of these annotations. That is, the programmer only suggests expressions that could be useful to be evaluated in parallel, but the runtime system manages all the low-level decisions (creation of threads, synchronization, etc.).
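A typical use of the two primitives is sketched below. To keep the example self-contained, par is imported from GHC.Conc in base (in GHC it is more commonly imported from Control.Parallel in the parallel package); the function parSum is a hypothetical example and not part of GpH.

```haskell
import GHC.Conc (par)

-- Spark the first sum, evaluate the second one to weak head normal form,
-- and then combine both results.
parSum :: [Int] -> [Int] -> Int
parSum xs ys = a `par` (b `seq` (a + b))
  where
    a = sum xs  -- may be evaluated by another thread if the spark is taken
    b = sum ys  -- evaluated by the current thread before the addition
```

From a denotational point of view, parSum xs ys is just sum xs + sum ys. Whether the spark for a is actually turned into a thread is decided by the runtime system.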

Eden
Eden [24,67,68,69] extends Haskell by adding constructions to define process abstractions and to instantiate them. Any function can be transformed into a process abstraction by applying to it the predefined higher-order function process. The new process abstraction is similar to the original function it comes from, but the process abstraction can be instantiated to be executed in parallel. That is, from a semantic point of view, functions and process abstractions are analogous; the only differences appear when they are applied to their arguments. In this case, functions are applied with a function application (e1 e2), while processes are applied using a process instantiation (e1 # e2).
In Eden, a process is not a syntactical structure; it is a new computational environment that performs its computations autonomously. Each time a process instantiation (e1 # e2) takes place, the runtime system creates a new computational environment. The creator (also called parent process) of the new process will be responsible for sending the value of e2 via an input channel, while the new process (also named child process or instantiated process) will receive such input value and will return to its parent (through an output channel) the result of evaluating e1 e2.
In Eden, the communication between processes is done by pushing information instead of pulling it. That is, values are communicated even if the receiver has not demanded them. (This rule introduces eagerness in the language. Thus, the programmer has to be careful to avoid creating unneeded work.) Moreover, when a process has to send a value through one of its output channels, the corresponding value has to be fully evaluated before sending it. Streams are the only exception to this rule because they are sent element by element through the channels. However, each element of the stream has to be evaluated to full normal form before being transmitted. When a thread needs a value that is to be received through an input channel, and this value has not been received yet, the thread is temporarily suspended. This mechanism is the only one that can be used to synchronize Eden processes. Let us remark that the creation of processes in Eden is done explicitly, but the communication (and also the synchronization) between processes is done implicitly.
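The analogy between functions and process abstractions can be illustrated with a purely sequential stand-in. The sketch below is an assumption made for illustration: it is not the Eden library, it creates no processes or channels, and it omits the type class constraints that the real process and # carry. It only captures the denotational reading, in which instantiating a process yields the same value as applying the underlying function.

```haskell
-- Sequential model of a process abstraction: a wrapped function.
newtype Process a b = Process (a -> b)

-- Turn a function into a process abstraction.
process :: (a -> b) -> Process a b
process = Process

-- Process instantiation; in this model it is plain function application.
( # ) :: Process a b -> a -> b
Process f # x = f x

-- Hypothetical example: a process abstraction that doubles every element.
doubleAll :: Process [Int] [Int]
doubleAll = process (map (* 2))
```

With these definitions, doubleAll # [1, 2, 3] denotes the same list as map (* 2) [1, 2, 3]. What the model cannot show is the operational difference: in Eden, the argument would be evaluated to normal form and pushed to the child process.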

GpH-Eden Core Language
GpH and Eden are different extensions of Haskell, but they share a large part of their core languages. Thus, we will use a single framework (called GpH-Eden core) to deal with the common characteristics of both languages. The syntax of the common language can be seen in Figure 1. It is an untyped λ-calculus extended with case expressions, recursive lets, constructor applications, and primitive values. It also includes expressions for the sequential and parallel compositions of GpH, as well as expressions to deal with Eden process instantiations. Any Eden or GpH expression can be translated into a GpH-Eden core expression by using a normalization phase. This is done by introducing the corresponding let expressions, together with the required number of intermediate variables. Regarding Eden streams, they are handled as constructors ($[x_1 : x_2]$ is a constructor of arity 2, $Cons\ x_1\ x_2$). Thus, they can appear in case expressions. Notice that streams cannot be members of any other stream, but they can contain any other value.
Any variable can be annotated as observable by the programmer. Thus, any process abstraction can also be marked as observable. Let us remark that our core language includes two internal expressions, namely $p@(n,m)$ and $\lambda@[(n_i,m_i)]\,x.e$ (λ-abstractions are annotated with a list of observers because they can be observed from different points). These expressions cannot be written by the programmer. They can only appear as a result of applying semantic rules. In this sense, they are auxiliary expressions that are useful to track who is responsible for each of the observations.
Following other classical approaches used for parallel languages (see, e.g., [64,70]), our semantics uses two levels of transition systems: the lower level is in charge of the local behavior inside each of the processes, while the upper level deals with global effects that affect several processes. In the lower level, GpH and Eden behave analogously, the only difference being that Eden does not distinguish between active and runnable threads. At the local level, the functional computations take place, such as β-reduction, the reduction of case expressions, letrec, etc. At the global level, the coordination behavior of the system is described with rules such as process creation, process communication, etc.
Following [64,70], we model the evaluation state of a process with a heap: a set of bindings of variables to expressions. We consider that each binding can be a potential thread, and we associate with it a label that indicates its current state: $p \overset{\alpha}{\mapsto} e$, where $\alpha ::= I \mid A \mid B \mid R$, with the following meanings:

I: Inactive. It has not been demanded yet, or its evaluation has already finished.
A: Active. It has already been demanded and it is under evaluation.
B: Blocked. It has been demanded, but it is currently waiting to receive a value from another binding.
R: Runnable. It has been demanded and it is not waiting for any data, but it is not Active because it lacks an available processor.
Active and Runnable are only different in the context of GpH. The reason is that the original semantics of GpH distinguishes them, but not that of Eden. In fact, in GpH, threads are only activated in case there is an idle processor, while Eden's scheduler is quite different in this aspect.
For the sake of conciseness, the semantic rules allow for including several labels in each binding. Each of them represents one of the possibilities that the rule admits. For instance, when $p \overset{IAB}{\mapsto} e$ is in the left part of a rule, while $p \overset{ABA}{\mapsto} e$ appears in the right part, it means that, if the thread associated with binding $p \mapsto e$ was inactive or blocked, then it becomes active. Analogously, in case it was active, then it becomes blocked. We denote by dom(H) the set that contains all the variables appearing in the left-hand side of any binding of the heap. Moreover, $H + \{p \overset{\alpha}{\mapsto} e\}$ represents the extension of the heap H with the binding $p \overset{\alpha}{\mapsto} e$. Finally, $H : p \overset{\alpha}{\mapsto} e$ indicates that the guiding binding is the one corresponding to p, that is, it is the binding that guides the application of the corresponding rule. In the previous two situations, the condition p ∈ dom(H) is assumed.
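The label shorthand can be made concrete with a small sketch. It assumes that a heap is represented as an association list from variables to labelled expressions; the names Heap, dom, and relabel are hypothetical and chosen only for this illustration.

```haskell
-- Thread state labels: Inactive, Active, Blocked, Runnable.
data Label = I | A | B | R
  deriving (Eq, Show)

type Var = String

-- A heap binds each variable to a labelled expression (kept abstract as e).
type Heap e = [(Var, (Label, e))]

-- Variables appearing in the left-hand side of some binding.
dom :: Heap e -> [Var]
dom = map fst

-- Positional reading of the multi-label shorthand: the label at position i
-- on the left of a rule becomes the label at position i on the right.
-- Labels that are not listed on the left are not covered by the rule.
relabel :: [Label] -> [Label] -> Label -> Maybe Label
relabel from to l = lookup l (zip from to)
```

For the example in the text, relabel [I, A, B] [A, B, A] maps I and B to A, maps A to B, and is undefined (Nothing) for R.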
When the execution of a program e starts, $H_0 = \{p_{main} \overset{A}{\mapsto} e\}$ is the starting heap, where $p_{main}$ is assumed to be a fresh variable (pointer). Thus, $p_{main}$ does not appear in e.
The meaning of $p \overset{B}{\mapsto} e$ is that this binding is blocked, that is, it has to wait until another computation generates a certain value. We will say that expression e is blocked on a variable x, denoted by $e \in ble(x)$, if it has one of the following forms: x, x y, case x of alts, x@str, x@(n,m), or x `seq` y.
From now on, we will use $x, y, p, q, l, t, ch, ch_o, ch_i \in Var$ for ordinary variables. Here, x and y denote program variables (written by a programmer), while p, q, l, and t are dynamically created free variables (that we will call pointers); $ch_i$ corresponds to input channels in Eden, $ch_o$ represents output channels, and ch can represent any Eden channel. In the case of weak head normal forms, we will represent them using w. Moreover, in an abuse of notation, we will also use w for primitive values. The notation $P_j$ represents that a given pattern P is repeated several times, indexing such repetitions with variable j. For instance, letrec $x_i = be_i$ in e stands for letrec $x_1 = be_1, \ldots, x_n = be_n$ in e.
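The expression forms mentioned in this section can be collected in a data type, which also makes the definition of ble(x) executable. Since Figure 1 is not reproduced in this text, the following is only a sketch based on the prose description: all constructor names are hypothetical, and details such as the exact shape of case alternatives are assumptions.

```haskell
type Var = String
type Parent = (Int, Int)  -- (observeParent, observePort)

data Expr
  = Variable Var
  | Lam Var Expr                        -- \x.e
  | App Var Var                         -- x y
  | Letrec [(Var, Expr)] Expr           -- letrec x_i = be_i in e
  | ConApp String [Var]                 -- C x_1 ... x_k
  | Prim Int                            -- primitive value
  | Case Var [(String, [Var], Expr)]    -- case x of alts
  | Seq Var Var                         -- x `seq` y
  | Par Var Var                         -- x `par` y
  | Inst Var Var                        -- x # y
  | ObservedVar Var String              -- x@str (written by the programmer)
  | ObservedPtr Var Parent              -- p@(n,m) (internal)
  | ObservedLam [Parent] Var Expr       -- observed lambda (internal)
  deriving (Eq, Show)

-- e is blocked on x if it has one of the forms listed in the text.
blockedOn :: Var -> Expr -> Bool
blockedOn x e = case e of
  Variable y      -> y == x
  App y _         -> y == x
  Case y _        -> y == x
  ObservedVar y _ -> y == x
  ObservedPtr y _ -> y == x
  Seq y _         -> y == x
  _               -> False
```

Note that x `par` y is deliberately absent from blockedOn: according to the text, par continues with its second parameter without waiting for the first one.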

Local Transitions
Local transitions are classified into two groups. The first one corresponds to ordinary expressions (see Figure 2), while the second one corresponds to Hood observations (see Figure 3). We start by commenting on Figure 2, whose rules deal with the lazy evaluation of expressions. In a lazy context, if an expression is to be evaluated, it has to be demanded. This demand is represented by active bindings. That is, the guiding binding is always an active binding. A binding that is active can bind a variable to a value (meaning that the computation was already completed), to another variable, to an application, to a let-expression, or to a case-expression. When a binding is active and binds a variable to another variable, there are two possibilities:

• If the two variables are equal, then we must block the corresponding variable, as we have entered a blackhole (rule blackhole).
• If the two variables are not equal, then there are two new possibilities:
  - When the second variable is bound to a value that has already been evaluated, we have to copy such value to the former variable (rule value).
  - When the second variable is bound to an expression that is not completely evaluated yet, the first variable has to be blocked (waiting until the second one is completely evaluated). Moreover, if the second variable was inactive, then it has to be turned into runnable (in the case of GpH) or into active (in the case of Eden) (rule value-demand).
Two possibilities appear when a variable is bound to an application, depending on whether the variable corresponding to its body is bound to a λ-abstraction or to a non-whnf expression. The first situation must continue with a β-reduction (rule β-reduction), while, in the other situation, it is necessary to evaluate the body. That is, we turn into runnable the corresponding binding (rule app-demand).
The situation is analogous when we have to evaluate a case-expression. We have to perform the reduction when the variable is associated with a constructor (rule case-reduction), while we have to demand its evaluation in any other case (rule case-demand).
Regarding let-expressions, when we evaluate them, we have to add the corresponding new bindings (rule letrec).
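The three rules for variable-to-variable bindings can be sketched as a single step function. This is a simplified model and not the rules of Figure 2 themselves: expressions are restricted to variables and integer values, the heap is an association list, and the names stepVariable and update are hypothetical. In addition, the sketch assumes that rule value leaves the label of the guiding binding unchanged, since the text does not state which rule deactivates it.

```haskell
data Label = I | A | B | R deriving (Eq, Show)
data Language = GpH | Eden deriving (Eq, Show)

-- Only the two expression forms needed by the variable rules.
data Expr = Var String | Value Int deriving (Eq, Show)

type Heap = [(String, (Label, Expr))]

-- Replace the binding of p, keeping the order of the heap.
update :: String -> (Label, Expr) -> Heap -> Heap
update p b = map (\(q, old) -> if q == p then (q, b) else (q, old))

-- One local step guided by the active binding p, bound to a variable q.
stepVariable :: Language -> String -> Heap -> Maybe Heap
stepVariable lang p heap =
  case lookup p heap of
    Just (A, Var q)
      | q == p -> Just (update p (B, Var q) heap)            -- blackhole
      | otherwise ->
          case lookup q heap of
            Just (_, Value w) ->                             -- value
              Just (update p (A, Value w) heap)
            Just (l, e) ->                                   -- value-demand
              Just (update q (demand l, e) (update p (B, Var q) heap))
            Nothing -> Nothing
    _ -> Nothing  -- p is not an active variable binding
  where
    -- An inactive binding becomes runnable in GpH and active in Eden.
    demand I = if lang == GpH then R else A
    demand l = l
```

The Language parameter isolates the only point where the two languages differ at this level, namely the label given to a newly demanded binding.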

Local Rules with Observations
The rules shown in Figure 3 define the evolution of observation marks in the context of lazy evaluation. Notice that local transitions now include a file to track the observations related to the bindings. This file will be post-processed afterwards to show the appropriate results to the user. Data are always appended to the file, without modifying data previously added. Hence, f • ann denotes that annotation ann has been appended to file f. Let us remark that the rules in Figure 2 should also handle the corresponding files. However, for the sake of clarity, we prefer to omit the files there because those rules never modify any file.
The annotations contained in our files are nearly the same as those generated by Hood, but there are two differences. First, we do not include the portId corresponding to the annotation. The reason is that we can obtain this information from the line number in the file. Notice that observeParent is a natural number representing the line corresponding to the annotation of the parent, and observePort is also a natural number. Function length f will be used to compute the total number of lines of the file f, 0 being the first line of the file. The second change affects the annotations of λ-abstractions: in our case, we use a list of pairs (observeParent, observePort) representing all the bindings that need to observe such λ-abstraction. This modification allows us to simplify the process of observing the same λ-abstraction from different points.
Next, we describe the observation rules:

Rule observ
In this case, we start an observation with the string str. Thus, an annotation 0 0 Observe str is added to the file, and then the evaluation goes on, taking into account the annotation that points to its parent, that is, (length f, 0).

Rule value@L, value@LO
In case we deal with an active binding $p \overset{A}{\mapsto} q@(n,m)$ where q is bound to a function, we have to continue evaluating a new kind of expression. If the function was already being observed, then the new observation mark is added to that function, obtaining $\lambda@((n,m):obs)\,x.e$ (rule value@LO); otherwise, if the function has not been observed (rule value@L), then the new λ-abstraction $\lambda@[(n,m)]\,x.e$ is created. This kind of lambda indicates that it is under observation associated with the tag @[(n, m)]. In addition, a new annotation n m Enter is generated, indicating that we enter to evaluate that binding.

Rule value@C
In case q@(n,m) is evaluated to a constructor, we have to generate a new annotation n m Cons k C. This indicates that the binding whose parent is (n, m) has been reduced to the constructor C (whose arity is k). New bindings pointing to each argument of that constructor are generated. These bindings are annotated to indicate that they are being observed. Moreover, in each of these annotations, we must indicate the position of the argument in the constructor and that its parent is in the corresponding line of the file.

Rule blackhole@
In this case, we have an annotated binding whose reduction needs to access itself. Thus, we have to block the binding.

Rule value@demand
When dealing with a binding $p \overset{A}{\mapsto} q@(n,m)$, we have to generate a new annotation n m Enter to record that we have started its evaluation.

Rule β-reduction@
This is the key rule to deal with the observation of functions. In case we have to evaluate an application of a function that is under observation, we generate the annotation in the file indicating that we are applying an observed function. Then, we mark its argument as observable, and we use (length f, 0) as its parent. In order to observe the result, we create a new observed binding whose parent is (length f, 1). The ports are different to remember that one is the argument and the other is the result of the lambda.
Note that it is not necessary to specify the application to an observed pointer, e p@(n,m). The reason is that, in the syntax, we have restricted the places where an observed variable may appear, and in the rules we never substitute a variable by an observed pointer.
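The way the observation rules append to the file and compute parents can be sketched as follows. The sketch assumes that the file is a list of annotations whose index is the line number; the function names (observ, enter, valueAtC, betaAt) are hypothetical, and the numbering of constructor arguments from 1 to k follows the description of Hood given earlier rather than Figure 3, which is not reproduced here.

```haskell
type Parent = (Int, Int)  -- (observeParent, observePort)

data Change = Observe String | Enter | Cons Int String
  deriving (Eq, Show)

data Annotation
  = Ann Parent Change  -- lines such as "n m Enter" or "n m Cons k C"
  | FunAnn [Parent]    -- application of a lambda with a list of observers
  deriving (Eq, Show)

type File = [Annotation]

-- Rule observ: append "0 0 Observe str"; the new line becomes the parent.
observ :: String -> File -> (File, Parent)
observ str f = (f ++ [Ann (0, 0) (Observe str)], (length f, 0))

-- Rules value@L, value@LO and value@demand: append "n m Enter".
enter :: Parent -> File -> File
enter par f = f ++ [Ann par Enter]

-- Rule value@C: append "n m Cons k C" and return the argument parents.
valueAtC :: Parent -> Int -> String -> File -> (File, [Parent])
valueAtC par k c f =
  (f ++ [Ann par (Cons k c)], [(length f, i) | i <- [1 .. k]])

-- Rule beta-reduction@: return the parents of the argument and the result.
betaAt :: [Parent] -> File -> (File, Parent, Parent)
betaAt observers f = (f ++ [FunAnn observers], (length f, 0), (length f, 1))
```

In every case, length f is computed before appending, so it coincides with the line number of the annotation that is being added.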

Example
Before explaining the specific details of each language, we present a simple example to show the way the observations take place. Example 1. We are interested in observing a single integer number, but with two observations. Thus, we consider a Haskell expression as follows:

    observe "obs2" (observe "obs1" (10 :: Int)) :: Int

This expression is converted by the normalization process to a new expression ($e_0$) in our core language, which introduces the variables ten, tenO, and tenOO. Next, we show how this expression is reduced using our semantics. We will concentrate on the local rules, so we will not show the global rules needed to reduce it, which have not been explained yet, such as the rules concerning the activation or the blocking of the threads. In this case, as we only need one heap and one file, we will show all the steps highlighting with • the active threads that will guide the reduction. The first step corresponds to the application of the letrec rule that replaces the variables with fresh pointers: $p_1$ for ten, $p_2$ for tenO, and $p_3$ for tenOO.
At this point, closure $p_3$ becomes active due to the application of the global rules.
Now, we add the line numbers to the resulting observation file. This file has more information than that required by Hood. In particular, the information follows the same order as the process of binding reduction. Moreover, the Enter mark is not necessary for Hood, but this mark can provide us with information about which closure was under evaluation in case of unfinished computations. Thus, the information we obtain is enough to provide all the information required by Hood, but also all the information required by GHood [54] to produce graphical animations. Note that GHood is a graphical tool that analyzes the file and represents every annotation, for example, indicating that the closure is under evaluation when it processes an Enter mark.
The binding corresponding to the mark obs2 ($p_3$) is the first binding under observation that is demanded. The next annotation appearing in the file means that the binding has been entered to reduce it. Before reducing this binding to normal form, $p_2$ (that is, the one annotated with obs1) has been demanded. Then, we enter to reduce this last binding. As its expression was already in normal form, an Enter annotation for the reference (2, 0) has not been generated, and it has immediately reached its normal form (line 3). Now, we will show how Hood annotations can be obtained from this file. We have to generate a different observation tree for each Observe annotation in the file. To generate these trees, we start with the annotations Observe str. Each Observe str is the root of an independent tree, whose label is given by str. Then, it is necessary to analyze sequentially the annotations of the file. Notice that we are only interested in the marks starting with Cons. There are two of them, as can be seen in lines 3 and 4. The former points to line 2, meaning that the parent is in line 2, while the latter points to line 0. Hence, two trees are obtained:

    obs1      obs2
     10        10

Then, we only have to flatten the previous trees to produce the same output that Hood produces:

    -- obs1
    10
    -- obs2
    10
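The post-processing described above can be sketched for the file of Example 1. The content of exampleFile is reconstructed from the prose (the tags, the Enter mark, and the parents of the two Cons marks in lines 3 and 4), so it should be read as an assumption about the file rather than a copy of it; in particular, representing the number 10 as a constructor of arity 0 named "10" is a choice made for this sketch. The names render and observations are hypothetical.

```haskell
data Change = Observe String | Enter | Cons Int String
  deriving (Eq, Show)

-- (observeParent, observePort, change); the line number is the list index.
type Annotation = (Int, Int, Change)

-- File of Example 1 as described in the text.
exampleFile :: [Annotation]
exampleFile =
  [ (0, 0, Observe "obs2")  -- line 0
  , (0, 0, Enter)           -- line 1
  , (0, 0, Observe "obs1")  -- line 2
  , (2, 0, Cons 0 "10")     -- line 3: points to line 2
  , (0, 0, Cons 0 "10")     -- line 4: points to line 0
  ]

-- Render the value whose parent is (line, port); "_" if never evaluated.
render :: [Annotation] -> (Int, Int) -> String
render file (line, port) =
  case [ (i, k, c)
       | (i, (n, m, Cons k c)) <- zip [0 ..] file
       , (n, m) == (line, port) ] of
    []            -> "_"
    (i, k, c) : _ -> unwords (c : [render file (i, j) | j <- [1 .. k]])

-- One (tag, value) pair per Observe annotation, in file order.
observations :: [Annotation] -> [(String, String)]
observations file =
  [ (str, render file (i, 0)) | (i, (_, _, Observe str)) <- zip [0 ..] file ]
```

Only the Cons marks are consulted, as in the text. The pairs are returned in the order in which the Observe marks appear in the file, so sorting them by tag would be a separate presentation step.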

GpH Formal Semantics
When we evaluate a GpH core expression, we will usually need to create several independent threads. However, we will only need one heap because GpH core does not have processes, communication, etc.
The semantics has been divided into two parts: the rules corresponding to the local behavior of the GpH operators (`seq` and `par`) can be seen in Figure 4, and the ones defining the global behavior of GpH are shown in Figure 5. First, let us comment on the first set of rules. In case the left variable is not yet reduced to whnf, the sequentiality of the `seq` operator forces its evaluation (rule seq). After obtaining the value, we proceed with the evaluation of the second variable (rule rm-seq). Regarding the `par` operator, it creates a potential thread. If p points to a `par` binding, we have to go on with the evaluation of its second parameter. Moreover, the binding corresponding to the first parameter becomes runnable (denoting that it can be a new thread), provided that it was not demanded yet (rule par).

GpH Global Transitions
Let us start by introducing some notation. The global rules define named transitions of the form $\stackrel{name}{\longrightarrow}$, where $name$ corresponds to the name of some rule. In addition, the meaning of $\stackrel{name}{\Longrightarrow}$ is that rule $\stackrel{name}{\longrightarrow}$ has to be applied as many times as possible. Moreover, $\stackrel{name_1}{\Longrightarrow} \bullet \stackrel{name_2}{\Longrightarrow}$ represents the composition of rules $name_1$ and $name_2$: first, the rule $name_1$ is applied and afterwards the rule $name_2$. The local evolution of the system depends on the threads that are active in $H$; we represent this set as $H^A$, and $|H^A|$ is its cardinality. Following the same notation, the threads blocked on $p$ are represented as $H^B_p$.

Now, we will discuss the global transitions of the GpH semantics. First of all, let us remark that GpH has a single common heap for all threads. Thus, the introduction of the annotation file in the rules is quite straightforward, as this file will be shared by all threads under execution. Next, we comment on the rules dealing with the global behavior (Figure 5):

Rule parallel
Each thread that is active can evolve independently (using local rules), and then we have to merge the corresponding heaps to obtain the global behavior. More precisely, we have to consider each active thread $p_i \stackrel{A}{\mapsto} e_i$. Then, for each one, we split the heap into two parts: $H^i_U$ contains those bindings that are not modified by the local rules, while $H^i_M$ contains those bindings that are modified by the evolution of the thread, and their final state after such evolution is represented by $K^i$. The final heap contains those parts that were not modified by any thread, $\bigcap_{i=1}^{n} H^i_U$, together with the result of the execution of the threads, $\bigcup_{i=1}^{n} K^i$. In order to apply this rule, it is necessary to prove that all the involved heaps are consistent, that is, there is no interference between the evolution of the active bindings. This proof can be found in [71].
We print the observations in sequence: we start with $f_0$ and then we continue printing the annotations coming from the evaluation of thread $p_1 \stackrel{A}{\mapsto} e_1$; next, we go on with those corresponding to thread $p_2 \stackrel{A}{\mapsto} e_2$; later, we continue with those of $p_3 \stackrel{A}{\mapsto} e_3$, and so on.
Let us remark that different evaluation orders among threads can result in different files. However, all of the possible files are consistent with the observations. In fact, we could even modify the rule to mimic the concurrent evaluation of the threads. The only restriction is that a semaphore has to be used to access the file, so that we can guarantee that the printing operations are atomic.
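The heap merge performed by rule parallel can be sketched as follows. This is only an illustration under our own assumptions: heaps are modelled as finite maps from integer pointers to arbitrary closures, and the names `Evolved`, `untouched`, `result` and `mergeThreads` are hypothetical, not taken from the figures.

```haskell
import qualified Data.Map as M

type Pointer = Int
type Heap e = M.Map Pointer e

-- | Contribution of one active thread to rule parallel (names are ours):
--   the bindings it left untouched (H^i_U) and the final state of the
--   bindings it modified or created (K^i).
data Evolved e = Evolved
  { untouched :: Heap e
  , result    :: Heap e
  }

-- | Final heap: bindings untouched by every thread, plus all thread results.
--   The sketch assumes that the thread results do not interfere with each
--   other, which is the consistency property proved separately.
mergeThreads :: [Evolved e] -> Heap e
mergeThreads [] = M.empty  -- no active thread: nothing to merge
mergeThreads ts = M.union results common
  where
    results = M.unions (map result ts)                   -- union of the K^i
    common  = foldr1 M.intersection (map untouched ts)   -- intersection of the H^i_U
```

For instance, if one thread updates pointer 1 and allocates pointer 4 while another updates pointer 2, only the bindings that neither of them touched survive from the original heap, and the updated ones come from the respective results.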

Rule deactivation
In this case, we do not transform the annotation file. The rule makes runnable those bindings that were previously blocked on a pointer, provided that this pointer is now bound to a value. Moreover, the pointer itself becomes inactive.

Rule activation
As in the previous case, the rule does not transform the annotation file. There can be as many active threads as available processors, and this number is a global constant. Thus, it is not always possible to turn every runnable thread into an active thread, and it is necessary to define a priority criterion. In particular, GpH gives priority to those variables that are currently demanded by the main thread; this preference criterion is denoted by $pre(p_{main}, H)$. Global evolution comprises a scheduling phase that consists of the deactivation and the activation of threads. Finally, the global evolution is defined as follows: first, all the rules concerning the parallel evolution are applied, and then the scheduling evolution takes place.
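The activation step can be pictured with the following sketch. It is a simplification under stated assumptions: each runnable thread carries a flag saying whether its binding is currently demanded by the main thread (standing in for the preference criterion), and the names `Thread`, `demandedByMain` and `activate` are hypothetical.

```haskell
import Data.List (sortOn)

-- | A runnable thread, reduced to what the scheduler needs in this sketch.
data Thread = Thread
  { pointer        :: Int
  , demandedByMain :: Bool  -- stands in for the preference criterion
  } deriving (Eq, Show)

-- | Given the number of processors and of already active threads, choose the
--   runnable threads to activate. Returns (activated, still runnable).
activate :: Int -> Int -> [Thread] -> ([Thread], [Thread])
activate processors active runnable = splitAt free prioritized
  where
    free        = max 0 (processors - active)
    -- sortOn is stable: threads demanded by main come first, and the
    -- original order is kept within each group.
    prioritized = sortOn (not . demandedByMain) runnable
```

With two processors and one thread already active, only one runnable thread is activated, and it is one demanded by the main thread whenever such a thread exists.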

Example: Semantic Evaluation in GpH
In order to better understand the behavior of the observations in GpH, let us provide an example of the semantic evaluation of a GpH expression. In this example, we will concentrate on the reduction steps corresponding to the observations. For that, we will show the interaction between the observation marks and the parallel computation.

Example 2.
In this example, we will observe the Fibonacci function. We will evaluate a parallel version of this function with an observation mark. The parallelism is achieved because the arguments of the function are evaluated in parallel. Notice that the observations will also be produced in parallel, due to both recursive calls of parfibO. We will present here only the rules corresponding to observations and parallel computations.
We will reduce the Fibonacci of 2, which is enough to understand the parallel evolution. We will consider that we have two processors to perform the reduction. The initial GpH expression is the following:

    main = parfibO 2
    parfibO = observe "parfib" parfib
    parfib :: Int -> Int
    parfib 0 = 1
    parfib 1 = 1
    parfib n = nf2 `par` (nf1 `seq` (nf1 + nf2))
      where nf1 = parfibO (n-1)
            nf2 = parfibO (n-2)

After the normalization process, the corresponding expression in our language is $e_0$:

    letrec one = 1
           two = 2
           parfib = \n. case n of
                          0 -> one
                          1 -> one
                          _ -> letrec n1 = - n one
                                      n2 = - n two
                                      nf1 = parfibO n1
                                      nf2 = parfibO n2
                                      sum = + nf1 nf2
                                      sol = nf1 'seq' sum
                               in nf2 'par' sol
           parfibO = parfib@parfib
    in parfibO two

Notice that n corresponds to an unboxed integer. Let us recall that the integers are considered ordinary constructors of arity 0. To present the examples, we consider that case expressions have a default alternative "_", meaning that we are not interested in the value of n. In the starting configuration, there is only one active thread under execution. It will evolve until new threads are created.
After the next step, we reach the point where $p_6$ points to the 'par' expression. At this point, two threads are generated, corresponding to the reduction of the 'par' expression $p_6$. The main thread becomes blocked and two new active threads are created ($p_6$ and $p_{10}$).
At this point, the parallel evolution of the two new threads starts. After the evolution of the thread $p_{10}$, the parallel rule activates the threads $p_{12}$ and $p_{14}$. This activation produces a demand on the bindings $p_9$ and $p_{13}$; these bindings become active after a new application of rule parallel. Now, two new observations are produced: one corresponding to a reduction applying the β-reduction@ rule to the binding $p_9$, and another one corresponding to the application of the demand@ rule to the binding $p_{13}$. These observations might have been produced in a different order, although one can easily check that the final observations would have been the same. After these reductions, the rule demand@ is applied to the binding $p_9$. This rule produces another observation in the observation file, and it activates the bindings $p_8$ and $p_{16}$. Next, the activation rule is applied to the bindings $p_8$ and $p_{16}$. Now, the rule demand@ is applied in parallel to the bindings $p_{13}$ and $p_{15}$. Once both threads have been evaluated in parallel, the evaluation of the binding $p_{10}$ finishes, but the computation must continue by evaluating the only thread still under execution: $p_{16}$.
The computation ends at this point. By analyzing the annotation file produced along the way, we can generate the corresponding observation tree, and flattening this tree produces the final observation.

Eden Formal Semantics
When we evaluate an Eden core expression, it can be necessary to create many parallel processes, each of which encompasses several independent threads. In particular, each thread is in charge of evaluating one of the outputs that the process has to produce. The input values of the process are given by its parent at the moment of its creation, and these input data are shared by all the threads within the process. The threads trying to access unevaluated inputs are temporarily blocked. Let us recall that the instantiation of processes is done explicitly, while any other aspect (like communication or synchronization) is carried out implicitly. Eden allows two kinds of communication: transmitting a value in a single piece, or using stream-based communication. First, we will only deal with single value communication; later, we will give the rules for stream-based communication. Figures 2 and 3 show the rules dealing with the local behavior in Eden. It is only important to highlight that we do not distinguish between active and runnable threads. That is, we assume that labels A and R are equal.
In contrast to the case of GpH, now there is not a single heap. An independent heap is used for each process, and processes do not share bindings among them. Hence, it is easier to use independent files to handle the observations of each process. Thus, a process is now denoted by id, H f , where id is the identifier of the process, H represents the heap, and f denotes the corresponding file of observations.

Eden Global Transitions
We will use the global transition rules to manage how the processes evolve in parallel. In a global transition, the heap of each process $r$ (that is, $H_r$) can change into a new heap $H'_r$. Moreover, the transition can also create new processes. Furthermore, the observation file of each process ($f_r$) can also be modified, adding new annotations and producing a new file $f'_r$.

Auxiliary Functions
A few auxiliary functions will be useful to simplify the definition of our global rules. Each process has its own heap, without sharing bindings with other processes. Thus, when a new process is created, we have to copy from the parent heap to the new heap all the bindings that will be necessary to evaluate the process body. Function nh (needed heap, see Figure 6) is used to tackle this issue. In particular, nh(s, e, H) gathers those bindings of H that can be reached from e. Moreover, it also transforms the observation marks in all the closures by adding the string s to them. While cloning bindings between heaps, care has to be taken when we find a lambda abstraction that is under observation (notice that they are the only whnf that can be under observation). In this case, it is important to record its original process. We achieve this by modifying the observation associated with the binding. Function $mo(s, p \mapsto e)$ (modify observations) deals with this problem. Notice that it only has to be defined for whnf forms. Nevertheless, there exist different options to define this modification:
1. As it appears in Figure 6. The observations are modified to remember where they come from, that is, which is the corresponding parent.
2. Not modifying the observations and simply copying them to the new heap ($mo(id, p \stackrel{\alpha}{\mapsto} \lambda^{@obs} x.e) = p \stackrel{\alpha}{\mapsto} \lambda^{@obs} x.e$). In this way, it is not possible to directly get the relations between the observations produced in the child and in the parent. Nevertheless, by post-processing the annotation files, it is possible to rebuild these relations.
3. Removing the observation marks of all the bindings ($mo(id, p \stackrel{\alpha}{\mapsto} \lambda^{@obs} x.e) = p \stackrel{\alpha}{\mapsto} \lambda x.e$). In this way, the observations are only produced in the process that starts the evaluation of the observation mark. Thus, in case programmers are interested in observing some data in the child process, they have to explicitly introduce an observation mark in the body of the process.
Every time a process is created or communication between processes takes place, it is necessary to guarantee that any expression that is sent is in whnf. This is done recursively. That is, in case a pointer appears in an expression, we have to follow such pointer and guarantee that the expression it points to is also in whnf. Function needed first free (nff, see Figure 7) checks this condition, and it returns those reachable expressions that are not in whnf. Thus, if we look at the process creation rules, we see that we can only create process q with heap H provided that $nff(q, H) = \emptyset$. Analogously, we need $nff(e, H) = \emptyset$ to be able to send e in a heap H (value communication demand rule). Function nff also appears in rule process creation demand. In this case, it provides the bindings that are needed to perform the eager creation of the new process.
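The auxiliary functions can be approximated by the sketch below. It is not the definition given in Figures 6 and 7: the closure type is a deliberately tiny stand-in for the core language (closures only record the pointers they mention), the three alternatives for mo are encoded as a hypothetical `MoOption` type, and the way the first alternative records the parent (prefixing the mark with the string s) is our own simplification.

```haskell
import qualified Data.Map as M
import qualified Data.Set as S

type Ptr  = String
type Heap = M.Map Ptr Expr

-- | Tiny stand-in for core closures: only the pointers they mention matter.
data Expr
  = Lam (Maybe String) [Ptr]  -- whnf: lambda, optional observation mark
  | Con String [Ptr]          -- whnf: constructor applied to pointers
  | App Ptr [Ptr]             -- not in whnf: pending application
  deriving (Eq, Show)

-- | The three alternatives discussed for function mo (names are ours).
data MoOption = RememberParent | CopyMarks | DropMarks

isWhnf :: Expr -> Bool
isWhnf (App _ _) = False
isWhnf _         = True

pointers :: Expr -> [Ptr]
pointers (Lam _ ps) = ps
pointers (Con _ ps) = ps
pointers (App f ps) = f : ps

-- | Pointers of the heap reachable from an expression.
reachable :: Heap -> Expr -> S.Set Ptr
reachable heap = foldl visit S.empty . pointers
  where
    visit seen p
      | p `S.member` seen = seen
      | otherwise = case M.lookup p heap of
          Nothing -> seen
          Just e  -> foldl visit (S.insert p seen) (pointers e)

-- | mo: only observed lambdas are affected.
mo :: MoOption -> String -> Expr -> Expr
mo RememberParent s (Lam (Just obs) ps) = Lam (Just (s ++ "/" ++ obs)) ps
mo DropMarks      _ (Lam _ ps)          = Lam Nothing ps
mo _              _ e                   = e

-- | nh(s, e, H): reachable bindings, with their observation marks adjusted.
nh :: MoOption -> String -> Expr -> Heap -> Heap
nh opt s e heap = M.map (mo opt s) (M.restrictKeys heap (reachable heap e))

-- | nff(e, H): first reachable pointers whose closures are not in whnf.
nff :: Expr -> Heap -> S.Set Ptr
nff e0 heap = snd (foldl visit (S.empty, S.empty) (pointers e0))
  where
    visit (seen, acc) p
      | p `S.member` seen = (seen, acc)
      | otherwise = case M.lookup p heap of
          Just e | isWhnf e -> foldl visit (seen', acc) (pointers e)
          Just _            -> (seen', S.insert p acc)
          Nothing           -> (seen', acc)
      where seen' = S.insert p seen
```

In this reading, a value can be sent (or a process created) only when `nff` returns the empty set; otherwise the returned pointers are exactly the bindings whose evaluation has to be demanded first.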

Global Rules
After introducing the necessary notation, we will comment on the rules concerning the global evolution of Eden. At this level, we need to tackle the following issues: process creation (Figure 8), inter-process communication (Figure 9), thread management (Figure 10), and the actual system evolution (Figure 11). For each of them, several steps will be needed to obtain the required behavior, and they will typically require handling two processes. In order to keep the rules simple, we do not show the observation files in those rules that never modify them. Moreover, let us recall that such rules simply transfer the file from the left to the right part of the rule.

Figure 11. Eden core: Parallelism.

Process creation. The creation of a new process takes place when a #-expression is evaluated. In this case, we have to apply a rule from Figure 8. In case the process is under observation, we use rule process creation@, while rule process creation is used otherwise.
Analogously to the case of the definition of the nh function, there exist different options for the process creation rule:
1. To create the new process without taking into account whether the process is an observed λ-abstraction or not. In this way, the rule process creation@ would not be necessary. It is only necessary to consider that the body of the process in the process creation rule can be an observed λ-abstraction. This λ-abstraction will be modified by the function nh; so, the observations obtained will depend on the version of nh that we use.
2. To create the new process taking into account that the process can be an observed λ-abstraction. In this case, we will observe the input and output channels. The rule process creation@ indicates this behavior. As in the previous case, the λ-abstraction will be modified by the function nh, and the observations will depend on the choice of nh.
In any case, when we create a new process, we create a new output channel ch o and we have to block on it the parent thread that evaluated the #-expression. This channel corresponds to the initial thread in the new child process s and it will be used to communicate the final value from the child to the parent. Analogously, on the child side, a thread will be blocked on a new input channel ch i ; this channel is controlled by a new thread in the parent and it will be used for sending data from the parent to the child. As we have already mentioned, the creation of processes is not lazy. In fact, a process is instantiated as soon as we find in the heap a variable that points directly to a #-expression, no matter if the binding is currently active or not.
Both rules are equal in all aspects but one: The second rule needs to introduce observations in the data exchanged with the process. Let us remark that the annotations in process r are handled in the same way as the observation of a λ-abstraction.
The iteration of these rules is defined as usual, by applying them as many times as possible.

Inter-process communication. Figure 9 deals with the communication of values. Let us recall that, in order to send a value, we have to copy (from the heap of the sender to the heap of the receiver) the bindings of the variables that can be reached from such value. The situation is the same as when we were creating a new process, that is, we can only proceed with the rule in case the expressions to be copied are in whnf ($nff(w, H_r) = \emptyset$). We need to avoid name repetitions in the heap of the child process. Thus, we apply an η-renaming to everything in $nh(r, w, H_r)$. Then, the multi-step rule dealing with communication is written $\stackrel{Com}{\Longrightarrow}$. Let us remark that the rule appearing in Figure 9 deals with the communication of a single value. Later, in Section 6.3, we will extend the semantics so that it can deal with stream communication.

Thread management. Figure 10 contains the rules dealing with the management of the threads. The aim of each one is the following:
• WHNF unblocking: When a binding finally reaches its final value (it is a whnf), the threads blocked on it are unblocked.
• WHNF deactivation: The bindings that have reached their final value must be deactivated.
• Blocking process creation: The creation of new processes must be blocked until their free dependencies are in whnf.
• Process creation demand: In order to be able to unblock the recently created processes, the evaluation of their needed free variables must be demanded.
• Value communication demand: In order to be able to communicate a value through a channel, it must be in whnf, so its evaluation must be demanded.

Finally, let us recall that the previous rules enable the application of the rules value and app-demand.
By combining and iterating rules from Figure 10, we obtain the rule $\stackrel{Unbl}{\Longrightarrow}$. It is easy to prove that Unbl always terminates, that is, these rules can only be applied a finite number of times. The reason is that the number of blocked threads cannot be infinite, and it cannot be increased by a deactivation or an unblocking.

System evolution rules. The overall behavior of the system is governed by these rules. Each process evolves by using local transitions, and then the global system evolves. Figure 11 gathers this behavior: rule parallel-r deals with the local evolution inside a process, while the overall evolution is done with rule parallel, which handles all processes in a single rule.
The internal evolution of each process follows the local rules shown in Section 4.1. Thus, the internal evolution of process $r$ depends on the active threads in $H_r$ that can evolve. Notice that parallel-r is the equivalent of the rule parallel that was defined for GpH. The difference is that in Eden there is an independent heap for each process. Thus, the number of active threads belonging to process $r$ is $n_r = |H^A_r|$. After defining how each process evolves, rule parallel defines how the whole system $S$ evolves.

Global System Evolution
Once we have defined how each process evolves, we need to tackle the evolution of the overall system. We use $\Longrightarrow$ to represent it. Its definition composes three phases: first, all pending communications are performed; next, new processes are created; and then thread management takes place.

Finally, a computation is an evolution of the global system. We consider that the computation finishes when the $p_{main}$ pointer is inactive. However, other termination criteria are possible, for instance that there are no active threads left in the system.
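The two operators used throughout the global rules, namely applying a rule as many times as possible and composing phases in sequence, can be read as ordinary combinators. The sketch below is only an illustration: a single-step rule is modelled as a partial function returning `Nothing` when it is not applicable, and the names `exhaust` and `phases` are hypothetical.

```haskell
-- | Multi-step version of a rule: apply the single-step rule until it is
--   no longer applicable. Terminates only if the rule eventually stops
--   being applicable, as argued for Unbl in the text.
exhaust :: (s -> Maybe s) -> s -> s
exhaust step s = maybe s (exhaust step) (step s)

-- | Sequential composition of phases, applied from left to right.
phases :: [s -> s] -> s -> s
phases fs s = foldl (\acc f -> f acc) s fs
```

Under this reading, the global step is a call of the form `phases [communicate, createProcesses, manageThreads]`, where each phase (all three names being placeholders) is itself obtained with `exhaust` from the corresponding single-step rules.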

Semantic Possibilities
As we have discussed previously, there are several options to define the process creation rule (two alternatives) and the function mo (three options). Since both features are independent, we have up to six possibilities to define the formal semantics with respect to the observations. The selection of any of these possibilities affects the observations we get in the annotation file. Thus, when analyzing the final results, we must be aware of the semantic option we are considering.
The table in Figure 12 summarizes these semantic possibilities. The richest option with respect to the observations is the one that observes the channels in the process creation rule and remembers the parent in the mo function. The fewest observations are obtained with the option that does not observe the channels in the process creation rule and removes the observations in the mo function. The rest of the cases lie between these two alternatives.
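Since the two choices are independent, the six semantic options are just the product of both sets of alternatives. The following sketch enumerates them; the constructor names are ours and merely label the alternatives described above.

```haskell
-- | Alternatives for the process creation rule (labels are ours).
data Creation = ObserveChannels | IgnoreChannels
  deriving (Eq, Show, Enum, Bounded)

-- | Alternatives for function mo (labels are ours).
data MoOption = RememberParent | CopyMarks | DropMarks
  deriving (Eq, Show, Enum, Bounded)

-- | All semantic options, from the richest in observations
--   (ObserveChannels, RememberParent) to the poorest
--   (IgnoreChannels, DropMarks).
semanticOptions :: [(Creation, MoOption)]
semanticOptions =
  [ (c, m) | c <- [minBound .. maxBound], m <- [minBound .. maxBound] ]
```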

Example: Semantic Evaluation in Eden
As in the case of the GpH semantics, now we present an example to illustrate the behavior of the semantics we have just defined. For that, we will show the interactions between the parallel computation and the reduction of the observation marks. It will also be interesting to analyze the different semantic options because this will clarify the advantages and disadvantages of each alternative.

Example 3.
We are going to consider the same example we used to illustrate the GpH semantics: The parallel computation of the Fibonacci function. In this way, we can also see the differences at the semantic level between Eden and GpH. Again, we are going to compute in parallel the recursive calls of the Fibonacci function.

parfibO = parfib@ { parfib } in parfibO two
Again, we will consider that case expressions have a default alternative _, meaning that the actual value of n is not relevant. Up to the point where the parallel computation begins, the reductions from the initial configuration are similar to the ones produced in GpH. From then on, the evolution differs from the one we have shown for GpH in Example 2. At this point, the semantic rules indicate that two processes must be created to compute the bindings $p_9$ and $p_{10}$ in parallel.
A new heap for each of the new processes is built. These heaps will be different depending on the semantic option we are considering. For each of the semantic options we have described, we will show the resulting annotation files.
1. Observing the channels and modifying the observation marks to record that they came from the other process (as presented in the semantic rules, Figure 8): This is the richest semantics in terms of observations in the annotation files. First, the rules indicate that two new processes (child 1 and child 2) must be created. Thus, two new heaps are built, each one containing the information needed by its process. We also create the communication channels. On the one hand, we have the bindings corresponding to the output parent channels (the variables $ch_{13}$ and $ch_{21}$); the corresponding bindings are blocked in the parent process and active in the child processes. On the other hand, we have input parent channels ($ch_{12}$ and $ch_{20}$) that are active in the parent process and blocked in the child processes.
More precisely, each process is the application of the Fibonacci function (parfibO) to the corresponding parameters. Thus, the first task that must be accomplished is to copy into the child processes all the heap needed to compute this function. Since it is an observed λ-abstraction, the rule process creation@ is applied. Then, all three processes evolve independently until the child processes are blocked because they need the argument of the Fibonacci function. This value has to be computed in the parent process. During this autonomous evolution, annotations are produced in the annotation files. The parent process has to evaluate the parameters needed by the child processes, and this evaluation generates lines 6-9 in the parent process file. The child processes begin their execution and, as a result, we obtain lines 0-2 in their respective files.
At this point, the child processes can no longer continue until they communicate with the parent process, so the value communication rule is applied. After that, the child processes continue and generate lines 3-4 in their respective annotation files. Now that the resulting values have been computed in the child processes, they can be sent back to the parent process by using the rule value communication again. Finally, the parent process writes the annotations in lines 10-12 of its file. Analyzing each file independently, we can obtain the corresponding observation trees.

2. Observing the channels and not modifying the observations:
In this case, the differences appear when the rule process creation@ is applied, because now it is only necessary to copy the needed heap from the parent process (applying an η-renaming). The main difference with respect to the previous case is that the bindings $p_{15}$ and $p_{23}$ are not created, and the bindings $p_{14}$ and $p_{22}$ are bound to the expression $\lambda^{@[(0,0)]} n.\ \texttt{case}\ n\ \texttt{of}\ alts$. Then, proceeding as in the previous case, we obtain the final annotation files, their observation trees, and the corresponding observations. As we can observe, in this case, the annotations of the child processes that refer to the parent process are missing.

3. Observing the channels and removing the observations when the bindings are copied into another process:
Now, the difference with respect to the previous case is that $p_{14}$ and $p_{22}$ are bound to a non-observed expression $\lambda n.\ \texttt{case}\ n\ \texttt{of}\ alts$. Thus, we do not get any observations in the corresponding files of the child processes, and the final annotation files and observations only reflect what happens in the parent process.

4. Not observing the channels and modifying the observation marks adding the process they come from:
In this case (and in the following ones), when the child processes are created, the rule process creation is used, even if the function that is going to be executed in the child process is under observation. The difference with respect to case 1 is that the channels are not under observation (as indicated in rule process creation). Thus, the pointers $p_9$ and $p_{10}$ are bound directly to their corresponding channels $ch_{13}$ and $ch_{21}$, and the channels $ch_{12}$ and $ch_{20}$ are bound directly to $p_7$ and $p_8$ (without any observation marks). Then, proceeding as in the previous cases, we obtain the resulting annotation files and their tree representation. This case is similar to case 1, the difference being that the data transmitted through the channels are not observed. In every process, only the λ-abstraction applications that have been reduced locally are observed.

5. Not observing the channels and not modifying the observations:
In this case, bindings $p_{15}$ and $p_{23}$ are created, and bindings $p_{14}$ and $p_{29}$ are bound to the expression $\lambda^{@[(0,0)]} n.\ \texttt{case}\ n\ \texttt{of}\ alts$. Again, analyzing each of the final observation files, we get the corresponding observation trees and observations. The main difference with respect to the previous case is that the references to the parent process are missing in the child process annotations.
6. Not observing the channels and removing the observations when the bindings are copied into the child process: In this case, bindings $p_{14}$ and $p_{22}$ are directly bound to the unobserved expression $\lambda n.\ \texttt{case}\ n\ \texttt{of}\ alts$. Therefore, the child processes do not produce any kind of observations, and the observation files after the computation only contain the observations of the parent.

As can be seen in the previous example, the most complete option is the one that uses the rule process creation@ and the complete mo function. Unfortunately, implementing this option requires changes in the Haskell compiler that would break the modularity of Hood, which is one of its most interesting features. Thus, in our current implementation (see [61]), we have decided to use rule process creation@ but with the intermediate option for function mo. That is, in order to keep our implementation simple, we do not record all the possible information, but we are quite close to it. In fact, by using adequate annotations and by post-processing the results, we can obtain nearly the same information.

Eden Streams
In the rules we have introduced so far, processes communicate only a single value through any channel. Of course, this single value could be a list, but in order to send this list it should be in normal form, which would prevent sending many of the interesting structured data that can be defined in a lazy language. In order to allow for communicating more than one value through a channel, Eden has streams, a structure similar to lists that allows for sending a list of values through a channel, one at a time and as soon as they are available. In order to handle streams, we need the rules in Figure 13 (stream-demand, empty-stream communication, head-stream communication, and head-stream communication demand). Let us remark that the observation file is not modified by any of these rules.
We explain them briefly:

• stream-demand. If a channel deals with streams, and we have not yet evaluated the head of the stream, we demand it. The reason is that Eden streams are evaluated eagerly.
• empty-stream communication. If the stream is finished (that is, it is nil), we send such value and then we close the stream. Thus, it will not be possible to perform any other communication using the channel.
• head-stream communication. When the head of a stream is available to be sent, the receiver gets a fresh variable with the corresponding value. This communication is similar to the one of the value communication rule.
• head-stream communication demand. If the head of the stream is evaluated to whnf, then we have to demand its needed first free bindings.

Let us remark that any communication can allow new communications to take place. Hence, we have to repeat the application of rules value communication, empty-stream communication, and head-stream communication until no further communication is possible. Thus, we define a single communication step as the application of any of these three rules, and then $\stackrel{Com}{\Longrightarrow}$ as usual. It is also necessary to modify the $\stackrel{Unbl}{\Longrightarrow}$ rule by adding the rule head-stream communication demand.

Figure 13. Eden core: stream rules.

Stream Communication in Eden via an Example
Next, we will present an example of communication using Eden streams. In this example, we will also present a problem that will be studied in more detail in Section 8: speculative computations. In order to speed up the computation, Eden communications are performed eagerly, that is, the data to be communicated must be computed even if there is no demand for those data. This implies that some unneeded computations could be performed.

Example 4.
In this example, the parent process creates a child process that generates an infinite list. The parent process will only use the second element of this list, so the child process may compute many non-demanded values. Due to the rule process creation@, the reduction of $p_6$ needs to create the child process. After that, the computation continues. Note that the child and the parent are not synchronized. Thus, while the parent is performing its computation, the child may produce data even if they are not demanded by the parent, that is, speculative work can be done. In this example, we will suppose that the child process generates two extra data before the computation ends. The global computation ends when the parent gets all the data it needs and finishes its own computation; then, it sends a kill signal to the child process. By analyzing the annotation files of the processes main and child 1, we obtain the observation trees, from which we can extract the following conclusions:
• The parent did not need the first value sent from the child to compute the final result.
• The child has produced two extra numbers, 9 and 10, that the parent did not need to get its final result.

In general, programmers can have difficulty finding these unneeded computations on their own. However, once a tool points out the situation, it is not difficult to modify the program to overcome this problem. In Section 8, we illustrate how we can easily obtain information about unneeded work in Eden.
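The comparison made in the example, between what the child produced and what the parent actually demanded, can be sketched as a small function. This is only an illustration under our own assumptions: the child side is modelled as the list of stream elements it produced, and the parent side as the same stream where `Nothing` marks an element that arrived but was never demanded; the name `unneeded` is hypothetical.

```haskell
-- | Stream elements produced by the child that the parent never demanded.
--   Elements beyond the end of the parent's view count as not demanded.
unneeded :: [a] -> [Maybe b] -> [a]
unneeded produced demanded =
  [ v | (v, Nothing) <- zip produced (demanded ++ repeat Nothing) ]
```

For instance, with a child that produced four elements and a parent that only demanded the second one, `unneeded [7, 8, 9, 10] [Nothing, Just 8]` returns the three elements that were computed speculatively; the concrete values 7 and 8 are made up for the illustration.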

Correctness and Equivalences
One key aspect of Hood is that the observations should not modify the meaning of any expression. Thus, any time we evaluate an annotated expression, the result should be equal to the result obtained when we evaluate the equivalent expression without annotations. In addition to that, it is necessary to prove that the expressions that are demanded now are exactly the same as in the evaluation of the original non-observed expression.
Obviously, observation marks are the main difference. Hence, we define a function that removes them, so that we can compare expressions with and without observations. The transformation removing the observations from any expression of the GpH-Eden core is defined next.

Definition 1.
Function R : GpH-Eden core −→ GpH-Eden core removes observations. It is defined recursively, and the only case that is not straightforward appears when we find an observed expression:

We can trivially extend the previous definition to remove all the observations of a heap, since we only have to apply R to all the expressions appearing in the heap. Thus, R H means {p → R e | (p → e) ∈ H}.
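To make the shape of R concrete, the following Haskell sketch strips observation marks from a toy expression type. The type Expr, its constructors, and the names removeObs and removeObsHeap are hypothetical and only approximate the GpH-Eden core; in particular, the clause for observed expressions assumes that removing the mark of e@str simply yields the stripped version of e.

```haskell
-- A toy fragment of a core language; all names here are hypothetical.
type Var = String

data Expr
  = V Var                      -- variable or pointer
  | Lam Var Expr               -- \x. e
  | App Expr Var               -- e x
  | Letrec [(Var, Expr)] Expr  -- letrec x_i = e_i in e
  | Obs Expr String            -- e@str, an observed expression
  deriving (Eq, Show)

-- Remove every observation mark, recursing through the expression.
-- Assumption: an observed expression e@str maps to the stripped e.
removeObs :: Expr -> Expr
removeObs (V x)         = V x
removeObs (Lam x e)     = Lam x (removeObs e)
removeObs (App e x)     = App (removeObs e) x
removeObs (Letrec bs e) = Letrec [ (x, removeObs b) | (x, b) <- bs ] (removeObs e)
removeObs (Obs e _)     = removeObs e

-- A heap modelled as an association list from pointers to expressions.
type Heap = [(Var, Expr)]

-- Extension of removeObs to heaps: apply it to every bound expression.
removeObsHeap :: Heap -> Heap
removeObsHeap h = [ (p, removeObs e) | (p, e) <- h ]
```

For instance, removeObs (Obs (Lam "x" (V "x")) "f") evaluates to Lam "x" (V "x"), and applying removeObs twice gives the same result as applying it once.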
The theorem we want to prove states that, if we evaluate an expression with observation marks, the final result is the same as if we remove the observation marks from the expression. The first problem we have to address is defining what we understand by the final result. Since we are working with a lazy language, it is not enough to consider the final value of the expression: we also have to guarantee that we have not introduced eagerness into the evaluation. Thus, we need a stronger result, namely that the final heap is the same in both cases. Formally, we would like to prove the following property: For all e ∈ GpH core :

Unfortunately, this property does not hold. The problem is that the rules with observations introduce auxiliary pointers; for instance, rule β-reduction@ adds two new pointers to the heap (t and l) that have no exact equivalent in the corresponding rule without observations, β-reduction. These new pointers point to the original ones, but they carry marks to record that we are observing them. Thus, instead of proving that they are equal, we will prove that both are essentially the same. We would like to define a simulation relation between heaps, H R H′, such that for every binding (p α → e) ∈ H the equivalent binding (p α → e′) ∈ H′ has the same value as e in H, that is, if we follow all the pointers, we finally find the same expressions. To establish the simulation relation between the heaps, it is necessary to take into account the state of the threads. When we are evaluating an expression without observations, if the corresponding thread is active and the reduction of the equivalent expression with observation marks has intermediate pointers, then one of these pointers will be active; moreover, the preceding intermediate pointers will be blocked, as they will be waiting for the result of the thread, and the subsequent intermediate ones will be inactive.
When the state of the binding without observations is inactive or blocked, the intermediate pointers in the equivalent expression with observation marks will be inactive or blocked, respectively. Finally, when the state of the binding without observations is runnable, either the first or the last intermediate pointer in the equivalent expression with observation marks will be runnable. If it is the first, the rest will be inactive; if it is the last, the rest will be blocked, waiting for the final result. The following definition establishes this simulation relation, where rp e denotes the obvious function that removes pointers from the original expression, so that expressions can be compared.
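The state conditions just described can be read as a predicate over the state of the unobserved binding and the states of the chain of intermediate pointers. The following Haskell sketch encodes that reading; the names State, compatible, and splits are hypothetical, and the treatment of an empty chain is an assumption made only to keep the function total.

```haskell
-- Thread states: Active, Runnable, Inactive, Blocked.
data State = A | R | I | B deriving (Eq, Show)

-- compatible s qs checks whether the states qs of the intermediate
-- pointers (in chain order) are consistent with the state s of the
-- corresponding binding without observations.
compatible :: State -> [State] -> Bool
compatible _ [] = False  -- assumption: the chain has at least one pointer
compatible I qs = all (== I) qs
compatible B qs = all (== B) qs
compatible A qs = any ok (splits qs)
  where
    -- one pointer is active, earlier ones blocked, later ones inactive
    ok (before, q, after) = q == A && all (== B) before && all (== I) after
compatible R qs@(q : rest) =
  (q == R && all (== I) rest)               -- first runnable, rest inactive
    || (last qs == R && all (== B) (init qs)) -- last runnable, rest blocked

-- All ways of singling out one element of a list,
-- together with what precedes and what follows it.
splits :: [a] -> [([a], a, [a])]
splits xs = [ (take i xs, x, drop (i + 1) xs) | (i, x) <- zip [0 ..] xs ]
```

Under this sketch, compatible A [B, A, I] holds, whereas compatible A [I, A, B] does not, since the pointers preceding the active one must be blocked and the ones following it must be inactive.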

Definition 2.
Let H, H′ be two heaps. We say that: , q n α n → e } where η p = q 1 and the following conditions hold: rp e R rp e and one of the following conditions holds: if β = R then one of the following conditions holds:

First, we present a proof of the equivalence, in GpH, of the evaluation of observed and unobserved expressions. Then, we deal with the analogous proof for the case of Eden. Recall that all GpH threads share a common heap.

Theorem 1.
For any e ∈ GpH core we have:

We can prove it by taking into account the ideas presented in [72]. Thus, we first consider a generalization of Theorem 1, stated in the following proposition.

Proposition 1.
Let H be a heap without observation marks, H * a heap such that H R H * , and f a file. Then:

All the technical details of the proof can be found in Appendix A. Next, we present the main scheme of the proof. We proceed by induction on the derivation. First, it is necessary to prove that the local rules with observations produce configurations equivalent to those produced by the local rules without observations. To prove this, in some cases more than one rule has to be applied in the evaluation of an expression with observation marks, while only one rule is applied in the equivalent heap without observation marks. Regarding local transitions, the only difference is that in GpH we can use 'seq' and 'par', but observations do not affect them. Thus, local rules preserve the equivalence.
The proof has to pay special attention to rule parallel, because the annotation file is not modified by rules activation and deactivation. The difficulty with rule parallel arises when we have to compute the union K 1 ∪ … ∪ K n . In this case, we have to take care of possible conflicts between observed pointers. In fact, we prove the absence of interference in the case of GpH: if there are bindings in heaps K j and K k for the same pointer p, then they are equal. That is, if (p α → e j ) ∈ K j and (p β → e k ) ∈ K k , then α = β and e j = e k . Regarding observations, we could have two bindings (p α → q @(n 1 ,n 2 ) ) ∈ K j and (p α → q @(n′ 1 ,n′ 2 ) ) ∈ K k . That means (p Hence, p A → q @str would be involved in rule parallel. Let us assume that it has index i 0 . Since it is an active binding, the other parts of the rule cannot modify it, so (p A → q @str ) ∉ H l for all l ≠ i 0 . Thus, necessarily, j = k = i 0 , and then n 1 = n′ 1 and n 2 = n′ 2 .
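The absence of interference can be pictured as a union of heaps that is only defined when bindings for the same pointer coincide. The following is a minimal Haskell sketch under that reading, with heaps modelled as association lists and expressions as plain strings; all names (Ptr, Binding, addBinding, unionHeaps) are illustrative assumptions, not part of the formal semantics.

```haskell
import Control.Monad (foldM)

type Ptr = String

-- Thread states: Active, Runnable, Inactive, Blocked.
data State = A | R | I | B deriving (Eq, Show)

-- A binding carries the thread state and a printed form of the expression.
type Binding = (State, String)
type Heap = [(Ptr, Binding)]

-- Add one binding to an accumulated heap. If the pointer is already
-- bound, the state and the expression must both coincide; otherwise the
-- conflicting pointer is reported.
addBinding :: Heap -> (Ptr, Binding) -> Either Ptr Heap
addBinding acc (p, b) =
  case lookup p acc of
    Nothing -> Right (acc ++ [(p, b)])
    Just b'
      | b' == b   -> Right acc
      | otherwise -> Left p

-- Union of several heaps, failing on the first conflicting pointer.
unionHeaps :: [Heap] -> Either Ptr Heap
unionHeaps = foldM addBinding [] . concat
```

In this sketch, a Left result corresponds to the kind of conflict that the proof shows cannot occur in GpH, while a Right result is the merged heap.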
Theorem 1 is a particular case of the proved proposition. Thus, the theorem holds as a corollary of it.
Next, we prove that observations neither create new processes nor modify the behavior in Eden.

Theorem 2.
For any e ∈ Eden core , we have:

Following the same approach as in the previous case, we start by proving a generalization of Theorem 2, from which Theorem 2 follows as a corollary.
The main difference with respect to the previous case is that now we need to take into account that we have an independent heap for each process. Therefore, to prove that a system is equivalent to another system, it is necessary to prove that all the heaps are equivalent.
Proof. As we did with the previous proposition, we present the basic ideas of the proof; all the details can be found in Appendix A. The proof follows three steps:
1. We deal with local rules, proving that they preserve the equivalence. This proof is trivial, since we have already proven the same property for the case of GpH. Notice that in Eden the rule parallel − r is analogous to the rule parallel of GpH, and GpH local rules are a superset of those of Eden.
2. We need to prove that function nh produces equivalent heaps if we apply it to equivalent heaps. Thus, the rules related to data transmission, namely (value communication), (empty-stream communication) and (head-stream communication), maintain the equivalence.
3. We analyze whether transition sys =⇒ preserves the equivalence. There is a single rule whose behavior changes under observations: rule (process creation@) creates a heap that is equivalent to the one created by (process creation). Thus, the transition sys =⇒ preserves the equivalence.

Analyzing Speculative Work in Eden
Eden creates new processes eagerly. Moreover, once a process has been created, it starts its computation even if its output has not yet been demanded by its parent process. This eagerness was introduced in the language to facilitate the efficient management of parallelism, but it can also cause inconveniences. In particular, a program could evaluate expressions that are not needed at all for the final output. These unneeded computations are considered speculative work. Fortunately, our observations can be used to analyze whether speculative work has been done in Eden.
Recall that processes are similar to functions. Thus, we can observe process abstractions. We can obtain the number of speculative data by calculating the difference between the data transmitted as output and the part of those data that was actually demanded by the corresponding receiver.
Notice that we are interested in observing the inputs and outputs of a process, and we can obtain such information by introducing a single observation in the lambda that describes the behavior of the process abstraction. Any expression x#y can be replaced by:

    letrec xO = x @processObserved in xO#y

By doing so, the process abstraction is observed. Since observing a lambda means observing both its inputs and its outputs, we obtain all the information we need. Figure 14 schematizes the information that we are observing. Obviously, what the invoker sends to the instantiated process is exactly what the instantiated one receives. However, receiving a datum does not imply that the datum was actually needed (used) by the receiver. Thus, we distinguish between what the invoker sends (outsToProcess) and what the instantiated process uses (insToProcess); analogously, outsFromProcess includes all the data sent by the instantiated process, while insFromProcess only contains the part that was actually used. Notice that the annotations corresponding to outsToProcess and insFromProcess are gathered in the file of the invoker:

In contrast, the file corresponding to the instantiated process contains the information of insToProcess and outsFromProcess:

    --Parent nameParent @(pos1, pos2)
    { \ insToProcess->outsFromProcess }

By using the previous data, we can easily compute the amount of speculative data. First, the difference outsFromProcess − insFromProcess represents the useless data computed by the instantiated process. Second, outsToProcess − insToProcess represents the useless speculative data computed by the invoker process.
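The two differences can be computed directly once the four observed sequences are available. The sketch below is an illustration in plain Haskell; the record type ProcessObs and both functions are hypothetical names, and the use of multiset difference on lists is an assumption about how repeated values should be counted.

```haskell
import Data.List ((\\))

-- The four observed sequences for one process instantiation.
data ProcessObs a b = ProcessObs
  { outsToProcess   :: [a]  -- data sent by the invoker
  , insToProcess    :: [a]  -- part of it used by the instantiated process
  , outsFromProcess :: [b]  -- data sent by the instantiated process
  , insFromProcess  :: [b]  -- part of it used by the invoker
  }

-- Useless data computed by the instantiated process:
-- outsFromProcess minus insFromProcess.
speculativeInProcess :: Eq b => ProcessObs a b -> [b]
speculativeInProcess o = outsFromProcess o \\ insFromProcess o

-- Useless data computed by the invoker:
-- outsToProcess minus insToProcess.
speculativeInInvoker :: Eq a => ProcessObs a b -> [a]
speculativeInInvoker o = outsToProcess o \\ insToProcess o
```

With the made-up observation ProcessObs "xyz" "x" "abcd" "ad", speculativeInProcess yields "bc" and speculativeInInvoker yields "yz".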
Let us show an example:

    letrec output = processObs # initialNumbers
           initialNumbers = [3,7,5,1,8,2,9,4]
           processObs = process@"annotation"
           process = \x. map sqr (takeWhile (<=7) x)
    in head output + last output

Since the only values we demand are the first and the last elements of the output list, in the invoker process we will observe

In contrast, the annotations in the instantiated process have to record that the observation originated in process r. Moreover, they will record the parent of the annotation (for example, (8,

As can be seen, the instantiated process has computed two values that were not used (49, 25), and it has received three values (2, 9, 4) that were never used. After post-processing the information to provide a more readable display to the user, we obtain:

To conclude this section, we remark that we have an actual implementation of a parallel extension of Hood. By using this extension, we have analyzed the speculative work of larger examples (see [61]). In particular, we have shown how to improve the efficiency of a parallel linear solver by using our speculation analysis.
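The figures quoted for the example of this section can be checked with a sequential Haskell rendering of the process body. This sketch does not use Eden or pHood; it only recomputes which inputs takeWhile has to inspect and which outputs the expression head output + last output demands. The helper inspected and the definition of sqr are assumptions introduced for the illustration.

```haskell
sqr :: Int -> Int
sqr x = x * x

initialNumbers :: [Int]
initialNumbers = [3, 7, 5, 1, 8, 2, 9, 4]

-- Inputs that takeWhile p must inspect: the accepted prefix plus the
-- first rejected element, if there is one.
inspected :: (a -> Bool) -> [a] -> [a]
inspected p xs = accepted ++ take 1 rest
  where
    (accepted, rest) = span p xs

-- Output list of the process body from the example.
output :: [Int]
output = map sqr (takeWhile (<= 7) initialNumbers)

-- Inputs received but never inspected by the process body.
unusedInputs :: [Int]
unusedInputs = drop (length (inspected (<= 7) initialNumbers)) initialNumbers

-- Outputs whose values head and last never demand: everything strictly
-- between the first and the last element.
unusedOutputs :: [Int]
unusedOutputs = drop 1 (init output)

-- The value computed by the invoker.
result :: Int
result = head output + last output
```

Here unusedOutputs is [49,25] and unusedInputs is [2,9,4], which agrees with the values mentioned in the text.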

Discussion
Parallel programming introduces additional difficulties to the debugging process. In addition to the problems we can find in the sequential case, parallelism introduces races among processes, synchronization problems, etc. The inherent complexity of these issues justifies the significance of developing a formal framework to analyze how to interpret the output produced by a debugger of a parallel language.
Our main motivation when we started this work was to provide a framework for debugging programs written in Eden, the parallel functional language we have been working on since its early years. Thus, we started by analyzing the tools available for its sequential part (that is, Haskell). In this sense, it is important to remark that the debugging process in Haskell is different from classic imperative debugging techniques. Fortunately, Haskell's type system catches many potential errors at compile time, and the clear separation of responsibilities among functions (due to the absence of side-effects) simplifies isolating the source of an error. However, laziness and the absence of state make it difficult to apply classical simple (and useful) imperative techniques to debug programs.
The main reasons for selecting Hood as the core of our proposal are its simplicity and its versatility. First, it is implemented as an independent library, so it can be easily ported to any Haskell compiler. In particular, this simplifies porting it to parallel versions of Haskell, as we have already done. Second, its external aspect is similar to that used in classical imperative languages, so it is easier to use for programmers coming to the parallel functional paradigm with just an imperative background. Third, Hood handles laziness very nicely, so it is possible to analyze what has actually been demanded. This is the key characteristic that enables special analyses, like the one presented to analyze the amount of speculative work done in Eden.
In [61], we have already presented pHood, our extension of Hood to deal with parallel versions of Haskell. Thus, the main contribution of our current paper is to provide a semantic framework to reason about our pHood tool, and to explore different alternatives that could be implemented. For instance, in this paper, we have studied six different semantic options in the case of Eden programs (see Sections 6.1.4 and 6.2), analyzing the amount (and kind) of information that could be obtained in each case. The implementation presented in [61] corresponds with the use of rule process creation@ but with the intermediate option for function mo. Our semantic framework suggests that we could improve the results obtained in our implementation by extending the definition of function mo, at the cost of needing to modify the compiler (instead of providing a simple independent debugging library).
A theoretical implication of our formalization is that it is possible (and even desirable) to define a common framework to deal with the debugging process of different parallel languages. In fact, our semantic framework can also be the basis for including other debugging techniques for GpH and Eden. However, our approach is limited to lazy functional languages. Moreover, if we wanted to include other debuggers (e.g., Hat or Freja) for GpH and Eden, we would have to duplicate many semantic rules to adapt them to the peculiarities of each debugger.

Conclusions and Future Work
We have introduced semantic rules to handle the introduction of observations that can be used to debug GpH and Eden programs. Indeed, our semantics corresponds with the implementation we have already presented in [61], where Hood was extended to deal with GpH and Eden. Both the formal framework and our implementation can be easily adapted to deal with other parallel extensions of Haskell. Thus, the work presented here could be used as the basis to develop a general debugging framework for parallel lazy functional languages.
We have also shown how to use our approach to analyze the amount of unneeded speculative work done in Eden programs. Notice that different kinds of analyses could also be developed for different parallel languages, depending on their concrete characteristics. For instance, in parallel languages without shared heaps, it could be possible to use observations to detect closures that have been evaluated twice. That is, Hood-like observations could detect the amount of duplicated work done in the programs.
As future work, we want to deal with several issues. First, following the work done in [73], we want to obtain an abstract machine from our semantics. Second, we want to develop new types of language-specific analyses based on the use of observations. Third, we are interested in analyzing how to debug errors appearing when Haskell uses libraries written in other programming languages. Fourth, we want to improve the visualization facilities of our implementation.
Proof of Proposition 1. The proof shows that the local rules with observations produce configurations equivalent to those produced by the local rules without observations, and that global rules maintain the equivalence between heaps. In some cases, it is necessary to apply more than one rule in the evaluation of an expression with observation marks, while it is necessary to apply only one rule in the equivalent heap without observation marks.
This proof is similar to the one we gave for the sequential case [62], because in GpH all the threads share the global heap. As in the sequential proof, we do not consider the observation file, because this file does not interfere with the computation. The main differences with respect to the sequential case are:
1. The bindings in the heap are considered threads, so they are marked with their state (active, runnable, inactive, or blocked). The definition of the equivalence between heaps has changed to reflect this.
2. Instead of having a control expression, we have a binding in the control.
3. The evaluation steps are short, that is, close to the abstract machine steps.
Before continuing, let us introduce some notation that will be used throughout the proof. Let H 0 be the original heap without observations and H * the equivalent one with observations (note that it is not the heap H * 0 that will be introduced later), so that H 0 R H * . Local rules deal with active threads; thus, we need to reduce one of the active threads in the heap H 0 ; let us name it p A → e. Let e * be the equivalent expression with observations. Then, there exists n such that
• rp e R rp e * ,
• and ∃k ∈ {1 . . . n}, α k = A, ∀j ∈ {1 . . . k − 1}, α j = B, ∀j ∈ {k + 1 . . . n}, α j = I

Proof of the first implication: left-to-right
First, we prove that we can assume that the pointer to the expression e * is active. If this is not the case, we can make the heap evolve to an equivalent one in which it is, which we will call H * 0 . This heap will be used throughout the rest of the proof. Thus, let us assume that the pointer to e * is not active and analyze how the heap evolves into one in which it is active. To simplify the proof, let us assume that n = 2; if n > 2, it is enough to iterate the steps below. The proof proceeds by analyzing the possible observation marks that the q * 1 binding can have:

Hypothesis A1 (H0).
H 0 R H * then either: There is a heap H * 0 such that the binding to the expression e * is active.
Proof. We proceed as follows: P0.1 By rules (demand) and (activation), because rule (demand) has deactivated a binding:

Therefore, the initial pointers can be removed by applying the corresponding P0 case several times. We consider the heap H * 0 to be the result of applying P0 as many times as needed to activate the last binding (the equivalent one). Note that this heap maintains the equivalence with respect to the original one: H 0 R H * 0 .

(demand)
Hypothesis A2 (H1,H2). We have: By rule (demand): The proof will be done by cases over the possible observation marks that the equivalent binding can have.
where e * is blocked in p 0 .
As a particular case, we prove this for rules (demand@) and (activation).
The other sets of rules follow the same simple scheme, and their proofs are carried out in exactly the same way. For this reason, we do not present them here.

P1
By rule (demand@) applied to q k : By rule (activation), because rule (demand@) deactivates a binding: By H0, H1, P2 and Definition 2: H 0 R K * √

At this point, we have obtained a heap K * equivalent to the original one that has one less inactive pointer, so we can apply the induction hypothesis to it. Thus, there exists a heap K such that H 0 =⇒ K and K * R K.
Proof of Proposition 2. In this case, the proof is made in three steps:
1. First, we need to prove that, when we apply local rules to equivalent heaps, we get equivalent heaps; that is, we analyze the independent evolution of each process. Most of this proof is the same as the one we gave for Proposition 1. The only differences are:
• Eden uses streams, while this concept does not exist in GpH.
• The state of a thread in Eden cannot be runnable (the equivalent in Eden is active). Therefore, the rules (activation) and (deactivation) in Eden are simpler than in GpH and correspond to the rules (whnf unblocking) and (whnf deactivation), respectively.

P3
By P2: x.e * P5 By rule (process creation@): By nh produces equivalent heaps: nh(id, q, H) R nh(id * , q * , H * ) P7 By P6, fresh(ch o , l ), fresh(ch * o , l * ) and Definition 2 (the extra pointers are "intermediate"): K R K * √ (the id and id * heaps maintain the equivalence)

Summarizing, we have proved that the evaluation of an expression with observation marks produces equivalent configurations. That is, in a parallel computation, the activity of two threads before the synchronization preserves the equivalence. Moreover, value communication (synchronization in Eden) also preserves the equivalence: as the communication forces the transmitted data to be in whnf, the data transmitted in both cases are the same. Finally, we have shown that process creation produces equivalent heaps independently of the rule applied, (process creation) or (process creation@). Therefore, Proposition 2 holds.