Comparing Performance of Biomass Gasifier Stoves: Influence of a Multi-Context Approach

Millions of people worldwide die prematurely or suffer from severe health ailments due to cooking equipment that causes unhealthy doses of (household) air pollution. Many attempts to address this have fallen short because technology was not improved sufficiently or because the way it was introduced was an ill fit with the broader “cooking eco-system”. In terms of technology, (biomass) gasifier stoves look promising on all three sustainability dimensions (people, planet, profit) but have not yet been adopted on a substantial scale across cultures and regions. We therefore used a design approach that takes multiple contexts (target groups) into account and compared the performance of a gasifier stove that was developed following this multi-context approach with four previous gasifier versions. In a comparative assessment using criteria well beyond mere technological performance, we found that it performed better than these versions, and better than what could be expected based on historical learning, while providing additional systemic advantages. These results encourage verification of the value of the multi-context approach in more settings while providing clues for refinement of the assessment method.


What is the Sustainability Problem We Want to Address?
When we think of large-scale sustainability issues that cause many casualties, cooking may not immediately come to mind. Yet if we look at a few relevant numbers the need for cleaner cooking is obvious: almost three billion people, largely in developing economies, are subject to "dirty" cooking methods, resulting in over four million premature deaths [1]. Household Air Pollution (HAP), of which cooking is a major source, results in 110 million DALYs (disability adjusted life years) due to a range of health problems, from lower respiratory infections and cataracts to asthma and tuberculosis [2].
While there is no lack of initiatives with regards to Improved Cook Stoves (ICSs), one third of households within the ICS-target audience use a basic version with limited benefits for health and environment [2]. Most households still primarily use the simplest and most abundant option: an open three-stone fire [3]. Progress in adoption of cleaner cook stoves has been slow. Lately the pace has picked up, but cleaner cook stoves are still far from being adopted on the scale that this issue requires. While cooking is not the only source of HAP (see, e.g., [4,5]), it is already such a broad issue that it suffices for the scope of this paper. As a consequence, we will not discuss at length the relation between cooking and other household energy applications, nor position ourselves in the broader field of household energy interventions. Expanding to this entire domain would distract from the focus that the topic of clean cooking deserves.
In this paper, we discuss how an alternative development (or, alternatively, design) approach might be tried to benefit clean cooking, and we present a case in which we have put our suggestions to the test. This line of investigation is relevant since it could open doors to the use of new stove development methods that are currently not preferred because they seem "too complex"; we will discuss this latter aspect at length.
In summary, we can reveal that the results are positive in the sense that in a comparative test, using a wide range of criteria well beyond technical performance, the result of the suggested development method exceeds historical results even when compared to expectations based on "autonomous learning". We were able to use a valuable data set representing five years of biomass gasifier stove development and put great effort into ensuring that the comparison between historical and new results could be made fairly. The results warrant continued use to verify the value of this approach. We do suggest some points of attention, mainly for academics, for assessing and comparing results and for further refinement of the assessment method for gasifier stoves in particular. Firstly, we emphasize that other researchers do not have to use our assessment method, as long as they assess all objects (in this case stoves) in an internally consistent way, and with a sufficiently wide range of criteria. Secondly, the results of our assessment rely on a unique (historical) data set. Results are not intended to serve as a generic benchmark for gasifier stoves, but as an example of how one can develop an internally consistent assessment method that can be used to perform a fair comparative assessment by making "old data" suitable for such comparison.

Structure of the Paper
In Section 2 we first share more information that serves as backdrop to explain the relevance of this paper and therein focus on two main topics, only mentioning but not discussing the broader landscape of household energy interventions. In Section 3 we describe the setting of the specific case for this paper, a clean cooking project in Vietnam. We pay ample attention to the method that we used to assess and compare a number of previously developed stoves and a new one. The presence of the historical assessment data was the result of a multi-year effort and required a few steps to allow for fair comparison with new data. Section 4 presents our findings, which are discussed and interpreted in Section 5. The final section consequently contains conclusions with regards to the development approach that we used and implications for the relevance of the results. This is followed by recommendations for the benefit of the principal (case-company) and others who are interested in addressing complexity in design challenges, for clean cooking and in general. Throughout the paper, we pay attention to the link with sustainability.

Literature Review
This paper contains and combines two main components: (1) practical experiences, developments and literature with regards to clean(er) cooking solutions; and (2) current experiences with approaches to bring initially successful technologies to scale, in particular in the context of emerging economies. As stated before, we will only lightly touch upon but not discuss in depth the implications for the broader household energy landscape. The two components that we do focus on create a large enough scope as it is. We discuss these two components in a logically intertwined sequence: the current state of clean cooking, among others reflected upon through the lens of a range of sustainability aspects (Section 2.1) and a relevant development in that respect (Section 2.3), intertwined with current experiences with (cooking) scaling strategies (Section 2.2) and a new approach in that respect (Section 2.4). This together provides the academic grounding that sets the scene for the steps that we describe next, in the second half of the paper. The discussion on scaling strategies and barriers (Section 2.2) is a relevant basis to explore whether the new approach (Section 2.4) could alleviate these barriers.

Current State of Cooking and Relation to Sustainability
The introduction of Improved Cook Stoves (ICSs) has become influential to ensure sustainable energy access for all [4][5][6]. Worldwide, initiatives regarding cleaner cook stoves are numerous and such initiatives have been going on for several decades [4,7]. The reasons to promote cleaner cooking are not difficult to guess. Besides the aforementioned health-related effects, see e.g., [1,8,9], socio-economic consequences of unsustainable cooking methods include time spent on collecting (fire) wood, most relevant in rural areas, or high costs for purchasing fuels, depending on circumstances potentially relevant in urban and rural areas. On top of that the direct ecological effects are substantial: biomass-based cooking methods cause 3% of global CO2 emissions, 25% of global black carbon emissions and over 1.3 billion tons of wood fuel consumption, contributing to deforestation [2]. The effects on the sustainability-spheres often are interconnected. For example, abundance of free firewood or cheap charcoal keeps people with little disposable income in rural areas locked into old cooking methods: economic arguments (free availability and low cost stoves) join forces with social arguments (habits, ease of use, familiarity) creating serious social effects (health) and ecological degradation which is exacerbated by social-economic lock-in (livelihood dependence). This also makes it difficult to pinpoint exactly which intervention (technology-focused or broader) has which effect exactly [10]. In addition, while the exact levels of different presumed benefits are not certain and vary per evaluation, the benefits are there [5], just not always assessable with the level of precision that policy makers hope for [5,10].
For now it seems to be justified to conclude that a long period of initiatives has not resulted in real breakthroughs on a meaningful scale relative to the size of the population that suffers from the effects [9]. In urban areas, cleaner fuels like LPG suffer from a lack of infrastructure and in rural areas freely available biomass and inefficient but very familiar cook stoves are difficult to compete with. One might say that on a micro scale (developing better cook stoves) the problem is relatively simple, but when considering the cooking eco-system (affordability, supply chains, alternatives, habits, time vs. money), many of these elements are interdependent, and as such, the system as a whole is complex.
Decades of clean cooking projects are now slowly starting to pay off. The cumulative achievement of all members of the Global Alliance for Clean Cookstoves (GACC) amounts to 82 million cook stoves distributed, of which 53 million are labeled as "clean and/or efficient" [7] but only 23 million as both clean and efficient. While these numbers look substantial, against a total of three billion people who are affected there is still much ground to cover. This can be shown by a quick calculation: three billion people, in families of for example 5 people on average, would require 600 million clean and efficient cook stoves. 23 million implies that less than 4% of that target has been achieved. Even this figure is too high, for two reasons. Firstly, as the GACC itself acknowledges, "distributed", often funded by donors, is not the same as accepted and adopted by the actual end users. Whether that will happen is still an open question. Secondly, the threshold to be labeled as "clean and efficient" is set at tier-2 for efficiency and tier-3 for emissions, on a scale from 0 to 4 with 4 representing the best performance [11]. It is doubtful for many technical testing protocols whether they lead to relevant results, since they are based more on laboratory tests than real life situations, which means they are likely to overestimate the positive health effects, e.g., [7,10]. This implies that there is still much room for improvement in the realm of cleaner cooking.
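The quick calculation above can be reproduced in a few lines. The sketch below is only a back-of-envelope illustration: the figures are those quoted in the text, the household size of 5 is the illustrative assumption made there, and the function and variable names are hypothetical.

```python
# Back-of-envelope estimate of clean-cooking coverage.
# All figures are the ones quoted in the text; names are illustrative only.
AFFECTED_PEOPLE = 3_000_000_000          # people relying on "dirty" cooking [1]
ASSUMED_HOUSEHOLD_SIZE = 5               # illustrative average used in the text
CLEAN_AND_EFFICIENT_STOVES = 23_000_000  # stoves labeled both clean and efficient [7]


def stoves_required(people, household_size):
    """Number of stoves needed if every household gets exactly one stove."""
    return people / household_size


def share_achieved(stoves, people, household_size):
    """Fraction of the required stoves that has been distributed so far."""
    return stoves / stoves_required(people, household_size)


stoves_needed = stoves_required(AFFECTED_PEOPLE, ASSUMED_HOUSEHOLD_SIZE)
share = share_achieved(CLEAN_AND_EFFICIENT_STOVES, AFFECTED_PEOPLE,
                       ASSUMED_HOUSEHOLD_SIZE)

print(f"Stoves needed:  {stoves_needed:,.0f}")
print(f"Share achieved: {share:.1%}")

# Sensitivity to the (assumed) household size.
for size in (4, 5, 6):
    s = share_achieved(CLEAN_AND_EFFICIENT_STOVES, AFFECTED_PEOPLE, size)
    print(f"household size {size}: {s:.1%}")
```

Varying the assumed household size between 4 and 6 changes the share somewhat, but it stays in the low single digits, so the conclusion does not hinge on that assumption.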
The slow progress in achieving impact on a substantial scale is in part due to the underestimation of the value that for many people is hidden in the current system [3], like entrenched jobs in current value chains, or underestimation of the relevance of human behavior as opposed to technical features [5]. Indeed, many studies found similar reasons like user-insensitive design [12], focusing on technical functionality but discounting socio-cultural fits [13], focus on just technology [14], technological efficiency without much consideration for affordability for end-users after a donor-funded phase [15] or a focus on getting "anything" out there to capture market share and then expand quickly [16]. In the end, the effectiveness of interventions needs to be assessed based on the combination of the incidence of technology adoption (extensive margin), and the way the technology is actually used (intensive margin) [17].
While current cooking technologies and their value chains have detrimental effects in terms of deforestation and health, they are in fact sometimes efficiently organized and many people depend on them for their livelihood. As such, just making people aware of cleaner alternatives is certainly not enough. Many researchers, e.g., [18][19][20], found that simply stating the beneficial effects on health and environment compared with for example (char)coal-powered cooking is not sufficient for the uptake of these new technologies. This conclusion leaves aside the magnitude of these effects and the conditions under which they occur in real life [10]. The experiences so far do seem to present a case for considering more social and eco-system related aspects, even if these would not be as easily measurable as some technical aspects.
To summarize this section: dirty cooking is a large-scale issue affecting billions of people. Many initiatives so far focused on one technology and considered only one context at a time. This resulted in a specific product for a specific target group in some cases followed by incremental improvements. This has not been helpful in creating change on a meaningful scale across cultures, regions and segments, i.e., contexts. This development may have been exacerbated by the focus of stove developers and donors on seemingly objective metrics, which however now seem to have created a false sense of certainty [7]. To understand this issue better we now look closer at current approaches to scale and challenges that have been encountered in doing so.

Current Approaches to Scale Proven Technologies: Gaps to Address
Approaches that aim to reduce the negative social, environmental and economic effects of cooking need to take into account contextual specifics [21]. To create substantial change, we however also need to ask how such a contextual focus can be combined with an outlook of scalability. After all, with an issue affecting as many and widely dispersed people as cooking does, we will need to serve people in different contexts (e.g., urban/rural, income segments, countries). Therefore, contextually optimized solutions will not be sufficient. The continuing cycle of redesign that so far is required to achieve meaningful scale across multiple segments is undesirable for an issue with suffering on this scale [22]. It seems necessary to recognize that maximization of relevance on micro scale, i.e., a specific context, is counterproductive to the desired level of uptake [23,24] even if the former is the approach that is currently most commonly used in practice.
The explanation for this context-by-context approach is not difficult. Historically, scaling up the sales of products consisted of little more than producing more and expanding market outreach. In case of products for emerging markets, this often resulted in stripped versions of the offerings for more developed markets [25]. Such expansion strategies do however not satisfy the needs in these new segments, which is more problematic if serious social problems need to be addressed. Strong dependency on one initial product points at downplaying the differences in actual needs as well as abilities of people [26,27]. At first sight therefore, it seems justified to avoid "universal" solutions and use approaches that build on context-specific intelligence [28].
However, the case of for example ICSs clearly shows the downside of this context-specific strategy. In many cases, initiatives focus mostly on rural areas, with solution directions not adequately aligned with urban end-users [29], or vice versa [30]. The initial focus on either rural or urban segments puts the characteristics of one of these center stage. With different starting points regarding time, availability of fuel and perception of costs, a solution design process would be pushed in a very different direction. When one context is the leading one and the context-specific solution is then adjusted to other contexts in later phases, the later ones may suffer from the initially chosen path, or solutions need to be redesigned to a large extent, thereby reducing the chance of, and/or severely slowing down, economies of scale and thus affordability advantages. Besides, possible connections between the contexts are not made, or only much later than necessary. This neglect of possible connections occurs even more often if the different "contexts" are in fact different countries. Such path dependency [31] is a typical phenomenon when complex, interconnected large scale issues have been broken down into seemingly more manageable chunks, e.g., markets that are subsequently entered. As a managerial response to get more grip on a messy or complex problem it is not uncommon [32]. However, this breakdown into manageable sequential chunks (e.g., country or segment) can also result in too much "heads down design" [33], which then becomes a cause for scaling problems. This is what we see happening for clean cooking: initial success cannot be repeated elsewhere without partial or full, time-consuming and costly redesign of the physical product and/or (parts of) the business model.
We therefore seem to be in need of approaches that on the one hand do acknowledge that one cannot design immediate full-scale solutions to complex challenges [32], but also acknowledge that decomposition in small chunks like single segments is not the solution either. The dual challenge of taking into account supply chains for cook stoves as well as fuels does not make the effort easier [34], but all experiences demonstrate that an integrated approach is necessary nevertheless. Possibly this requires cooperation with governments [35] and other stakeholders to allow solutions on systemic level to materialize faster.
In order to work with the contemporary complexity in design challenges the problem analysis can be enriched by including views from more contexts rather than contracting the scope. Using a diversity of contexts to frame the problem [36] looks daunting to many people but more accurately represents the reality of a diverse and complex landscape, in our case the cooking eco-system. Combining diverse contextual views is likely to create some friction in the solution search process but such friction is rather a sign that one is actually dealing with reality instead of a simple but fictitious vacuum [37].
While keeping an open mind as to where relevant information can come from may introduce (the perception of) a risk of a "loss of control" [37], in complex environments this control is illusory anyway. The proponents of this approach expect that using an intentional multi-contextual attitude is more likely to be an enriching rather than an endangering experience. Acting on that expectation, it seems justified to explore a move from the focus on contextual intelligence [28] to one of collective intelligence that does not disregard the contextual specifics.
To repeat our observation: the number and dispersion of households that are affected shows the necessity of solution directions that cater for a larger diversity of (end-user) needs, which would be conducive to adoption of clean cook stoves on a larger scale. Following this logic, looking beyond the initial scope at the very start reduces the risk of path-dependencies and lock-in. As a case in point, even when outlining a seemingly advanced method to design cook stoves based on decades of lessons and failures, the authors of [38] propose to incorporate views from multiple angles, but still from within one context. Interpretations from beyond that contextual boundary are left out. This level of diversity does not seem to be sufficient anymore.

A Technological Promise: Gasifier Stoves
Before we continue in Section 2.4 with the analysis on design approaches leading up to a development (/design) approach that addresses the current gap, in this sub-section we first discuss a relevant development in terms of clean cooking technology.
A technology that, for reasons explained below, seems to be promising regarding reduction of the negative effects of household cooking is micro biomass gasification. The core principle is that biomass is burnt under oxygen-limited conditions to create syngas (a mixture of various combustible gases) from the volatile matter in the biomass; this syngas is then burnt. The process leaves biochar as a non-volatile by-product. This process is carbon efficient and potentially carbon negative under perfect conditions when about 20% of short cycle carbon is trapped in biochar. It is also a cleaner way of burning the biomass, as it creates less black carbon, particulate matter and smell. The magnitude of all these benefits however depends on the exact use, i.e., human handling [5].
Biomass gasification can potentially achieve high thermal efficiencies (bringing down the fuel demand and thus costs), scoring in tier-3 for efficiency. As stated before, care should be taken not to attach too much value to technical measurements (alone) [7]. In addition it generates a range of related benefits which touch all spheres of sustainability: gasifier stoves can use many fuel types, thereby reducing dependence and vulnerability to market fluctuations, and they produce fewer incompletely combusted gasses, i.e., fewer harmful emissions. This reduction still requires more robust and verified testing, but is reported to reach as much as 90% depending on which stoves are compared [8]. As an additional potential positive economic and environmental effect, the residue biochar can be used as fertilizer. This effect itself is real, but estimating the magnitude would still require more rigorous testing in different environments. Putting such effects in only one box (social, environmental or economic) would not do justice to their integrated nature. Many of the (potential) benefits of gasification for households are self-reinforcing and integrated, covering a large part of the sustainability spectrum.
A step towards gasification, which is easier to develop and therefore currently more common for applications like cooking, is semi-gasification. The main difference with full gasification is the fact that the syngas creation and syngas burning zones are not physically separated. These semi-gasifier stoves potentially still provide sizable gains compared with other cooking methods in terms of thermal efficiency, therefore fuel cost efficiency, level of air pollution and greenhouse gasses. While gasifier stoves show good results in a technical sense [39], with too little or too context specific attention to end-users or costs this will still not lead to a breakthrough on substantial scale [5]. Some technology oriented initiatives may dig their own grave by pushing for measurement methods that prove their alleged technical superiority, instead of spending time on getting their solution convincingly used in real life [40].
In summary, even though the technologically promising gasifier technology is available it has not yet been adopted on a large scale. We turn next to an approach that might be conducive to achieve scale by taking into account fulfillment of user needs across multiple contexts from the start.

An Approach Promise: Context Variation by Design
We consider the technological promise of gasification for cooking purposes as a given for now, although the magnitude of this promise also depends on human behavior. The next question then becomes: which approach should be followed in order to develop a gasifier stove in such a way that it fits the broader complex cooking eco-system, both on micro level (product and product-user interaction) and on a higher systemic level (fitting needs and abilities of multiple target groups and the value chains these are a part of)? In other words, we are looking for an approach that can appropriately address the complexity of this holistic design challenge.
Previously, a design approach called Context Variation by Design (CVD) was presented [41] that explicitly takes into account multiple contexts from the start. First results based on practical experiences were reported as well [42,43]. As this paper is not primarily written for a design audience, we only highlight the following aspects to create a basic understanding of this approach; it will not be discussed in depth after that:

• The CVD-approach presents four principles (systematic variation, hierarchical decomposition, satisficing and discursiveness) to approach a (complex) design challenge. Together these create the conditions for a design space where one can work with complexity (e.g., multiple target groups simultaneously) instead of being tempted to immediately over-simplify the design challenge.

• A resulting working principle is to gather perspectives from multiple contexts from the very beginning of the design process, ideally even before the exact design challenge has been decided upon: different contextual views regarding the topic can and will influence the overarching formulation of the design challenge.

• Such a collective instead of merely contextual intelligence creates a design solution space that reflects reality better than one that is based on premature simplification, e.g., immediate focus on one context. The rich design space facilitates revealing connections and patterns between elements from different contexts. This richness is a welcome basis to derive solution variations that address the diversity of requirements that are typically encountered when solutions are confronted with reality, especially if driven by the necessity to scale.

• Taking diverse requirements into account, and letting them interact in an early stage before final paths are determined, allows for more adaptable solution platforms.
From all experiences so far it is clear that this approach has logical appeal, while it is based on time-honored design principles [44] and was inspired by extensive practice-based literature from organizational sciences, e.g., [45][46][47]. It is however good to realize that this approach is an evolution rather than a revolution, and based on first experiences, it rather seems to tie together and bring to a next level a number of other existing design methods.

Research Set-Up
Previous research, e.g., [42,43], provided first purposefully gathered empirical evidence to support the expectation that a design solution space that explicitly combined insights from multiple contexts outperforms context specific approaches in terms of richness, creativity and relevance. For now we take from these cases that the use of CVD seems promising from a design perspective; further specification of this promise will, as mentioned, need to be verified by more case studies.
In this paper, we want to investigate whether design directions that are developed with CVD score worse, similar or better than comparable results, in terms of actual performance. Against the backdrop of the introduction in Section 2, we want to answer the question: how well does a gasifier stove that is developed using the CVD approach perform compared to previous versions of this stove, keeping as many other variables as possible the same?
The implicit intention is to investigate how a product version that explicitly caters for, and has made use of, the requirements from multiple target groups compares with context-specific versions, under the assumption that the former, if it performs better, will scale faster than the latter. The importance of this potential to satisfy a larger diversity of requirements was highlighted in Sections 1 and 2. Whether these diverse needs are actually satisfied in practice is beyond the scope of this paper.
Considerations regarding what type of performance should be included were extensively discussed in Section 2.1. They can be summarized as follows: relying on technical performance alone creates a strong dependency on testing protocols [7], while adoption also very strongly depends on, for example, behavioral and eco-systemic factors, e.g., [5,6,17].
In this section, we describe the (history of the) setting where we explicitly investigated the question above (Section 3.1), the considerations to arrive at an assessment method that would allow for a fair comparison (Section 3.2) and the explicit steps that we took based on these considerations to develop the assessment protocol (Section 3.3). The rigor of this process is the main reason the results are fully usable, as long as they are seen in the proper context. This is explained further at the end of Section 3.3.

Research Setting: Vietnam, Five Years of Gasifier Development and Assessment
Because we ask a comparative question, we need material to compare. We had the good fortune to have a data set available that represented five years of gasifier stove development in Vietnam, coming from a process that was initiated in 2011. During the course of the development process, it became clear to the principal in charge of that process that many competitors sought to prove the quality of their stoves through conducting tests that would demonstrate their technical performance (power, thermal efficiency, emissions etc.). Apart from the variety of such tests all with their own reasoning and thus conclusions [15,40,48,49] the principal realized that most of them did not sufficiently take into account how the stoves would be used in real life, e.g., how they would in reality resonate with actual end-users. In his experience the technology turned out to play only a small part in the overall decision process on acquiring and using cooking stoves. These points have by now been widely acknowledged in literature, e.g., [5,7,13,14,17].
For these reasons, the principal several years ago started to devise a more holistic list of assessment criteria, based on constant dialogue with a variety of stakeholders. From this dialogue he could derive a range of aspects that they considered important, on top of the bare technical performance. Taken together, these criteria covered a wide range of relevant aspects, far beyond those measured by technology performance tests. These tests were also performed, and the most recent gasifier versions score in tier-3 standards for efficiency and tier-4 for emissions, as tested by independent agencies. These excellent results were in part a consequence of limiting the scope of each version to a specific target group and optimizing one or two aspects at a time. In short, the technical results were very promising, but the principal was keenly aware that eventual adoption would require much more, and had started to assess the gasifiers on many more criteria.
This historical development made available a wealth of historical assessment data about several gasifier versions, expressed by 51 criteria in nine categories (see Appendix A), applied to a total of eleven gasifiers in the course of five years. Together they provide a holistic and practice based overview of quality, i.e., fitness for use [50]. The categories cover all sustainability dimensions, from largely economic (low cost) or social (user-friendliness, aesthetics, safety, training) to more integrated ones like environmental and social responsibility, and fuel requirements.
All criteria can be scored from the points of view of different stakeholders. The difference between stakeholders is expressed in different weights per criterion, while the assigned score per criterion remains the same. Thus, for each version it could be determined how it would fare from different points of view. The way in which weights and scores are assigned is explained in an assessment protocol (Appendix E). For this paper we will only look at the point of view of the end-user.
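To make this mechanism concrete, the following minimal sketch shows how one set of criterion scores can yield different overall results per stakeholder. It assumes a simple weighted average as the aggregation rule; the actual rule is defined in the assessment protocol (Appendix E) and may differ. The criterion names, scores, weights and the "donor" stakeholder are hypothetical and are not taken from the data set.

```python
def weighted_score(scores, weights):
    """Weighted average of per-criterion scores.

    `scores` maps criterion -> score (identical for every stakeholder).
    `weights` maps criterion -> weight (specific to one stakeholder).
    Weights do not need to sum to 1; they are normalized here.
    """
    missing = set(scores) - set(weights)
    if missing:
        raise KeyError(f"no weight given for: {sorted(missing)}")
    total_weight = sum(weights[c] for c in scores)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(scores[c] * weights[c] for c in scores) / total_weight


# Hypothetical scores for one gasifier version (same for all stakeholders).
scores = {"low_cost": 6.0, "user_friendliness": 8.0, "safety": 7.0}

# Hypothetical weights: only these differ between points of view.
weights_by_stakeholder = {
    "end_user": {"low_cost": 3, "user_friendliness": 2, "safety": 1},
    "donor": {"low_cost": 1, "user_friendliness": 1, "safety": 2},
}

results = {
    who: weighted_score(scores, weights)
    for who, weights in weights_by_stakeholder.items()
}

for who, value in results.items():
    print(f"{who}: {value:.2f}")
```

Keeping scores and weights in separate structures mirrors the separation described above: a version is scored once, and only the weighting changes with the point of view.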

Considerations for Assessing and Fairly Comparing Gasifier Technology
In the first half of 2016 the principal coordinated a design process aimed to develop a gasifier version up to testable prototype-level that was explicitly and intentionally aimed at serving both urban and rural segments, i.e., multiple contexts. Versions up till then had been aimed at one segment. The question that we wanted to put to the test was: to what extent does a gasifier version that is derived from a "rich" multi-contextual design solution space (see Section 2.4) perform better, or not, than versions that were developed using a "traditional" innovation approach? This latter approach was the one used so far. The principal's upfront attitude regarding the outcome was neutral, i.e., he was not biased towards a positive or negative result of the inquiry but only interested in what the result would be.
While the historical wealth of assessment data was a relevant basis, to be usable we needed to take a few things under consideration if we were to compare these historical results to the assessment result for the version that would be designed with the suggested approach (CVD):

• While technical performance of the new gasifier was relevant, it was not the most interesting part: the more holistic set of 51 criteria had proven its value in giving direction for improvements that would contribute to eventual adoption. This was valid, even though until then the priority had been technology optimization, to stay on track vis-à-vis competitors.
• Not all of the eleven gasifier versions that had been developed were equally relevant or interesting. Some only represented a very small incremental improvement, others were a one-time experimental sidestep. Including all of them in our test would take extra effort without any added value and potentially distort the larger picture.
• The assessments had mainly been performed by one person; before proceeding we wanted to make sure that any bias that this person might have would not unduly affect the comparison.
• In fact, we realized that any scores that are assigned other than by measurements have a level of subjectivity attached to them. We would need to either reduce that or make it irrelevant. One main way to do that is to allow multiple assessors, let them engage in dialogue and arrive at an agreed basis for an inter-subjective assessment.
• Because the values of the assessment scores are relative to the optimal performance per criterion at the time of the assessment (see Appendix E), scoring the same gasifier for example 1, 2, or 3 years later would probably result in a lower score. To be able to fairly compare scores the same frame of reference needs to be used, so we needed a re-adjustment of historical scores.
• Since we were mostly occupied with investigating the differences in the pattern of the assessment results between the previously assessed gasifiers and the new one, we realized that the exact details of the assessment method mattered less than the fact that the method would be applied consistently to all selected versions in our data set.
• To make sure not to overestimate the assessment score of the new CVD-driven gasifier version, we wanted to compare that score not only with the previous scores, but also take into account the autonomous learning, i.e., account for the reality that any newer version benefits from the accumulated knowledge of the entire process.

Because of these considerations, some steps were required to arrive at a usable data set and assessment protocol. These steps are described in the next section.

Process to Arrive at a Fair Comparison Method
Below we briefly describe the steps resulting from these considerations:
(1) Select gasifier versions and invite multiple experts to perform an assessment.
(2) Engage in dialogue resulting in inter-subjective guidelines for adjusted scoring.
(3) Adjust the assessment of the previous gasifier versions.
(4) Assess the CVD-version, also using the adjusted guidelines.

Step 1: Select Gasifier Versions and Invite Multiple Experts to Perform an Assessment
We involved three detached Design Experts (DE2, DE3, DE4) to select and assess gasifier versions. The choice of experts was based on three criteria: no active involvement with the project; sufficient knowledge of Vietnam, clean cooking and design; and, for practical purposes, proximity, so that they could be brought together on short notice. Of the complete set of eleven gasifiers that had been developed in the past years, the experts agreed that four versions represented the most substantial design jumps. These would therefore be used as comparison material. Brief descriptions and main (technical) features of these versions are included for reference in Appendix B.
The fact that both the total number of design experts and the number of assessed gasifier versions was four is pure coincidence; there is no relation or correlation between these two sets. The profiles of the selected experts are provided in Table 1. It should be noted that after step 2 these experts did not have a role anymore. The value of steps 1 and 2 lies more in the fact that they took place and resulted in adjustment of initial scores and creation of a level playing field (inter-subjective instead of potentially subjective assessment) than in involving as many experts as possible. Whether the intended effect took place is discussed in Section 5.1. To make sure all experts used the same interpretation of the criteria, they discussed these before proceeding with the assessment. This assessment consisted of independently allocating scores to the criteria for the selected historical versions, following the protocol as explained in Appendix E.
The assessors did not change the weights per category; consequently, any difference between assessors would only be based on scores on the criteria.

Step 2: Engage in Dialogue Resulting in Inter-Subjective Guidelines for Adjusted Scoring
The outcomes of the first round of assessments were compared with each other (see Appendix C) to identify and subsequently discuss patterns. The aim was to explore whether agreement could be reached on a combined inter-subjective assessment that would do justice to the different individual assessments. This interaction aimed at developing a shared understanding, in the form of guidelines, for adjusting the individual scores to one commonly agreed upon inter-subjective score for each version. These guidelines were then also used for the new CVD-gasifier version. The purpose of this step was to provide the basis for a fair and especially internally consistent comparison.

Steps 3 and 4: Adjust Assessment of Previous Gasifier Versions and Assess the CVD Version
With the adjusted assessment guidelines (results in Section 4.1) the principal (DE1) was commissioned to re-assess the four previous gasifier versions (results in Section 4.2) as well as assess the CVD-one (results in Section 4.3). By using this approach, the researchers were certain that (1) all gasifier versions were scored using the same frame of reference; (2) the assessment itself represented the collective expert intelligence instead of one particular expert view; (3) the adjustments reflected the evolution in the development of the technology (i.e., performance on a criterion can score lower if it is assessed against the most recent insights of what is possible) and (4) the score of the new version could be compared with a historical pattern. Taken together this approach strongly reduced the risk of bias towards the newest version.
This thoughtful process ensured that the results in this test would be usable for comparison for our purposes, i.e., to achieve the aim of our own research. It did also reveal some considerations about the value of the result, its repeatability and its usability as comparison material with other (gasifier) cook stoves:

• Our method would ensure a suitable and fair comparison between the scores of the gasifiers in our data set. The results would say very little about the quality of these stoves compared to other ones, but this was not the intention either.
• Since the assessments rely heavily on expert opinion and assessing the gap between the optimally achievable values and the actual performance (see Appendix E for more details), the assessments have to be performed by the same expert within a short amount of time.
• The most important realization was that both issues do nothing to reduce the validity of this test; as long as the assessment method is applied consistently to all gasifier versions in this data set, we can draw conclusions for that data set and that is for now all we need.
• The implication of this realization, which we want to make explicit for clarity's sake, is that the results of this assessment cannot and are not intended to function as benchmark for other (gasifier) stoves. We discuss this point further in Section 5.
• Because of the expert-based inter-subjective assessment the results cannot be used for absolute statements about effects in terms of emissions, health, exact costs etc. Again, for our purposes this was acceptable.

Results
To keep this main section concise, we show the results of the first step only in Appendix C, on category level. These results are mere input for next steps. We start this section with the main findings of step 2 (Section 4.1). The adjusted assessment scores of the four gasifiers that represent the autonomous learning curve of the principal are shown in Section 4.2, followed by the assessment of the new version in Section 4.3, complemented by qualitative remarks on this new version. In Section 4.4 we compare the assessment of the new version with the autonomous learning. This initial comparison provides the basis for ample discussion in Section 5.

Results from Discussion to Make Assessment Scores Inter-Subjective
The initial scoring of gasifier versions by the experts brought the following patterns to light:

• DE1 scored consistently higher than the other experts. The scores among the detached experts DE2-4 vary without any dominant overall pattern.
• The criteria for which the difference is by far the clearest are in the category User-friendliness; to a lesser extent for Aesthetics and Safety.
• Within the category User-friendliness, the differences in assessment scores were most significant for aspects that involve direct user handling of the gasifier (ease of ignition, charge, recharge, discharge, control flame, stop/restart). The detached Design Experts (DE2-4) assess the user experience on these aspects as less positive than DE1.
• The scores of experts DE1 and DE2 were more similar than those of DE3 and DE4 for criteria that represent technical aspects. Given the more technical background of DE1 and DE2, the careful conclusion was that their scores might be more accurate for these aspects.
• The two most recent versions are assessed consistently higher by all experts, which therefore may be taken as a sign that they indeed do perform better.

These patterns were discussed as input for the adjusted scoring by DE1 of all gasifier versions, which reflected these observations.

Comparative Assessment of Older Versions of Gasifier Stoves
Table 2 shows the scores for the four previous gasifier versions after adjustment. The abbreviations between brackets are used in the text afterwards. The differences between the values in Table 2 and the average of the expert scores before adjustment are shown in Appendix C. These differences represent the value of the expert dialogue.
Together the scores in Table 2 also represent the autonomous learning within the development process up to and including 2015. The more detailed breakdown of scores per indicator, leading up to the category and overall scores, is not included in the paper for reasons of space. To demonstrate the principle of how that process works, however, the breakdown for the 5th gasifier version (see Section 4.3) is included in Appendix F. As can be seen, the main trend in this development was reduction of total costs for end-users (category 1), mainly by improving the fuel efficiency. This resulted in good test results in terms of thermal efficiency, exceeding the performance of commercially available stoves. However, as can also be seen, the increase of the category score for User-friendliness (2) has been negligible after a considerable jump between the first two versions (FG and Ins). We see similar patterns for most other categories, and the scores on environmental and social responsibility (category 6) even clearly went down in later versions. This is partly due to the fact that pellets, when used as fuel, cost money and are therefore completely burnt, which leaves ash instead of biochar. Biochar, however, is a positive component of "environmental responsibility" because it is a natural fertilizer. The brief conclusion is that the primary focus on fuel efficiency improvement (demonstrated by a clear upward trend for category 1) seems to have resulted in (unconscious) neglect of the other criteria.
This sets the scene to find answers to the main question of the paper: would an intentional multi-context approach, i.e., CVD, change the pattern of the scores and, if so, how strongly and how would the change be divided over the different categories?

Assessment of the CVD Gasifier Stove
To find answers to this main question for our paper we need to compare the assessment of the CVD-gasifier version with the previous ones. For completeness' sake, a brief description of the CVD-version and its technical specifications is included in Appendix D.
The results of the comparison are shown in Table 3. The second column gives the scores of the CVD-version, followed by the percentage difference of each category score (and the overall score) with the three previous versions shown in Table 2. The first version (FG) is omitted in Table 3 because the time difference (five years) would be unreasonably large.
All differences are expressed as a percentage of the earlier version's score. For example, the value 15% for category 1 in the FSA (Forced secondary air) column conveys that the CVD-version scored 15% higher than the FSA version on that category, the 69% for category 1 in the Ins (Insulated) column shows that the CVD version scored 69% higher than the Insulated version on category 1, and so on. Here we share a few notable observations from comparing these scores. In the next sections we discuss the implications of these findings.

• The gasifier that has been designed with the CVD-approach scores the highest of all versions in the data set, with a difference of 17%, 21% and 42% respectively for the overall score compared with the three preceding versions.
• The biggest jumps are made in categories 6, 3 and 1. The category User-friendliness (2) keeps improving.
• We see a strong increase in the score of the category on Environmental and social responsibility (6). One reason is the use of the gasification by-product biochar as fertilizer, which thanks to considering rural and urban segments is now an option that constitutes more value. Another reason is that the CVD-version can be produced locally.

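The percentage differences in Table 3 follow the usual relative-difference formula. As a minimal sketch, the overall difference with the FSA version can be reproduced from the two overall scores quoted in Section 4.4 (664 for FSA, 780 for CVD); the function name is an illustrative choice.

```python
def pct_higher(new_score, reference_score):
    """Relative difference of new_score over reference_score, in percent."""
    return 100.0 * (new_score - reference_score) / reference_score

fsa_overall = 664  # overall score of the last historical version (FSA)
cvd_overall = 780  # overall score of the CVD version

print(round(pct_higher(cvd_overall, fsa_overall)))  # 17
```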
To be able to interpret these findings (Sections 4.4 and 5) and draw conclusions (Section 6) we add a number of remarks:

• One of the elements that was covered by the CVD-version but not explicitly by the criteria is the fact that it can cater for multiple target groups at the same time (urban small commercial and rural). This is likely to lead to benefits for all but these benefits are not directly expressed in the criteria.
• The suitability for multiple target groups also opens the door to new business model options. Because these have not been developed and tested in practice yet, they cannot be expressed in the performance scores either.
• The scores used were the adjusted ones. The (average of the) initial expert scores (Appendix C) would have given a similar picture but the data points would have gone more up and down. The replacement of these scores by the adjusted ones represents the collective expert intelligence that was mobilized.

Comparing the Performance of the CVD Version with Older Ones
How should we interpret these findings? An extensive interpretation follows in Section 5. The thoughts below allow for a more effective discussion in that section.
As stated in Section 3, we considered the comparison with a historical learning process relevant in addition to comparison with an individual preceding gasifier version. This can be considered a stricter requirement for the new version, since to be assessed as "better" it has to beat the performance of individual versions as well as show a jump compared with the overall learning. So how did it do compared to the historical (autonomous) learning?
To determine this we first look at Figure 1: it shows the overall scores of the four historical versions and the CVD-version. We can clearly see that the jump from the FSA version to the CVD version is considerable, although roughly similar in size to the one from Insulated to Pelletized. Secondly, if we draw an imaginary line between the total scores for the first and last version in the historical data set (548 for the first version, 664 for the last), we get the best fit for the historical learning in Figure 2. Because the data set is non-parametric and there are far too few data points (4) to construct a statistically meaningful trendline, drawing a line between the first and last data point is the only possible way to depict an approximation of this learning process. If this line is visually extrapolated, we see what the approximate Expected overall score for the "next" gasifier would be. If we compare that to the Actual total score of the CVD version (780, see Table 3), we can conclude that the actual CVD version scores about 10% higher than could be predicted based on the best available approximation of the autonomous learning alone. This 10% is therefore the best estimate for the jump on top of the expected increase in the score.
These are the findings. How can we interpret these in terms of level of value added of the CVD-approach? We look at this in more detail in the next section.
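The visual extrapolation can be approximated numerically. The sketch below assumes, purely for illustration, that the four historical versions and the "next" version are equally spaced along the horizontal axis; if Figure 2 uses a different spacing (for example calendar time), the expected score and the resulting percentage shift somewhat. The function name is hypothetical.

```python
def extrapolate_next(first_score, last_score, n_versions):
    """Extend the straight line through the first and last data point by one step.

    Assumes equal spacing between versions (an assumption, not stated in the text).
    """
    step = (last_score - first_score) / (n_versions - 1)
    return last_score + step

expected = extrapolate_next(548, 664, 4)  # first and last historical overall scores
actual = 780                              # overall score of the CVD version
extra_jump_pct = 100.0 * (actual - expected) / expected

print(f"expected: {expected:.1f}, extra jump: {extra_jump_pct:.1f}%")
# expected: 702.7, extra jump: 11.0%
```

Under this equal-spacing assumption the result is of the same order as the roughly 10% read from the figure.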

Discussion
In this section, we discuss and interpret the findings in more depth: the effects of using an agreed set of assessment guidelines (Section 5.1), the effect of the holistic CVD-approach on the width of the score changes (Section 5.2), incorporating less quantifiable effects of a more holistic approach (Section 5.3) and limitations of the research (Section 5.4). We refer back to literature that we discussed in Sections 1 and 2.

Effects of Combining Intelligence vs. Averaging Assessments
The purpose of steps 1 and 2 as described in Section 3.3 was to make sure that the comparison would be based on an inter-subjective assessment. So, did this process have an effect on the historically assigned assessment scores? If we compare the initial scores of the main expert (DE1) with the average of the scores of all four experts, we find that this difference was on average about 10% (see Appendix C) before adjustment. If we calculate the differences between the adjusted scores by DE1 (step 3 in Section 3.3) and the pre-adjusted averages, we see the differences shrink to less than 1%. We can conclude that the dialogue, similar to a Delphi-method [51,52], together with using the possible optimal performance at the time of the development of the CVD-version as benchmark, resulted in an assessment basis that reflected the collective intelligence of the group of experts as well as the autonomous evolution of gasifiers.
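The type of comparison described here can be sketched as a mean relative gap between one assessor's scores and the panel average. All scores in the snippet are hypothetical placeholders chosen for illustration; the real values are in Appendix C, and the exact formula used there may differ.

```python
from statistics import mean

def mean_relative_gap_pct(reference_scores, compared_scores):
    """Average of |compared - reference| / reference over all versions, in percent."""
    gaps = [abs(c - r) / r for r, c in zip(reference_scores, compared_scores)]
    return 100.0 * mean(gaps)

# Hypothetical initial overall scores per expert for four gasifier versions.
initial_scores = {
    "DE1": [600, 640, 700, 720],
    "DE2": [540, 580, 650, 660],
    "DE3": [520, 570, 620, 650],
    "DE4": [540, 570, 630, 650],
}
# Average over the four experts, per version.
panel_average = [mean(per_version) for per_version in zip(*initial_scores.values())]

# Hypothetical adjusted scores assigned by DE1 after the dialogue.
adjusted_de1 = [552, 588, 651, 668]

gap_before = mean_relative_gap_pct(panel_average, initial_scores["DE1"])
gap_after = mean_relative_gap_pct(panel_average, adjusted_de1)
print(f"before dialogue: {gap_before:.1f}%, after adjustment: {gap_after:.1f}%")
```

A gap that shrinks after adjustment indicates that the adjusted scores have moved towards the panel view rather than remaining one assessor's opinion.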

Interpreting the Comparative Performance of (Gasifier) Stoves
If we consider the width of score changes of the CVD-version compared to the preceding one (FSA), we see an improvement in all categories (except 7 and 8). We can also formulate this to state that the CVD-version shows an improvement across a range of sustainability related areas, without however making statements about the magnitude of any of the effects.
If we briefly look at the other transitions, we can see that when moving from the Fixed Grate (FG) to the Insulated (Ins) version, the main improvement in scores occurred in category 2 (User-friendliness) but this negatively affected performance in five other categories, which thus had to be assigned a lower score. The previous major jump came from the Ins(ulated) to the Pell(etized) version, and it was slightly bigger in relative terms (18% vs. 17%) but smaller in absolute terms (98 vs. 116 points). Again, this former score increase came solely from the change in category 1, with virtual standstill in the other categories and the score in category 6 even decreasing sharply. Furthermore, the main and perhaps only reason for the jump in category 1 was the increase in fuel-tolerance (by means of the move to pellets as opposed to rough fuel sources). While the jump at first sight seems impressive, the implications are bigger in eco-system terms (processing of rough fuel sources into pellets) than in technological terms. The required changes in the eco-system to make this version commercially feasible were however not addressed. This type of limited improvement seems to illustrate why it is desirable to prevent "heads down design" [33].
Therefore, by looking at the full pattern behind the transitions, we see that with the move from the FSA to the CVD-version, the performance in all categories improved, some by very decent margins, without a spectacular outlier like with the move from Insulated to Pelletized.
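The contrast between a broad improvement and a single-category jump can be made explicit with two simple indicators: the number of categories that improved and the share of the total gain that comes from the largest single category. The sketch below is one possible way to express this; the indicator definitions and all category scores are hypothetical and are not taken from Tables 2 and 3.

```python
def transition_profile(before, after):
    """Return (number of improved categories, share of total gain from the top category)."""
    deltas = {cat: after[cat] - before[cat] for cat in before}
    gains = [d for d in deltas.values() if d > 0]
    total_gain = sum(gains)
    top_share = max(gains) / total_gain if gains else 0.0
    return len(gains), top_share

# Hypothetical category scores for a narrow, single-category jump ...
narrow_before = {1: 100, 2: 80, 6: 60}
narrow_after = {1: 200, 2: 81, 6: 50}
# ... and for a broad improvement of similar overall size.
broad_before = {1: 100, 2: 80, 6: 60}
broad_after = {1: 130, 2: 100, 6: 95}

print(transition_profile(narrow_before, narrow_after))
print(transition_profile(broad_before, broad_after))
```

A top share close to 1 signals that one design choice drives the jump, whereas a lower share combined with more improved categories corresponds to the pattern described for the move from FSA to CVD.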
This analysis supports the notion that using an approach that more explicitly accepts complexity (CVD) can benefit individual categories as well as the whole "system". While from a theoretical standpoint this is to be expected, e.g., [13,14,33], this test confirms that notion, at least for this case.
Finally, we would like to add that the weights per category and the scores assigned for each criterion do not matter much for the purpose of this paper, as long as the same weights and assessment protocol are used for all gasifiers under consideration in a particular experiment. The aim is to analyze the differences between the assessment scores, more than the details of the absolute values.

Identifying Further Benefits of a Holistic Approach
As briefly stated in Section 4.3, there are some effects of a holistic approach, of which CVD is an example, that are not directly captured yet in the performance criteria, but that may prove to be essential for future implementation. In particular, the CVD version recognizes the existence of complexity (e.g., multiple target groups) and accepts it as a reality to address, which resonates with the statements from previous research on this topic [9,30]. Because this inclusive approach works with diversity instead of considering it too complex to deal with [24], the chances increase of satisfying the needs of a more diverse set of beneficiaries, eventually allowing adoption on a scale that is more financially viable for the company. Whether this will play out remains to be seen, because the principal's main strength is development and pilot-testing. For commercialization, new partnerships will have to be developed. Adding and properly assessing more social and business oriented criteria that would add value to this assessment method seems desirable and ideally triggers collaboration between different disciplines.

Limitations
The research approach was necessary given the circumstances and initial state of the data, but contained various pitfalls to be avoided. These were mainly related to potential bias of assessor DE1, who works at the company that is developing the gasifiers. Because of the effort to make the assessment inter-subjective we feel confident that the assessments as reported in Sections 4.2 and 4.3 do not display a bias. Furthermore, by comparing results relative to the historical learning, we have a fair and complete result that does not over-estimate the performance of the CVD-version.
At first sight, the reality that different experts in future experiments might assign different weights to categories, use different criteria and even assign different scores to these criteria seems to be a limitation to this research. However, as long as all these decisions still result in an internally consistent assessment process for all gasifiers included in a data set, the same type of analysis can be made, enabling researchers to draw their conclusions based on their own results. Because of the set-up, we cannot draw any meaningful conclusions about how the gasifiers in this data set perform against other gasifiers, but that was not the intention. The intention was to be able to compare gasifiers within this one data set and investigate whether using a different design method, with all other variables as equal as possible, would result in a different outcome. For that purpose, this set-up proved suitable.
One possible change for the future in the assessment method and protocol might be to let stakeholders like end-users perform the assessment or at least involve them directly in assigning scores on criteria. In fact, a small test was performed with eight end-users. This test showed that for most criteria they do not possess the required insights, so the assessment data would get "polluted". Only for directly relevant categories like User-friendliness could they contribute to the assessment. For this paper that was not possible because they could not be mobilized for such an assessment in a way that would be consistent for all considered gasifier versions.

Conclusions and Next Steps
The main conclusions to be drawn from this paper then revolve around the question what effect-in terms of both multi-dimensional performance and multi-context potential-can we see of using the intentional multi-context approach Context Variation by Design (CVD) as compared to a traditional optimization oriented development approach.Or as stated in Section 3.1 "to which extent does a (gasifier) version that is derived from a "rich" multi-contextual design space perform better, or not, than versions that use a "traditional" innovation approach".In addition, what can we conclude, in relation to the identified societal problem of "dirty cooking" and about the approach in general?
The gasifier that was developed by means of the multi-contextual design approach CVD received an overall assessment that was substantially better (17%) than the previous highest scoring one until then.This was the same order of magnitude as the highest historical increase in overall score, from Insulated to Pelletized.The latter score increase however can be attributed to one single design choice, which is reflected in the spectacular increase in the score in just one category.The improvement from the most recent regular version (FSA) to the CVD version occurs over the full width of assessment categories: all category scores increased, some considerable, others less so.
Moreover, the overall score is to our best possible estimate given the limited number of data points, roughly 10% higher than if it would have scored in compliance with the historical learning development; this 10% can be considered the extra jump in the performance.
This improvement (expected score increase + extra jump in the score) over the full width of categories is a departure from previous assessment score changes.From the data that we gathered and analyzed we can infer that at least in this case the CVD-approach represents a departure from "heads down design" [33].Further research would need to verify whether this would also occur in other settings.For this case the conclusion seems to be strengthened by the observation that some benefits of the new version, like additional business model options, are not explicitly captured yet in the criteria and therefore require additional qualitative assessment.Such reflection in addition to numbers or looking at just technological features is in line with comments by [5,40].
Whereas in previous papers [42,43] examples were provided how an approach that takes multiple contexts into account creates a rich design space, in this paper we show that-in the reported case-this approach results in a solution that actually performs better than other versions in the data set and also better than could be expected based on autonomous learning alone.
Will this better performance result in high(er) adoption rates? It is beyond the scope of this paper to answer this question, but it seems logical that results that explicitly cater to needs in multiple contexts are indeed more widely adopted. If this turns out to be true, it would mean that a full range of sustainability-related effects (improved health, better affordability, less deforestation, fewer emissions) is achieved for a wider diversity of beneficiaries than is likely with current solutions. Whether such effects will materialize, and how big they will be in reality, depends on the ability of the partnership to create a business architecture that allows adoption on a large scale. The CVD approach provides a way to develop a product architecture that places the partnership in a good position to achieve such scale from the point of view of meeting diverse contextual requirements.
Based on the conclusions and preceding discussion we suggest the following next steps:

• Applying the CVD-approach in more projects, clean cooking related and otherwise, would create more understanding of the influence of the approach on the richness of design analysis and the performance of actual solutions, including the conditions under which this is (more or most) likely to occur.

• The authors or other researchers can explore whether the conclusions would change if other weights, criteria or scores were used. However, such changes would need to be justified by expert arguments and applied consistently to the entire data set.

• If the statements in this paper about the importance of non-technical (e.g., social and business-related) criteria for the overall assessment of gasifier stoves are supported, researchers from other disciplines may consider collaborating in improving the overall set of criteria. With changed criteria historical comparisons will not be possible, but the gain of a more holistic assessment to guide present and future developments may be interesting to consider.

• The well-considered inclusion of end-users in the assessment of categories that they can be expected to have knowledge of will improve the verification of the performance assessment, both quantitative and qualitative. To be used in a comparative assessment, the assessment subjects (e.g., gasifiers) must still exist, and all user-based assessments would have to be done in the same way by the same group of end-users to avoid distorting variations.

• For the principal of this research the results seem promising enough to continue with. This includes finding partners to move beyond the pilot-testing phase and good communication to end-users, so that the potential benefits of gasifier stoves can manifest themselves in reality.

Appendix C. Results of Initial Expert-Scoring, Scores on Category Level
This appendix contains the initial scores given by all experts, which were input for the discussion to arrive at an inter-subjective set of assessment guidelines for rescoring the first four gasifier versions and then the CVD one, to create an agreed and level playing field. The values in the last two columns are rounded to the nearest whole number in case of fractions.

Expressed in a formula, the above looks like this: where, Wi = weight for criterion i.
The same expert needs to assign the weights to prevent inconsistencies. In the case described in this paper this expert was the principal. For all assessments reported in this paper his initial weighting decisions for the stakeholder "end users" were used. These weights are shown below. As can be seen, the total of all weights, i.e., the values in the bottom row added up, is exactly 100.

Appendix E.3. Assessing a Gasifier
With the weights having been set, an expert needs to assign a score between 0 and 10 for each criterion. A 10 signifies that the realistic optimum for that criterion at the time of assessment has been reached; a 0 indicates a maximum gap between the current performance and the realistic optimum for this criterion. All criteria thus receive a score from an assessor. Each criterion score is multiplied by its respective weight. A criterion that has a high score but a weight of 0 then still yields 0 points, as does a score of 0 for a criterion with a high weight. All calculated values (score * weight) taken together result in the total (normalized) score for that gasifier, based on the assessment of that expert. This process can be repeated for any gasifier. As long as the scores are assigned by the same expert, they are suitable for comparative analysis.
The implication of this method is that the maximum total number of points in an assessment can be 1000 (i.e., 100 total weight points * 10 for all criteria), or ScoreMax = 1000. The actual score for each category and then for a gasifier version can be calculated using the following formulae:

$$Y_n = \sum_{i=1}^{15} W_i \, S_i$$

where $Y_n$ = score for category $n$, with $n$ between 1 and 9; $W_i$ = weight for criterion $i$ in this category; $S_i$ = assigned score for criterion $i$ in this category; $i$ = criterion number in category $n$, maximally reaching the value 15 in this data set (if a category has fewer criteria, the product of weight * score is 0 for the remaining values); and

$$\text{Total Gasifier Assessment Score} = \sum_{i=1}^{9} Y_i$$

where $i$ = category, ranging from 1 to 9, and $Y_i$ = total category score.
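The scoring procedure described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' tooling: the category names, weights and scores below are hypothetical, only three categories are used instead of the nine in the paper, and the function names `category_score` and `total_score` are assumptions introduced for the example.

```python
from typing import Dict, List, Tuple

SCORE_MAX = 1000  # 100 total weight points * maximum criterion score of 10

# A category maps to a list of (weight, score) pairs, one pair per criterion.
Assessment = Dict[str, List[Tuple[float, float]]]


def category_score(criteria: List[Tuple[float, float]]) -> float:
    """Sum of weight * score over all criteria in one category."""
    return sum(weight * score for weight, score in criteria)


def total_score(assessment: Assessment) -> float:
    """Sum of all category scores, after checking the method's constraints."""
    pairs = [pair for criteria in assessment.values() for pair in criteria]
    total_weight = sum(weight for weight, _ in pairs)
    if abs(total_weight - 100) > 1e-9:
        raise ValueError(f"weights must add up to 100, got {total_weight}")
    if any(not 0 <= score <= 10 for _, score in pairs):
        raise ValueError("each criterion score must lie between 0 and 10")
    return sum(category_score(criteria) for criteria in assessment.values())


# Hypothetical assessment (illustrative numbers, not taken from the paper).
example: Assessment = {
    "safety": [(20, 8), (10, 6)],     # 20*8 + 10*6 = 220
    "usability": [(30, 7), (10, 0)],  # a 0-score yields 0 points despite its weight
    "cost": [(30, 5)],                # 30*5 = 150
}

print(total_score(example))              # total on the 0..1000 scale
print(SCORE_MAX - total_score(example))  # remaining gap to ScoreMax
```

Because the weights are validated to add up to 100 and every score is capped at 10, the total can never exceed ScoreMax = 1000, which is what makes totals from the same assessor comparable across gasifier versions.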
The distance between the actual overall score for a stove and 1000 indicates how far removed the gasifier is from the maximum possible performance at that time. It also means that the same gasifier can receive a lower score when it is assessed again at a later moment, but that new score is then a better indication of how it performs compared with a new gasifier that is assessed at the same time, which is the situation that we present in this paper. As explained, while the gap between the actual total score and 1000 indicates the possible room for improvement for any individual version, the differences between the scores of individual versions are more suitable as a basis for comparative reflection.
The step from the initial historical assessment scores for the first four gasifiers in the data set to the adjusted scores reflects the evolution in time (a given performance is likely to become relatively less valuable) and the collective expert opinion as opposed to an individual expert opinion. The collective expert opinion is represented by the remarks in Section 4.1.
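The two ways of reading a total score, as a gap to the maximum and as a difference between versions, can be illustrated with a short sketch. The scores used here are hypothetical and chosen only to keep the arithmetic easy to follow; the helper names `gap_to_max` and `relative_increase` are assumptions made for this example.

```python
SCORE_MAX = 1000  # maximum total assessment score in this method


def gap_to_max(total: float) -> float:
    """Room for improvement of a single version at the time of assessment."""
    return SCORE_MAX - total


def relative_increase(new_total: float, old_total: float) -> float:
    """Percentage by which the new version outscores the old one."""
    return 100.0 * (new_total - old_total) / old_total


# Hypothetical totals for an older and a newer version (not from the paper).
old_version, new_version = 500.0, 600.0

print(gap_to_max(new_version))                      # 400.0
print(relative_increase(new_version, old_version))  # 20.0
```

Both figures assume that the two versions were scored by the same assessor at the same moment, as the method requires for comparative use.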

Figure 1. Historical development of overall assessment scores.

Figure 2. Best approximation for autonomous learning.

Table 1. Main characteristics of the involved design experts.

Table 3. Comparative performance assessment of CVD gasifier with previous versions.

Table A7. Weight per criterion, for End-user stakeholder.