Product Environmental Footprint (PEF) Pilot Phase—Comparability over Flexibility?

The main goal of the European product environmental footprint (PEF) method is to increase comparability of environmental impacts of products within certain product categories by decreasing flexibility and therefore achieving reproducibility of results. Comparability is supposed to be further increased by developing product category specific rules (PEFCRs). The aim of this paper is to evaluate if the main goal of the PEF method has been achieved. This is done by a comprehensive analysis of the PEF guide, the current PEFCR guide, the developed PEFCRs, as well as the insights gained from participating in the pilot phase. The analysis reveals that the PEF method as well as its implementation in PEFCRs are not able to guarantee fair comparability due to shortcomings related to the (1) definition of product performance; (2) definition of the product category; (3) definition and determination of the representative product; (4) modeling of electricity; (5) requirements for the use of secondary data; (6) circular footprint formula; (7) life cycle impact assessment methods; and (8) approach to prioritize impact categories. For some of these shortcomings, recommendations for improvement are provided. This paper demonstrates that the PEF method has to be further improved to guarantee fair comparability.


Introduction
In 2013, the European Commission (EC) launched the communication "Building the Single Market for Green Products (COM (2013) 0196 final)" [1] and the recommendation "On the use of common methods to measure and communicate the life cycle environmental performance of products and organizations (2013/179/EU)" [2]. The main goal of the so-called product environmental footprint (PEF) method [2] is to increase comparability between products of the same product category (and therefore also to allow for comparisons and comparative assertions) by applying the "comparability over flexibility" approach: reducing the flexibility of methodological choices is meant to increase the comparability of products [3,4]. The increased comparability was expected to be achieved by predefining specifications for certain methodological aspects based on value choices (e.g., modeling of the end-of-life phase), thus reducing the flexibility for which ISO 14040/44 [5] (the internationally agreed standard for life cycle assessment, LCA) is known. Furthermore, by developing product category specific rules (PEFCRs) for certain product categories, additional specifications are determined [2].
Comparisons are carried out when competing products that perform the same function are compared regarding their environmental performance [6,7]. Comparative assertions are carried out when a statement (an environmental claim) is made regarding the superiority or equivalence of one product versus a competing product that performs the same function [7].
To better understand the differences and similarities between the PEF method [2] developed by the EC and the PCR concept based on ISO 14040/44 [5,7], an overview is provided in Figure 1. ISO 14040/44 is the basis for ISO 14025 [8] as well as for the PEF guide [2]. However, it should be noted that the PEF guide does not conform to ISO 14040/44 and even partly contradicts it [9]: for example, PEF allows for comparisons and comparative assertions based on normalized and weighted results, which is explicitly excluded in ISO 14040/44. Based on ISO 14025, product category rules (PCRs) are developed, which provide detailed rules on how to model the life cycle of a specific product of a product category, i.e., a group of products that are able to fulfill equivalent functions. Based on the PCR, environmental product declarations (EPDs) can be prepared, which can be seen as specific case studies of products within the considered product category, following the rules provided within the associated PCR. PEFCRs can be seen as the PEF version of PCRs, i.e., they are rules complementing and specifying the PEF guide [2] and the PEFCR guide [6] for the considered product categories. The PEFCR guide is the guidance document of the PEF pilot phase and was updated several times during the pilot phase: overall, seven versions were published, starting with version 3.4 in February 2014 and concluding with version 6.3 in December 2017. Based on existing PEFCRs, PEF studies can be carried out, which can be used for comparisons and comparative assertions [2,6,10]. PEF studies for internal use, analogous to LCA studies, can be carried out based on the PEF guide only.
Sustainability 2018, 10, x FOR PEER REVIEW
Figure 1. Overview of the PEF method [2] and the PCR concept based on ISO 14040/44 [5,7] (own figure).
In November 2013, the PEF pilot phase was launched to test the PEF method and to develop PEFCRs for selected product categories. The first wave started with 14 non-food products (batteries and accumulators, decorative paint, footwear, hot and cold water pipes, household detergents, intermediate paper products, IT equipment, leather, metal sheets, photovoltaic electricity generation, thermal insulation, T-shirts, uninterruptible power supplies; discontinued: stationery) and was complemented in 2014 by a second wave of 11 food products (beer, dairy, feed for food-producing animals, pasta, packed water, pet food, olive oil, wine; discontinued: coffee, marine fish, meat). The pilot projects were chosen by the EC based on criteria such as diversity of product categories covered and availability of good quality secondary life cycle data [11]. The pilot phase ended in April 2018 with only 10 pilot projects (dairy, decorative paints, feed for food-producing animals, IT equipment, leather, packed water, pasta, pet food, rechargeable batteries, and wine) able to provide final PEFCRs [12] due to delays within the pilot phase. For some of the other pilots (e.g., household detergents or intermediate paper products), PEFCRs are expected to be published throughout the year 2018. Now, the transition phase has started, which will last until the end of 2021. It aims at monitoring the implementation of the existing PEFCRs and the development of new PEFCRs, as well as fine-tuning methodological aspects of the PEF method and the PEFCR guide. Within the transition phase, it will be discussed what potential future applications PEF and the PEFCRs could have.
Numerous experts and stakeholders were involved in the pilot phase, including companies and industry associations, nongovernmental organizations, academia, the EC, the member countries (represented within the steering committee (SC)), and the technical advisory board (TAB), which consisted of experts supporting the SC and providing technical advice related to the ongoing pilot projects as well as to overall issues related to PEF and LCA [6,10].
Due to the combined effort of these stakeholders, several positive outcomes of the PEF pilot phase can be observed: some sectors, particularly from the food industry, have started to apply LCA (or rather PEF), and challenges related to methodological and practical requirements of LCA were discussed amongst a variety of stakeholders, leading for instance to an improved and updated life cycle impact assessment (LCIA) method for toxicity impacts [13]. Furthermore, several product category rules (or rather PEFCRs) were developed for new product categories (e.g., beer). Some of the methodological and practical challenges of the PEF method mentioned in previous publications (e.g., [3,4,9,10,14]) were tackled, e.g., by introducing more mature LCIA methods for the categories water use (for which the AWARE method [15] is now recommended), land use (for which the LANCA method [16] is now recommended), particulate matter (for which the method by Fantke [17] is now recommended), and resource use (for which the ADP method based on ultimate reserves [18,19] is now recommended) [2,6]. Furthermore, workshops were organized and cross-cutting working groups were established to discuss issues like the modeling of the end-of-life (EoL) phase. Thus, on the one hand, it should be acknowledged that PEF picked up these topics, providing a platform for an exchange of opinions and discussions. On the other hand, the amount of resources spent in the pilot phase was extremely high, and several methodological and practical challenges of the PEF method and the developed PEFCRs still exist. These challenges have already been addressed in several publications by the authors of this paper [4,9,10,20], but also by industry [3,21] as well as policy makers [22].
The main goal of this paper is to analyze whether the adapted PEF method [2] (as shown in the current PEFCR guide [6]) as well as the established PEFCRs allow for comparability, especially comparisons and comparative assertions. As, according to the PEF method, comparability can be achieved by reducing flexibility [2], it is examined whether this claim is supported by the developed PEFCRs [12] and the current PEFCR guide [6]. Our main concern is that the defined rules of the PEF method, the PEFCR guide, and the existing PEFCRs (thus, the reduced flexibility) do not guarantee reliable comparability, which is the main goal of the PEF method. The PEF pilot phase is analyzed and remaining challenges are discussed. Furthermore, recommendations are provided to support a successful implementation of PEF and the developed PEFCRs. In the following, the procedure used for the analysis is introduced (Section 2). Existing challenges of the PEF method and PEFCRs that impede fair comparisons and comparative assertions are then discussed, and recommendations for improvements are provided (Section 3). Finally, conclusions are drawn (Section 4).

Materials and Methods
A comprehensive evaluation of the entire PEF pilot phase of almost 4.5 years (November 2013 to April 2018) was carried out. The following documents were taken into account:

• Previous work and publications of the authors (e.g., [9,10,23])
• Other existing publications related to PEF (e.g., [14,24-26])
• All available documents developed within the pilot phase:
  • All different versions of the 21 developed PEFCRs: scope definition, first draft PEFCR, and second draft PEFCR (with up to 100 pages);
  • Final PEFCRs for 10 pilots (with up to 150 pages);
  • Screening study reports (with up to 200 pages) for all 21 pilots: screening studies are the PEF studies for the specific representative products of a product category. Their goal is to determine the relevant environmental impacts (as well as life cycle stages, processes, and elementary flows) of the considered product categories and therefore define the specifications of each PEFCR, e.g., when to use primary and when secondary data. The results of the screening study also serve as the basis for the benchmark, as the representative product is automatically set as Class C (e.g., when assuming an A-F scale, with 'A' indicating the best and 'F' the worst performance);
  • Various issue papers (e.g., [27,28]) addressing topics like modeling of electricity. Issue papers are publications by the EC that summarize the state of the art and propose solutions for methodological challenges. They served as the basis for discussions in the TAB;
  • Seven versions of the PEFCR guide provided by the EC over the course of the pilot phase (the PEFCR guide was adapted according to the outcomes of the pilot phase, e.g., predefining more mature LCIA methods), focusing on the latest version (6.3) [6].
Besides this, the authors actively took part in the PEF pilot phase as stakeholders in some pilot projects and as TAB members, participating in almost all TAB meetings as well as joint SC/TAB meetings. Thus, the authors were able to gain direct insight into the organization and outcomes of the pilot phase.
A comprehensive analysis was carried out to identify the still-existing challenges of the PEF method. These challenges are analyzed with regard to future application options of PEF, which are:

• Internal implementation: applying the PEF method for internal product/process improvement
• Business-to-business (B2B) communication: comparisons of products based on a PEF report
• Business-to-consumer (B2C) communication: comparisons and comparative assertions of products based on labels [6]

Where possible, examples are provided to illustrate the challenges described. The results of this comprehensive analysis of the pilot phase, including recommendations for improvement, are presented in the following section, complementing the findings of previous publications.

Results and Discussion
Several methodological and practical challenges were identified with regard to PEF being able to guarantee fair comparability by reducing flexibility. They include (1) definition of scope (definition of the product performance with the functional unit, differentiation of products by defining the product category and the representative product); (2) the modeling of the life cycle of a product (modeling of electricity, use of secondary data, modeling of EoL allocation); and (3) impact assessment and interpretation (applicability and reliability of impact assessment methods and prioritization of impact categories by normalization and weighting). In the following, these challenges are explained in detail and recommendations are provided for how to solve them. However, it should be pointed out that recommendations cannot be provided for all shortcomings, simply because for some of them adequate solutions are currently not available.

Definition of Product Performance
The PEF method introduced a new approach to determine the functional unit: it has to be defined by answering four questions: 'what', 'how well', 'how much', and 'how long'. By providing such detailed requirements, the flexibility provided by ISO 14040/44 is reduced. However, even though the idea of giving more guidance for defining the functional unit is good, it proved challenging to define the functional unit considering these requirements. Several pilots neither addressed them properly in their PEFCRs nor provided a functional unit that allows for fair comparability. Especially the definition of 'how well', which shall be used to describe relevant quality/performance aspects of the product (e.g., washing performance of detergents), is often not carried out properly. This is relevant for fair comparability, because only products with the same performance/quality shall be allowed to be compared. Currently, the performance/quality of the analyzed product is not adequately taken into account in any of the finalized PEFCRs, because 'how well' a product performs is not properly addressed: the functional unit is not defined adequately, and standardized tests to check whether the defined performance/quality is fulfilled by the specific product are not available or declared.
For example, the pilot 'feed for food-producing animals' defines the functional unit as 1 kg feed without considering any quality aspects such as metabolizable energy, which is a relevant decision factor for farmers when choosing a certain type of feed [29,30]. This means that two feeds (A and B) with different performance/quality aspects can be compared, e.g., feed A has more kilocalories of metabolizable energy (kcal ME) than feed B. However, the higher amount of kilocalories may be related to higher environmental impacts per kilogram. The consumer (e.g., a farmer or a company) who buys feed B based on its better environmental profile ends up using more feed to reach the same caloric requirement as with feed A. Therefore, the use of feed B can lead to more environmental impacts. This example shows that, without considering the performance of products, their comparison may incentivize products with a worse environmental performance than their alternatives.
We recommend the following: for internal application, the product performance does not have to be tracked, as it can be assumed that the company makes sure that the performance/quality of the product is consistent. However, for B2B as well as B2C communication, the performance and quality of a product of a specific product category are relevant factors. To allow for fair comparability, the functional unit has to be defined in a way that includes performance/quality aspects (e.g., meeting daily caloric and nutritional requirements). Furthermore, parameters to measure these performance/quality aspects (e.g., kilocalories of metabolizable energy, type of protein, water content, etc.) have to be taken into account, and standardized methods assessing whether every product of the product category fulfills the requirements have to be set up.
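The effect described in the feed example can be made concrete with a small calculation. The following minimal sketch rescales the impact from a mass-based functional unit (1 kg feed) to a performance-based one (a fixed amount of metabolizable energy). All numbers, as well as the names `Feed` and `impact_for_energy`, are hypothetical and chosen for illustration only; they are not taken from the feed PEFCR.

```python
from dataclasses import dataclass


@dataclass
class Feed:
    name: str
    impact_per_kg: float  # kg CO2 eq. per kg feed (assumed value)
    me_per_kg: float      # kcal metabolizable energy per kg feed (assumed value)


def impact_for_energy(feed: Feed, required_kcal: float) -> float:
    """Impact of the feed mass needed to supply `required_kcal` of ME."""
    mass_needed_kg = required_kcal / feed.me_per_kg
    return mass_needed_kg * feed.impact_per_kg


# Hypothetical feeds: B looks better per kg but delivers less energy per kg.
feed_a = Feed("A", impact_per_kg=1.0, me_per_kg=3000.0)
feed_b = Feed("B", impact_per_kg=0.8, me_per_kg=2000.0)
required_kcal = 6000.0  # assumed caloric requirement

for feed in (feed_a, feed_b):
    print(
        f"Feed {feed.name}: {feed.impact_per_kg:.2f} kg CO2 eq. per kg, "
        f"{impact_for_energy(feed, required_kcal):.2f} kg CO2 eq. per "
        f"{required_kcal:.0f} kcal ME"
    )
```

Under these assumed values, feed B has the lower impact per kilogram but the higher impact per unit of delivered energy, so the ranking of the two feeds flips once the 'how well' aspect enters the functional unit.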

Definition of the Product Category
According to PEF, the product category has to be defined based on the Classification of Products by Activity code [31] (CPA code) [6]. Thus, by determining the product category, products that are deemed to be comparable alternatives are defined. The definition of a product category based on the CPA code reduces the flexibility provided by ISO 14025, but does not contribute to increasing comparability. It was shown in the pilot phase that the use of CPA codes to adequately set up product categories is challenging [32]. Thus, the EC analyzed possible principles (e.g., consumer's perspective, similarity of products, similarities of rules, etc.) which could be applied to define an appropriate product category [32]. However, even though this analysis has contributed to the overall understanding of the challenge, it did not solve it. Within the pilot projects, the product categories are inconsistently defined: they range from very narrow (e.g., heavy duty liquid laundry detergents) to very broad (e.g., beer). Currently, the definition of the product category in several of the final PEFCRs does not allow for sufficient differentiation of products, which is required for meaningful comparisons. The definition of the functional unit is closely linked to the challenge of defining the product category, because in both cases the products which can be compared are defined. The current approach does not consider comparability from a consumer point of view: a consumer may, for instance, prefer to know whether it is better from an environmental perspective to buy a liquid or a powder detergent, but may not be interested in comparing, e.g., a wheat beer with a mixed beer.
We recommend the following: for internal application, i.e., optimization of products and processes, there is no need to define a product category. For B2B and B2C communication, a suitable approach to consistently define product categories is required to allow for a clear differentiation of products as well as fair comparability. We think that using the CPA code alone is not adequate to set up product categories. However, the CPA code could be one of the aspects considered for determining an adequate method to define the product category. For this challenge, no ready-to-go solutions are currently available and further research is needed to tackle it. Therefore, the EC's effort to find more adequate solutions in the transition phase [33] is appreciated as well as necessary to allow for fair comparability.

Definition of the Representative Product
The aim of establishing a representative product (RP) is to define the average environmental performance of the products (of the product category) sold in the EU market. Based on the RP model, the most relevant life cycle stages, processes, elementary flows, impact categories, and data quality needs are identified. The RP is further used as a standard or point of reference against which comparisons can be made [6]. The concept of the RP is a new one which has been established by the PEF method [2]. The RP is either a real product, which reflects the environmental impacts of the entire considered product group, or a virtual one (which does not exist in real life) established based on the economic or mass-related market average for the considered product category. In both cases, the RP aims at displaying the average environmental impacts of the entire product category. As the representative product is defined based on market shares, the environmental impacts of the RP do not necessarily represent the average environmental impacts of this product category. This is further illustrated by two examples in Table 1. We want to state that these examples should merely show that setting up a method to determine the RP is challenging and that other aspects besides the market share of products may be taken into account.
The product category consists of two products (A and B) with a different market share and a different individual environmental performance (here exemplarily expressed in CO2 eq.). For both examples, it is assumed that Product A has a small market share (10%), while Product B has a high market share (90%). In Example 1, Product A has a small environmental impact (1 kg CO2 eq.) and Product B has a high environmental impact (10 kg CO2 eq.). If the average environmental impact of the product category is calculated as the arithmetic average, it would be 5.5 kg CO2 eq. If it is calculated based on the representative product approach (defined based on the market share), the average environmental impact of the product category would be 9.1 kg CO2 eq. It can be seen that the average environmental impact is much higher when it is calculated based on market shares. The benchmark is dominated by Product B, which has a high market share and high environmental impacts. In Example 2, it is assumed that Product A has a high environmental impact (10 kg CO2 eq.), while Product B has a small environmental impact (1 kg CO2 eq.). If the average environmental impact is calculated based on the RP approach, it would be 1.9 kg CO2 eq. and therefore lower than the impact based on the arithmetic average (which is 5.5 kg CO2 eq.). Comparing these results, the average environmental impacts of the RP differ. Therefore, depending on how the RP is set up, the determined benchmarks differ, and consequently so do the incentives and steering effects they have for the market.
Another challenge regarding the current definition of the RP is that certain assumptions are made in its bill of materials (BoM) that can lead to over- and/or underestimations of environmental impacts. When, e.g., additives are considered at their maximum allowed dose, as done in the beer pilot, overestimations in the RP can occur. As a result, real products would perform better, because most of them do not use the maximum dose allowed. Underestimations in the RP can occur when not all substances/materials that have environmental impacts are included in the BoM. As a result, real products would perform worse, because they utilize these substances/materials. In both cases, the established benchmark (based on the RP) cannot be used to compare the performance of real products of the considered product category.
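The arithmetic behind Examples 1 and 2 above can be reproduced with a few lines of code. The sketch below uses only the market shares and impacts stated in the text; the function names are hypothetical, and the two-product category is the paper's own simplification rather than the full RP procedure of the PEFCR guide.

```python
def arithmetic_average(impacts):
    """Unweighted mean of the product impacts (kg CO2 eq.)."""
    return sum(impacts) / len(impacts)


def market_share_average(impacts, shares):
    """Market-share-weighted mean, as used for a virtual representative product."""
    if abs(sum(shares) - 1.0) > 1e-9:
        raise ValueError("market shares must sum to 1")
    return sum(impact * share for impact, share in zip(impacts, shares))


# Products A and B: A holds 10% of the market, B holds 90% (from the text).
shares = [0.10, 0.90]
examples = {
    "Example 1": [1.0, 10.0],   # A low impact, B high impact
    "Example 2": [10.0, 1.0],   # A high impact, B low impact
}

for name, impacts in examples.items():
    print(
        f"{name}: arithmetic average = {arithmetic_average(impacts):.1f}, "
        f"market-share average = {market_share_average(impacts, shares):.1f} "
        "kg CO2 eq."
    )
```

Running the sketch gives 5.5 versus 9.1 kg CO2 eq. for Example 1 and 5.5 versus 1.9 kg CO2 eq. for Example 2, matching the values discussed above: the same two products yield very different benchmarks depending on which averaging rule is chosen.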
Due to these two challenges, real products might perform better than the benchmark. In fact, it was often observed in the supporting studies (i.e., studies with real products carried out in the pilot phase) that most of the analyzed products perform much better than the benchmark. The question is whether this really means that the products analyzed in the pilot studies are in fact 'greener' than the market average, or whether it rather implies that the defined RP is not that representative and cannot guarantee fair comparisons and comparative assertions.
By setting up an RP, very strict rules are established with regard to how the benchmark (Class C) is defined. It was shown that, depending on how the RP is calculated (based on the market share or as arithmetic average) as well as how the BoM is defined, the benchmark result can vary significantly, and therefore so can the steering effect it has on the market.
As the definition of the RP depends on the application of PEF, we recommend the following: for internal application, the use of an RP is not needed. For B2B and B2C communication, it is necessary to allow for a clear differentiation of products. As shown above, to achieve the desired effect for the market (promotion of products with less environmental impacts and therefore incentives for companies to improve their products), determining the representative environmental impacts based on the arithmetic average might be more suitable, at least when the market is dominated by products with rather high environmental impacts (as shown in Table 1). On the other hand, for tracking the overall impacts of the European market over time, the market average seems more appropriate. Thus, the RP approach needs to be revised. Currently, no ready-to-go solution is available and further research is needed. Therefore, the EC's effort to find more adequate solutions in the transition phase [34] is appreciated. Alternatives to the concept of setting a benchmark could be discussed. Instead of using the RP for setting the benchmark, the best performing product on the market could be used as the benchmark. That would serve as the strongest incentive to improve the environmental performance of products on the market.

Modeling of Electricity
The PEFCR guide clearly defines which electricity mix shall be used for modeling [6]. If possible, the electricity mix shall always be modeled by using the company-specific mix. If the company mix is not available, the mix of the electricity supplier shall be used. If this information is not available, the country-specific mix shall be applied. The use of the EU mix is not foreseen [6]. Within ISO, no clear rules are provided regarding the electricity mix used in the model, only that the choice has to fit the scope [7]. In most PEFCRs, country-specific electricity mixes are used, because it is believed that these reflect the real impacts of a particular product as closely as possible.
Reducing the flexibility regarding which electricity mix to use does not necessarily increase the comparability of products. For example, for a product/material produced in Poland the Polish electricity consumption mix is applied, whereas the French electricity mix is used for products/materials produced in France. Thus, companies in countries with coal-based electricity mixes like Poland always perform worse (e.g., in the impact category climate change) than companies in countries using nuclear power like France. However, just because a company is located in a specific country does not necessarily mean that it uses the country's electricity mix. For companies in the foreground system, where the electricity mix is known, there is no need to apply the country mix, as also stated by the PEF method. However, the electricity mix of companies along the supply chain and life cycle of the product is often not known due to missing data. Assuming that they use the country's electricity mix can therefore be to the disadvantage of the company carrying out the study.
For example, a company in Poland which puts effort into improving its environmental performance would be disadvantaged compared to a company in France because of the electricity mix, which is outside the control of the company carrying out the study. Similarly, companies located in countries where, e.g., renewable energy is subsidized would be advantaged. By using the European electricity mix for all companies for which the electricity mix is not known, these disadvantages and advantages would not occur, because the same electricity mix is applied for all companies.
Therefore, we recommend using the purchased electricity mix of the company carrying out the study for modeling. For companies within the supply chain for which the company carrying out the study does not know the purchased electricity mix, the EU mix should be applied to allow for fair comparisons and comparative assertions. Furthermore, for internal application the PEFCR guide [6] can be followed and data for company- or country-specific electricity mixes can be applied. When B2C communication is the intended application, the EU electricity mix or the mix actually purchased by the company should be used. For B2B communication (based on a detailed PEF report), the PEFCR guide can be followed and data for company-specific electricity mixes can be used, because it is transparently communicated which electricity mix is modeled.
Furthermore, if country-specific electricity modeling is foreseen for certain applications, all other unit processes and data have to be based on country-specific data as well (i.e., country-specific material production data, packaging data, use data, and end-of-life data) for consistency reasons. None of the pilots followed a consistent approach in this regard.
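The hierarchy described above is essentially a fallback rule. A minimal sketch of it, as summarized in this paper, could look as follows; the function name `select_electricity_mix`, its parameters, and the example labels are hypothetical, and the actual PEFCR guide contains additional conditions that are not modeled here.

```python
from typing import Optional, Tuple


def select_electricity_mix(
    company_mix: Optional[str],
    supplier_mix: Optional[str],
    country_mix: Optional[str],
) -> Tuple[str, str]:
    """Return (hierarchy level, mix) for the first available option.

    Order follows the summary in the text: company-specific mix first,
    then the electricity supplier's mix, then the country-specific mix.
    """
    candidates = (
        ("company-specific", company_mix),
        ("supplier-specific", supplier_mix),
        ("country-specific", country_mix),
    )
    for level, mix in candidates:
        if mix is not None:
            return level, mix
    # The EU mix is deliberately not used as a fallback, since the guide
    # (as summarized in the text) does not foresee it.
    raise ValueError("no electricity mix available for this process")


# Hypothetical supplier process: only the country of production is known.
level, mix = select_electricity_mix(None, None, "PL consumption mix")
print(f"Using {level} mix: {mix}")
```

The sketch makes visible where the criticism in the following paragraphs applies: whenever the first two options are unknown, the result depends solely on the country of production.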

Use of Secondary Data
Each PEFCR has to specify for which processes primary (i.e., specific) or secondary (i.e., average or proxy) data have to be used. Existing PCRs likewise define where primary data are needed and for which processes/materials secondary data can be applied. If secondary data are used for certain processes, e.g., for material acquisition, differences between these processes cannot be reflected in the modeling of the products. That means that producers buying materials with high environmental impacts would be rewarded, because the real environmental impacts of their materials are not taken into account if average data are used instead of their worse specific data. Conversely, producers buying materials with low impacts are at a disadvantage, because the real environmental impacts of their materials cannot be accounted for. For example, if a material is modeled with specific data reflecting, e.g., best available technologies (BAT), the environmental impacts are significantly lower than if the average import mix of the material is used. Thus, by decreasing the flexibility regarding the use of primary data, fair comparability is no longer guaranteed. We therefore recommend the following procedure: for internal application, the use of secondary data for certain materials and processes does not present a challenge, because it is assumed that the company collects primary data when necessary to identify the full optimization potential of its product/process. For B2B and B2C communication, modeling with default secondary data is challenging, particularly when secondary data are used for materials/processes that are identified as relevant (as currently done in several PEFCRs). Thus, we recommend that more specific data be provided by the EC for the relevant processes within the product life cycles, or that all datasets be designed in a way that allows them to be adapted to reflect company-specific aspects.

Modeling of EoL Allocation with the Circular Footprint Formula
Within the PEF method [2], the EoL formula, published in 2013, was introduced to standardize the allocation of burdens and benefits in the EoL stage and thus to enhance the comparability of different product systems. It deals with multi-functionality in recycling, re-use, and energy recovery situations. It considers the burdens of virgin material acquisition and pre-processing, of the recycled material input, of the recycling (or re-use) process, of the energy recovery process, and of disposal. Overall, 17 parameters are considered, describing, e.g., the quality of primary and secondary material, the lower heating value, and the recycling fraction of the material. All these parameters as well as the entire EoL formula are explained in detail in the PEF guide [2]. The formula led to many discussions due to its obvious shortcomings (e.g., promotion of energy recovery of materials) [6,34]. Towards the end of the pilot phase, a new formula, the circular footprint formula (CFF), was introduced [6]. It considers the production burdens, the burdens and benefits related to secondary material inputs and outputs, and the burdens and benefits of energy recovery and of disposal. The CFF also considers 17 parameters, which differ from those of the original EoL formula; e.g., the A factor, which allocates burdens and credits between two life cycles, is introduced. The CFF is explained in detail in the current PEFCR guide [6]. An advantage of this formula is that it no longer arbitrarily favors incineration of materials over reuse and recycling. However, several challenges with regard to ensuring fair comparability also exist for the CFF. Within ISO, no formula or approach is defined to model the EoL stage. By decreasing the flexibility with regard to modeling the EoL phase and providing a formula that has to be applied within every product model, PEF actually reduces fair comparability instead of increasing it. The following shortcomings demonstrate this:

• How often a material is recycled is not considered. Hence, a material that is recycled once gets the same burden/credit as a material that is recycled several times. An exception is made only for packaging material, for which the burden of the virgin material is divided by the number of recycling cycles.

• The default data provided for the quality term (quality of primary material related to quality of secondary material) are not adequate: for all metals, the same value of 1 is assigned, even though secondary metal qualities differ significantly. Furthermore, for most plastic materials (high-density polyethylene, polypropylene, polyethylene terephthalate; except for low-density polyethylene film) a value of 1 is assigned as well, even though plastic is usually down-cycled, meaning that secondary material usually does not reach the same quality as the virgin material. For some materials (e.g., paper, plastic) the quality of the secondary material even depends on the original use. For example, only one quality term is provided for paper, but printing paper is usually recycled to printing paper of equal quality, which is not possible for the glossy paper of magazines.

• The newly introduced parameter A ranges between 0.2 and 0.8 and is supposed to reflect the current market situation (e.g., 0.2: low supply/high demand; 0.8: high supply/low demand). Thus, the CFF is in conflict with ISO 14044 for closed-loop systems, because at most 80% of the credits can be given with the CFF, whereas ISO allows 100% of the credits to be allocated to the product system.
Even though the company is more aware of the value choices made in the CFF, we recommend that the CFF be applied with caution. For B2B and B2C communication, we recommend not using the CFF due to its many biased assumptions. We therefore recommend reviewing and revising the quality terms and allocation factors and considering reuse rates for all materials and products. One simple solution to improve the CFF would be to adopt the specifications of ISO for closed-loop recycling.
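The 80% cap on credits can be made concrete with a small calculation. The sketch below implements only the material term of the CFF as we read it in the PEFCR guide [6]; the symbols follow that guide, while the function name, argument names and all input values are hypothetical and chosen for illustration. The formula should be checked against the guide before any use.

```python
def cff_material(r1, r2, a, e_v, e_rec, e_rec_eol, e_v_star,
                 q_sin_qp=1.0, q_sout_qp=1.0):
    """Material term of the CFF (sketch, per unit of material):

    (1 - R1) * Ev
    + R1 * (A * Erecycled + (1 - A) * Ev * Qsin/Qp)
    + (1 - A) * R2 * (ErecyclingEoL - E*v * Qsout/Qp)
    """
    virgin_input = (1 - r1) * e_v
    recycled_input = r1 * (a * e_rec + (1 - a) * e_v * q_sin_qp)
    recycling_at_eol = (1 - a) * r2 * (e_rec_eol - e_v_star * q_sout_qp)
    return virgin_input + recycled_input + recycling_at_eol


# Hypothetical product: virgin input only, fully recycled at end of life,
# no quality loss; burdens in arbitrary units.
inputs = dict(r1=0.0, r2=1.0, e_v=1.0, e_rec=0.1, e_rec_eol=0.1, e_v_star=1.0)

with_cff = cff_material(a=0.2, **inputs)      # lowest A in the CFF range
full_credit = cff_material(a=0.0, **inputs)   # A = 0 is outside the CFF range

print(f"A = 0.2: {with_cff:.2f}")      # A = 0.2: 0.28
print(f"A = 0.0: {full_credit:.2f}")   # A = 0.0: 0.10
```

In this hypothetical case the net effect of recycling at end of life is scaled by (1 - A), so with the lowest permitted A of 0.2 only 80% of the avoided virgin burden is credited to the product system.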

Applicability and Reliability of Life Cycle Impact Assessment Methods
The PEF method predefines 17 LCIA methods. Within ISO 14040/44 no methods are required to be applied. Existing PCRs often set a minimum number of categories to be considered and sometimes also predefine the LCIA methods to be used. However, a variety of methods may be applied when requested by the user of the PCR. By decreasing the flexibility of users to choose their own categories and methods, fair comparability is decreased rather than increased, as demonstrated in the following.

Implicit Weighting of Impact Categories
The granularity of the impact categories is inconsistent and therefore leads to implicit weighting. For the evaluation of land use, only one impact category is considered with five sub-categories (or five indicators), whereas impacts of eutrophication are addressed in three categories (eutrophication, marine; eutrophication, freshwater; eutrophication, terrestrial). Implicit weighting may also distort the normalized and weighted results, because existing weighting schemes provide three weighting factors for eutrophication, whereas land use is only considered once. Therefore, we recommend reconsidering the current structure of the impact categories of eutrophication, acidification, resources, and toxicity and clustering them in the same way as proposed for land use to increase consistency and avoid implicit weighting.
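A minimal numerical sketch of this effect, assuming (purely for illustration) that every impact category receives the same weight; the grouping into "themes" and all names are our own and not part of the PEF method:

```python
from collections import defaultdict

# Impact category -> environmental theme (grouping assumed for illustration).
categories = {
    "land use": "land use",
    "eutrophication, marine": "eutrophication",
    "eutrophication, freshwater": "eutrophication",
    "eutrophication, terrestrial": "eutrophication",
}

# Assumption: each category gets an equal share of the total weight.
weight_per_category = 1.0 / len(categories)

theme_weight = defaultdict(float)
for category, theme in categories.items():
    theme_weight[theme] += weight_per_category

for theme, weight in theme_weight.items():
    print(f"{theme}: {weight:.2f}")
# land use: 0.25
# eutrophication: 0.75
```

Even without any explicit value choice, eutrophication carries three times the weight of land use simply because it is split into three categories.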

Availability of Inventory Data
As most life cycles are global, the applied LCIA methods should also have a global perspective. This means that regionally derived characterization factors (CFs) should be used when CFs are available for most regions worldwide and regionalized inventory data can be applied. If this is not the case, as is currently true for most of the considered impact categories, global average CFs should be used instead of European average or country-specific CFs. The importance of using such global average CFs is illustrated in the following two examples: (1) The AWARE method [16] for the category water scarcity provides CFs for several countries and has to be applied with regionalized inventory data. Within the pilot phase, the pilots applied a European average water scarcity factor to all inventory data (even for processes taking place outside of Europe). This is contrary to the basic idea of water scarcity approaches, namely to identify those production locations where water consumption leads to the highest impacts [35][36][37].
By determining how impacts of water depletion have to be assessed, the PEF reduces the flexibility of ISO 14046. By not using regionalized CFs and inventory data, fair comparability is not guaranteed. Thus, we recommend applying regionalized inventory data and regionalized CFs for the category water depletion. Regionalized inventory data are not yet available for all secondary data [38,39]. However, we recommend that the implementation of regionalized water inventory flows be a requirement within upcoming calls for data by the EC. (2) The CFs of the categories acidification and eutrophication are based on European fate models [40][41][42] and thus are only valid for evaluating processes based in Europe. However, within the pilot phase the CFs were applied to assess all processes (inside and outside of Europe). As eutrophying and acidifying emissions are regulated differently worldwide (e.g., strictly within Europe since the 1980s [43] and hardly or not at all in emerging countries like China [44]), such an application of European-based regionalized CFs is not adequate. As the CFs of the predefined LCIA methods are based on European-specific fate models, the amount of emissions released into the environment in Europe is considered. The CFs vary with the emitted amount, as shown by Seppälä et al. [42]. By reducing the flexibility to apply LCIA methods appropriate for the available inventory data, fair comparability of product systems is reduced. Thus, we recommend applying regionalized inventory data for regionalized methods, at least for the relevant processes. If these data are not available for some methods, like acidification and eutrophication, we recommend applying global LCIA methods.
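The consequence of applying one average factor to all locations, as in example (1), can be illustrated with a short calculation. All processes, regions, water volumes and characterization factors in the following sketch are hypothetical; in particular, the factors are not AWARE values.

```python
# Hypothetical inventory: (process, region, water consumption in m3).
inventory = [
    ("process 1", "region A", 10.0),
    ("process 2", "region B", 10.0),
    ("process 3", "region C", 10.0),
]

# Hypothetical characterization factors (m3 world eq. per m3 consumed).
regional_cf = {"region A": 80.0, "region B": 5.0, "region C": 20.0}
average_cf = 20.0  # single average factor applied to every location


def scarcity_by_process(inventory, cf_for_region):
    """Water scarcity impact per process: consumed volume times CF."""
    return {process: volume * cf_for_region(region)
            for process, region, volume in inventory}


regionalized = scarcity_by_process(inventory, lambda region: regional_cf[region])
averaged = scarcity_by_process(inventory, lambda region: average_cf)

hotspot = max(regionalized, key=regionalized.get)
print(regionalized)  # {'process 1': 800.0, 'process 2': 50.0, 'process 3': 200.0}
print(averaged)      # {'process 1': 200.0, 'process 2': 200.0, 'process 3': 200.0}
print(hotspot)       # process 1
```

With regionalized factors, the process in the water-scarce region stands out as the hotspot; with the single average factor, all three processes appear identical and the location information is lost.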
Furthermore, inventory data for the category particulate matter (PM) are often not reliable due to difficulties in measuring PM2.5 [45,46]. The newly developed LCIA method for PM [17] covering PM2.5 emissions is classified by PEF as having Level I maturity and is thus considered very robust. However, as inventory data for PM2.5 are missing, the determined results cannot be considered as reliable as the Level I classification implies. Classifying the method for PM as Level I means that it gets a higher weight in the weighting system (according to the approach by the EC [28]). Thus, we recommend lowering the rating for this category to better reflect the uncertainties of the PM2.5 inventory data in the current weighting set.

Arbitrary Exclusion of Impact Categories
During the pilot phase, the EC decided that all three toxicity categories (human toxicity, cancer; human toxicity, non-cancer; ecotoxicity, freshwater) are excluded from any communication and cannot be chosen as relevant impact categories due to high uncertainties in the results. However, their exclusion can lead to burden shifting, i.e., the communicated impact categories are improved at the expense of the excluded ones. Thus, by decreasing the flexibility to choose these methods as relevant ones, fair comparability is decreased. We therefore recommend considering the toxicity categories when determining relevant life cycle stages, processes, and elementary flows.
Overall, we recommend the following procedure: As long as the company applying PEF is aware of the addressed challenges, PEF can be used for internal application. This is different for B2B and especially for B2C communication, because fair comparability cannot be guaranteed due to the addressed challenges. Therefore, our recommendations for improvement throughout this section should be put into practice during the PEF transitioning phase. Furthermore, as clearly stated by ISO 14040/44 [5], comparative assertions shall in general not be based on LCIA results only. Even if all addressed shortcomings were solved satisfactorily, the results determined with the PEF method would reflect only part of the environmental impacts of a product, because other impacts (like loss of biodiversity, noise, animal welfare, etc.) are not taken into account. Thus, we recommend integrating additional aspects when applicable methods are available. Moreover, as long as relevant aspects (e.g., animal welfare) are not addressed, these shortcomings have to be communicated to the user, especially when B2C communication is the intended application of PEF.

Prioritization of Impact Categories by Normalization and Weighting
In PEF, normalization and weighting have to be applied to determine the most relevant impact categories as well as life cycle stages, processes, and elementary flows. The underlying concept of normalization is to set product-system-specific emissions in relation to the emissions of a reference region (expressed by normalization factors, NFs). For the PEF NFs, a global reference region was chosen. Normalization can help to provide and communicate the relative magnitude of indicator results and is an optional step according to ISO 14040/44. Weighting is carried out to rate and possibly aggregate normalized indicator results using numerical values and is an optional step in ISO 14040/44 as well [5]. For PEF, a dedicated weighting scheme was set up [28]. Regardless of the proposed weighting factors, it is important to understand that there is no 'perfect' or 'science-based' weighting set. Weighting as such is always a value choice and thus represents the subjective understanding of one or several stakeholders. Thus, by reducing the flexibility to determine relevant impact categories based on expert judgments (as done in the last decades), fair comparability is no longer guaranteed, as the following explanations demonstrate.

Normalization
As normalization is a relative approach, normalized LCIA results are low when the reference values (e.g., global emissions) are high, whereas they are high when emissions in the reference region are small. This means that a specific amount of emissions of a product system is considered more relevant when the overall emissions in the reference region are low and less important in regions where high background emissions already exist. To demonstrate the influence of normalization, the indicator results of the impact category acidification of an exemplary pilot project are normalized with different normalization factors (NFs): the original NFs provided by the Joint Research Centre (JRC) and modified NFs representing scenarios in which two and four times higher and lower global emissions are assumed, respectively (see Figure 2). It can be seen that the normalized results get smaller the higher the NF is, i.e., the higher the overall global emissions are. Conversely, the normalized results get higher the smaller the NF is, i.e., the smaller the global emissions are. Normalization therefore implies that emissions (or resource use) associated with a product system are considered less relevant when high emissions (or resource use) are already present in a region and more relevant when only low emissions (or resource use) exist.
However, the reverse argument is also plausible: releasing product-system-specific emissions (or using resources) in a region where the overall emissions (or resource use) are already high can be considered more relevant than releasing them in a region with a low amount of emissions (or resource use). This is the notion behind most political strategies; e.g., because emissions of particulate matter are already high, measures are developed and implemented for their reduction [47]. The truth lies somewhere in the middle: product-specific emissions (or resource use) can be relevant both in regions where the overall emissions (or resource use) are already high and in regions where they are still low.
The second challenge is that the data for calculating the normalization factors are often incomplete. For example, for the toxicity categories tens of thousands of substances theoretically have to be considered, but data exist for far fewer than 1000 [48]. Hence, normalization factors are typically based on a small number of substances only, i.e., they are set too low.
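Both effects follow directly from the definition of a normalized result as the indicator result divided by the NF. The short sketch below uses hypothetical numbers (they are not the values behind Figure 2) to reproduce the scenario logic of NFs that are two and four times higher or lower, and to show what an incomplete NF does to the result.

```python
import math


def normalize(indicator_result, nf):
    """Normalized result = indicator result / normalization factor."""
    return indicator_result / nf


indicator_result = 1.0  # hypothetical acidification result of a product
nf_original = 50.0      # hypothetical NF (same unit, reference region total)

scenarios = {"NF / 4": 0.25, "NF / 2": 0.5, "original NF": 1.0,
             "NF x 2": 2.0, "NF x 4": 4.0}
normalized = {label: normalize(indicator_result, nf_original * factor)
              for label, factor in scenarios.items()}

for label, value in normalized.items():
    print(f"{label:12s} {value:.4f}")

# An NF that covers only part of the emitted substances is too low,
# which inflates the normalized result (40% coverage assumed here).
nf_incomplete = 0.4 * nf_original
inflation = normalize(indicator_result, nf_incomplete) / normalized["original NF"]
print(f"inflation with 40% coverage: {inflation:.1f}x")
```

Because the relation is inversely proportional, any error or value choice in the NF propagates one-to-one into the normalized result and thus into the ranking of impact categories.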

Weighting
Normalization always has to be applied together with adequate weighting factors to determine relevant impact categories. Over the course of the PEF pilot phase, the JRC developed a weighting set to be used for PEF studies [28]. The challenge is that, after normalization, the results of the various impact categories differ by up to two orders of magnitude, whereas the current weighting factors for the different categories differ only by a factor of up to four [6,12,28]. Hence, the relevance of the impact categories is mainly determined by the normalization results. As shown in Figure 3, the four categories with the highest normalized result are also the categories with the highest normalized and weighted result. Establishing a representative and accepted weighting method agreed on by several stakeholders is an ambitious goal, which has not been achieved so far. Weighting is a political issue, not a scientific one, and as such should be disconnected from the scientific assessment method.
Moreover, according to the PEFCR guide, impact categories that cumulatively contribute at least 80% of the total environmental impact of the RP are considered relevant, whereas the remaining categories are classified as not relevant. As shown in Figure 3 for the chosen example, differences in the impact category results of 'land use' and 'resource use, fossil' are very small (only 8%). If the difference between results is as small as in this case, a classification into 'relevant' and 'not relevant' is not adequate, because 'relevant' life cycle stages, processes and elementary flows are determined only for the chosen impact categories. That means (1) that relevant life cycle stages, processes and elementary flows are only determined for categories that may not be the most relevant ones for the considered product system; and (2) that for most of the impact categories the relevant life cycle stages, processes and elementary flows are not even analysed. This is important especially when determining relevant life cycle stages, processes and elementary flows (as defined within the PEF guide [2]) and when specific measures like alternative materials are considered, to make sure that trade-offs are taken into account. Within the chosen example, for instance, only 5 out of 13 categories are taken into account and therefore trade-offs within the neglected categories are likely.
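The sensitivity of the 80% rule to small differences can be sketched as follows. The selection logic is our reading of the rule described above (rank categories by their normalized and weighted result and add them until the cumulative share reaches the cut-off); the function name and all numbers are hypothetical and are not the values shown in Figure 3.

```python
def relevant_categories(weighted_results, cutoff=0.80):
    """Rank categories by weighted result and keep adding them until
    their cumulative share of the total reaches the cut-off."""
    total = sum(weighted_results.values())
    ranked = sorted(weighted_results.items(), key=lambda item: item[1],
                    reverse=True)
    selected, cumulative = [], 0.0
    for name, score in ranked:
        if cumulative >= cutoff * total:
            break
        selected.append(name)
        cumulative += score
    return selected


# Hypothetical normalized and weighted results (arbitrary units, sum = 100).
weighted_results = {
    "climate change": 35,
    "particulate matter": 20,
    "acidification": 14,
    "resource use, fossil": 12,
    "land use": 11,
    "water scarcity": 4,
    "ozone depletion": 4,
}

selected = relevant_categories(weighted_results)
print(selected)
# ['climate change', 'particulate matter', 'acidification', 'resource use, fossil']
```

In this hypothetical case 'land use' is only about 8% lower than 'resource use, fossil', yet it falls on the 'not relevant' side of the cut-off and would therefore not be analysed further.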
Overall, we recommend the following procedure: All 17 categories should be considered for every PEF case study. Which of the results are finally communicated to the consumer is independent of the analysis of the product system and should be discussed separately; that is, the scientific procedure for determining relevance shall not be influenced by the question of what the consumer is able to understand.
Depending on the implementation of PEF, normalization and/or weighting might not even be necessary. For internal application, all categories have to be considered (according to PEF rules), which renders normalization and/or weighting redundant. For B2B communication, all impact categories are included in the PEF report (according to the PEFCR guide [6]). Thus, there is no need for normalization and/or weighting. There are several possibilities for how PEF can be used for B2C communication, which also determine whether normalization and/or weighting are necessary. PEF can be implemented as an independent environmental label based on a single-score result by aggregating the results of the individual categories. This can be done with normalization (based on global emissions) and weighting as currently required by PEF, but also by applying other normalization methods, e.g., based on carrying capacity [49], or by applying other aggregation methods such as the distance-to-target approach [50]. The decision of which categories are important should be made later in a political process by the decision makers. Thus, weighting should be addressed during policy implementation, as this will define the need for it depending on the specific application. This will also determine which stakeholders should have a say in the weighting and what an inclusive, representative process for deriving weighting factors should look like. When PEF is implemented as an independent environmental label based on relevant impact categories, but without aggregation, only weighting is required to identify the relevant categories. When PEF is implemented as an evaluation method for existing labels (e.g., by redefining criteria), a single-score result is not necessary and is even counterproductive, because the detailed non-aggregated results are needed. The relevant categories have already been identified during the establishment of the label. These categories could, however, be extended based on the weighting developed for PEF. For that, a weighting scheme developed by policy makers is necessary, but no normalization is needed.
Furthermore, instead of considering only categories that contribute to 80% of the overall impacts, we recommend including all categories, or at least those that cumulatively contribute to 90% (or even more). Another option could be to set a minimum threshold, e.g., all impact categories that contribute more than 5% should be taken into account. Furthermore, we recommend that relevant processes not be determined simply by their contribution to the overall environmental impact, but also by other aspects such as the influence of the company carrying out the study on reducing the impacts, corporate annual accounts, or expert judgments.
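The minimum-threshold option can be sketched in the same spirit. The function name and the normalized and weighted results below are hypothetical; the 5% value is the example threshold named above.

```python
def categories_above_share(weighted_results, min_share=0.05):
    """Keep every category whose share of the total exceeds min_share."""
    total = sum(weighted_results.values())
    ranked = sorted(weighted_results.items(), key=lambda item: item[1],
                    reverse=True)
    return [name for name, score in ranked if score / total > min_share]


# Hypothetical normalized and weighted results (arbitrary units, sum = 100).
weighted_results = {
    "climate change": 35,
    "particulate matter": 20,
    "acidification": 14,
    "resource use, fossil": 12,
    "land use": 11,
    "water scarcity": 4,
    "ozone depletion": 4,
}

kept = categories_above_share(weighted_results)
print(kept)
# ['climate change', 'particulate matter', 'acidification',
#  'resource use, fossil', 'land use']
```

Unlike a cumulative cut-off, a per-category threshold treats categories with nearly equal contributions alike: in this hypothetical case 'resource use, fossil' (12%) and 'land use' (11%) are both retained.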
To conclude this section, Table 2 provides an overview of all identified challenges, their implications for comparability, and our recommendations.

Challenges: Modeling of End-of-Life: (i) It is not considered how often a material is recycled; (ii) Quality of recycled materials is not addressed adequately; (iii) For closed-loop systems only 80% of the credits as maximum can be given.
Implications for Comparability: Environmental impacts of many materials and efforts of companies to establish closed-loop systems are not accounted for properly.
Recommendations: Refine the formula, so that the recycling cycles and the quality of the recycled material are considered.
1 For these challenges, no ready-to-go solutions are available and further research is needed to tackle them; 2 These challenges will be addressed in the PEF transitioning phase.

Conclusions
We want to stress that, in general, LCA (and hence also PEF) is a useful tool to analyze potential environmental impacts of products, e.g., for in-house product improvement or external B2B or B2C communication. By identifying potential environmental impacts (including trade-offs) and showing directions, LCA/PEF can support and guide decisions towards "green products" and a "circular economy" [51][52][53]. Thus, we support the PEF approach of addressing all relevant impact categories and the full life cycle of products, as well as the proposal of further guidance for a method for quantifying and communicating environmental performance from cradle to grave. Moreover, we appreciate the work done by the numerous stakeholders who were and are involved in the PEF pilot phase and the transition phase. However, a number of challenges related to the PEF method and the developed PEFCRs still exist. Only a few of the challenges identified early on (as shown by, e.g., [4]) were addressed in the pilot phase; most still exist and have not been tackled successfully.
With regard to the claim of "comparability over flexibility", it was shown that comparability cannot be achieved and that the decrease in flexibility is not adequately implemented. PEF reduces the flexibility of the user significantly, which has led to higher reproducibility of results. If the variability of the real world is expressed in models based on certain value choices that every study has to apply (e.g., using the same dataset for a raw material which in real life can be produced in different ways, such as agricultural products), reproducibility instead of comparability is promoted and inflexibility is increased. Thus, reproducibility does not automatically introduce comparability, but rather bias.
This paper provides an evaluation of the main challenges and concerns we see with regard to a potential application of PEF and the PEFCR for comparisons and comparative assertions.First ideas and recommendations on how these challenges could be tackled are provided.
Determining the applications of PEF is extremely important, as the methodological requirements for a PEF study depend largely on the goal of the study, which is usually defined within the first step. As explained in this paper, most of the requirements defined in the PEF method and the established PEFCRs are adequate for internal product and process optimization. They are, however, not suitable for fair comparability. Addressing the challenges outlined in this paper and considering the recommendations provided would help to position PEF as a useful instrument that can facilitate the use of LCA and provide the basis for sound political decisions.
Author Contributions: V.B. and A.L. are both the leading composers of this manuscript and both contributed substantially to the structure and text of the manuscript. All authors contributed to the content, i.e., to the evaluation of the PEF pilot phase. All authors proofread and approved the final manuscript.
Funding: This research was funded by the German Environmental Agency (Umweltbundesamt) as part of the environmental research plan (project code number 3712 95 337) and financed with federal funding.

Figure 2. Influence of normalization factors (NF) on normalized results demonstrated for the impact category acidification of an exemplary pilot project (fully filled grey bars: normalized results when original NF is applied; fully filled black bars: normalized results when original NF is decreased; striped bars: normalized results when original NF is increased) (own diagram).


Figure 3. Normalized results and normalized and weighted results for all impact categories determined with the newly proposed weighting set, separated into relevant and not relevant categories (according to the PEFCR guide) (own diagram).


Table 1. Average environmental performance of Products A and B, calculated based on the market share (representative product) and as arithmetic average.

Table 2. Overview of identified challenges, their implication for comparability, and recommendations from the authors.