Webinar – Data Quality In context of EF pilot phase of European Commission DG-ENV 18 July 2014
Jul 12, 2015
Pilot phase is a work in progress
• One of the pilot’s objectives is to set up and validate the process of the development of Product Environmental Footprint Category Rules and Organisation Environmental Footprint Sector Rules – PEFCRs & OEFSRs, including the development of performance benchmarks
• Lessons learned will be documented and taken into account for further improvement of the PEF/OEF guide
• This webinar aims to support you in applying the PEF/OEF guide and Guidance document v4.0 but is also a platform for discussing bottlenecks and lessons learned
Development process of PEFCR/OEFSR
Final PEFCR/OEFSR
Confirmation of benchmark(s) and determination of performance classes
PEFCR/OEFSR supporting studies
Draft PEFCR/OEFSR
PEF/OEF screening
Define product “model” based on representative product
Define PEF/OEF product category
Focus of this webinar is data quality
Qualitative data quality assessment
Specify data quality requirements (+ additions)
Semi-quantitative data quality assessment
Refine data quality requirements (+ additions)
Webinar outline
• What are data quality requirements?
• Why do you need data quality requirements?
• How do you assess data quality?
• Requirements for data quality
• Practical examples
• Guidance for data quality assessment
What & Why How Examples Requirements Guidance
What are Data Quality Requirements?
• Set of criteria for the representativeness and completeness of the data
• Only applies to Resource Use and Emissions Profile Data, not the EF impact assessment
• Applies to both specific and generic data
Why do you need Data Quality Requirements?
• To determine:
– To what degree the Resource Use and Emissions Profile of the processes and products covers all their emissions and resources (completeness).
– To what degree the selected processes and products depict the system being analyzed (representativeness).
How do you assess Data Quality?
• Semi-quantitative: six data quality criteria
1. Use the semi-quantitative assessment to assess a process on each of the data quality criteria
2. Calculate the data quality per dataset/process based on the six scores
• Qualitative: expert judgement
Data quality criteria
• Six quality criteria are adopted
– Five relating to the data
– One relating to the methodology
Data quality criteria
1. Technological representativeness
2. Geographical representativeness
3. Time-related representativeness
4. Completeness
5. Parameter uncertainty
6. Methodological appropriateness and consistency*
* Only until end of 2015
Semi-quantitative data quality assessment
• Each criterion is assessed to determine the quality level
• Five quality levels are defined: very good (1) to very poor (5)*
• Three criteria do not have predefined requirements:
– Technological representativeness
– Geographical representativeness
– Time-related representativeness
• These are context specific and need to be defined in the PEFCR/OEFSR
* Table 5 in the PEF/OEF method
1. Technological representativeness
• Degree to which the dataset reflects the true population of interest regarding technology, including for included background datasets, if any.
• For example the technological characteristics, including operating conditions.
• Ideal situation vs. current situation.
Technological representativeness
Quality level | Quality rating | Definition
1 | Very good | Context specific
2 | Good | Context specific
3 | Fair | Context specific
4 | Poor | Context specific
5 | Very poor | Context specific
2. Geographical representativeness
• Degree to which the dataset reflects the true population of interest regarding geography, including for included background datasets, if any.
• For example the given location/site, region, country, market, continent, etc.
• Ideal situation vs. current situation.
Geographical representativeness
Quality level | Quality rating | Definition
1 | Very good | Context specific
2 | Good | Context specific
3 | Fair | Context specific
4 | Poor | Context specific
5 | Very poor | Context specific
3. Time-related representativeness
• Degree to which the dataset reflects the specific conditions of the system being considered regarding the time/age of the data, including for included background datasets, if any.
• For example, how well the data represent the given year (and, if applicable, intra-annual or intra-daily differences).
• Ideal situation vs. current situation.
Time-related representativeness
Quality level | Quality rating | Definition
1 | Very good | Context specific
2 | Good | Context specific
3 | Fair | Context specific
4 | Poor | Context specific
5 | Very poor | Context specific
4. Completeness
• To be judged with respect to the coverage for each EF impact category and in comparison to a hypothetical ideal data quality.
• Ideal situation vs. current situation.
Completeness
Quality level | Quality rating | Definition
1 | Very good | ≥ 90%
2 | Good | 80% to 90%
3 | Fair | 70% to 80%
4 | Poor | 50% to 70%
5 | Very poor | < 50%
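The completeness bands above can be sketched as a small lookup. This is only an illustration: the function name `completeness_level` is hypothetical, and because the table does not say which band a boundary value such as exactly 80% falls into, the sketch assumes each band includes its lower bound.

```python
def completeness_level(coverage_pct: float) -> int:
    """Map a completeness coverage (in %) to a quality level, 1 (very good) to 5 (very poor)."""
    if not 0 <= coverage_pct <= 100:
        raise ValueError("coverage must be between 0 and 100")
    # Assumption: each band includes its lower bound (exactly 80% counts as "good").
    if coverage_pct >= 90:
        return 1  # very good
    if coverage_pct >= 80:
        return 2  # good
    if coverage_pct >= 70:
        return 3  # fair
    if coverage_pct >= 50:
        return 4  # poor
    return 5      # very poor


print(completeness_level(85))
```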
5. Parameter uncertainty
• Qualitative expert judgement or relative standard deviation as a % if a Monte Carlo simulation is used.
• Only related to the resource use and emissions profile data, not the EF impact assessment.
• Ideal situation vs. current situation.
Parameter uncertainty
Quality level | Quality rating | Definition
1 | Very good | Very low uncertainty (≤ 10%)
2 | Good | Low uncertainty (10% to 20%)
3 | Fair | Fair uncertainty (20% to 30%)
4 | Poor | High uncertainty (30% to 50%)
5 | Very poor | Very high uncertainty (> 50%)
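When a relative standard deviation is available (for instance from a Monte Carlo simulation), the uncertainty bands above could be applied as in the sketch below. The names `UPPER_BOUNDS` and `uncertainty_level` are hypothetical. The sketch assumes each band includes its upper bound, which is consistent with the "≤ 10%" and "> 50%" rows but is not stated for the bands in between.

```python
from bisect import bisect_left

# Upper bounds (in %) of quality levels 1 to 4; anything above 50% is level 5.
# Assumption: a value exactly on a bound belongs to the better (lower) level.
UPPER_BOUNDS = [10, 20, 30, 50]


def uncertainty_level(rsd_pct: float) -> int:
    """Map a relative standard deviation (in %) to a quality level from 1 to 5."""
    if rsd_pct < 0:
        raise ValueError("relative standard deviation cannot be negative")
    # bisect_left counts how many bounds are strictly below the value.
    return bisect_left(UPPER_BOUNDS, rsd_pct) + 1


print(uncertainty_level(25))
```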
6. Methodological appropriateness and consistency
• Assess if:
– The applied LCI methods and methodological choices are in line with the goal and scope of the dataset, especially its intended applications as support to decisions.
– The methods have been applied consistently across all data.
• Ideal situation vs. current situation.
• Only applicable until the end of 2015. From 2016, full compliance with the PEF methodology will be required.
6. Methodological appropriateness and consistency
Quality level | Quality rating | Definition
1 | Very good | Full compliance with all requirements of the PEF guide
2 | Good | Attributional process based approach AND all three of the following method requirements of the PEF guide met: dealing with multi-functionality, end of life modelling, system boundary
3 | Fair | Attributional process based approach AND two of the following three method requirements of the PEF guide met: dealing with multi-functionality, end of life modelling, system boundary
4 | Poor | Attributional process based approach AND one of the following three method requirements of the PEF guide met: dealing with multi-functionality, end of life modelling, system boundary
5 | Very poor | Attributional process based approach BUT none of the following three method requirements of the PEF guide met: dealing with multi-functionality, end of life modelling, system boundary
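The scoring logic of this table reduces to counting how many of the three method requirements are met. A minimal sketch follows, with hypothetical names (`METHOD_REQUIREMENTS`, `methodological_level`). The table does not cover datasets that are not attributional and process based, so the sketch simply rejects them; that is an assumption, not a rule from the PEF guide.

```python
from typing import Iterable

# The three method requirements named in the table.
METHOD_REQUIREMENTS = {"multi-functionality", "end-of-life modelling", "system boundary"}


def methodological_level(full_compliance: bool, attributional: bool,
                         requirements_met: Iterable[str]) -> int:
    """Return the quality level (1 to 5) for methodological appropriateness and consistency."""
    if full_compliance:
        return 1  # full compliance with all requirements of the PEF guide
    if not attributional:
        # Assumption: the table defines no level for this case.
        raise ValueError("the table only covers attributional process-based approaches")
    met = set(requirements_met)
    unknown = met - METHOD_REQUIREMENTS
    if unknown:
        raise ValueError(f"unknown requirement(s): {sorted(unknown)}")
    # 3 met -> level 2, 2 met -> level 3, 1 met -> level 4, 0 met -> level 5
    return 5 - len(met)


print(methodological_level(False, True, ["system boundary", "multi-functionality"]))
```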
Data Quality per dataset
• The overall data quality of a dataset is the average of the score obtained for all six data quality criteria
$$DQR = \frac{TeR + GR + TiR + C + P + M}{6}$$
DQR: Data Quality Rating of the data set
TeR: Technological Representativeness
GR: Geographical Representativeness
TiR: Time-related Representativeness
C: Completeness
P: Precision/uncertainty
M: Methodological appropriateness and consistency
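As a sketch, the averaging can be written as follows. The function name `data_quality_rating` and the input layout (a mapping from the six abbreviations to scores from 1 to 5) are assumptions made for illustration. The scores used are those of the worked example given later in this webinar.

```python
from statistics import mean
from typing import Mapping

CRITERIA = ("TeR", "GR", "TiR", "C", "P", "M")


def data_quality_rating(scores: Mapping[str, int]) -> float:
    """Average the six criterion scores (each from 1 to 5) into the DQR."""
    missing = [c for c in CRITERIA if c not in scores]
    if missing:
        raise ValueError(f"missing scores for: {missing}")
    for criterion in CRITERIA:
        if not 1 <= scores[criterion] <= 5:
            raise ValueError(f"{criterion} must be between 1 and 5")
    return mean(scores[c] for c in CRITERIA)


dqr = data_quality_rating({"TeR": 2, "GR": 2, "TiR": 3, "C": 2, "P": 2, "M": 2})
print(round(dqr, 1))
```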
Data Quality per dataset
• The overall data quality rating corresponds to a data quality level
Overall data quality rating (DQR) | Overall data quality level
≤ 1.6 | Excellent quality
> 1.6 to ≤ 2.0 | Very good quality
> 2.0 to ≤ 3.0 | Good quality
> 3.0 to ≤ 4.0 | Fair quality
> 4.0 | Poor quality
* Table 6 in the PEF/OEF method
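The mapping from DQR to overall quality level can be sketched directly from the table. The function name `overall_quality_level` is hypothetical. The range check assumes a DQR that is an average of scores from 1 to 5.

```python
def overall_quality_level(dqr: float) -> str:
    """Translate an overall DQR into the corresponding overall data quality level."""
    if not 1 <= dqr <= 5:
        raise ValueError("DQR is an average of scores from 1 to 5")
    if dqr <= 1.6:
        return "Excellent quality"
    if dqr <= 2.0:
        return "Very good quality"
    if dqr <= 3.0:
        return "Good quality"
    if dqr <= 4.0:
        return "Fair quality"
    return "Poor quality"


print(overall_quality_level(2.2))
```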
Additional aspects of data quality
• Three more aspects are included in the quality assessment
1. Documentation
• Compliant with ILCD format*
2. Nomenclature
• Compliant with ILCD nomenclature*
3. Review
• By qualified reviewer
• Separate review report
*http://eplca.jrc.ec.europa.eu/?page_id=134
Requirements for data quality
1. For the PEF/OEF screening
– a minimum “fair” quality data rating is required for data contributing to at least 90% of the impact estimated for each EF impact category
– assessed via a qualitative expert judgement
2. In PEFCR/OEFSR
– PEFCRs/OEFSRs shall provide further guidance on data quality assessment scoring for the product category with respect to time, geographical and technological representativeness.
– PEFCRs/OEFSRs may specify additional criteria for the assessment of data quality (compared to default criteria).
Requirements for data quality
– The PEFCR may specify more stringent data quality requirements regarding e.g.:
• Gate-to-gate activities/processes
• Upstream or downstream phases
• Key supply chain activities for the product category
• Key EF impact categories for the product category
– The OEFSR may specify more stringent data quality requirements regarding e.g.:
• Foreground processes
• Background processes (both upstream and downstream stages)
• Key supply chain processes/activities for the sector
• Key EF impact categories for the sector
Requirements for data quality
3. For PEFCR/OEFSR supporting studies
Data | Minimum data quality required | Type of required data quality assessment
Data covering at least 70% of contributions to each EF impact category | Overall “Good” data quality (DQR ≤ 3.0) | Semi-quantitative
Data accounting for 20-30% of contributions to each EF impact category | Overall “Fair” data quality | Qualitative expert judgement. No quantification required.
Data used for approximation and filling identified gaps (≤ 10% of the contribution to each EF impact category) | Best available data | Qualitative expert judgement. No quantification required.
– Plus possibly more stringent data quality requirements as specified in the PEFCR/OEFSR
* Table 4 in the PEF/OEF method
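The first row of this table lends itself to a simple check: add up the contributions of all datasets with DQR ≤ 3.0 for one EF impact category and compare the sum with 70%. The sketch below is a simplification that covers only that row. The function names and the dataset entries are hypothetical, and datasets assessed only qualitatively are represented with a DQR of `None`.

```python
from typing import List, Optional, Tuple

# (dataset name, contribution to one EF impact category in %, DQR or None if not quantified)
Dataset = Tuple[str, float, Optional[float]]


def good_quality_coverage(datasets: List[Dataset]) -> float:
    """Sum the contributions of datasets with overall "good" quality or better (DQR <= 3.0)."""
    return sum(share for _, share, dqr in datasets if dqr is not None and dqr <= 3.0)


def meets_supporting_study_minimum(datasets: List[Dataset]) -> bool:
    """Check only the 70% row of the table; the other rows need expert judgement."""
    return good_quality_coverage(datasets) >= 70.0


# Hypothetical datasets, for illustration only.
example = [
    ("Process A", 45.0, 2.2),
    ("Process B", 30.0, 2.8),
    ("Process C", 20.0, None),
    ("Gap-filling data", 5.0, None),
]
print(good_quality_coverage(example), meets_supporting_study_minimum(example))
```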
Example
• Dyeing process data quality
Quality level | Quality rating | Technological representativeness | Geographical representativeness | Time-related representativeness
1 | Very good | Discontinuous with airflow dyeing machines | Central Europe mix | 2009 – 2012
2 | Good | Consumption mix in EU (30% semi-continuous, 50% exhaust dyeing and 20% continuous dyeing) | EU 27 mix or UK, DE, IT, FR | 2006 – 2008
3 | Fair | Production mix in EU (35% semi-continuous, 40% exhaust dyeing and 25% continuous dyeing) | Scandinavian Europe; other EU 27 countries | 1999 – 2005
4 | Poor | Exhaust dyeing | Middle East or US, JP | 1990 – 1998
5 | Very poor | Continuous dyeing/other/unknown | Other/unknown | < 1990 or unknown
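Context-specific definitions such as these could be encoded as lookup tables once the PEFCR has fixed them. The sketch below covers only the technological and time-related columns of the dyeing example. All names are hypothetical, the technology labels are shortened versions of the table entries, and years after 2012 are rejected because the example table does not define a level for them.

```python
from typing import Optional

# Shortened, lower-case labels for the technology column of the dyeing example.
TECHNOLOGY_LEVELS = {
    "discontinuous with airflow dyeing machines": 1,
    "consumption mix in eu": 2,
    "production mix in eu": 3,
    "exhaust dyeing": 4,
}

# (first year, last year, quality level) from the time-related column.
YEAR_RANGES = [(2009, 2012, 1), (2006, 2008, 2), (1999, 2005, 3), (1990, 1998, 4)]


def technology_level(technology: Optional[str]) -> int:
    """Anything not listed (continuous dyeing, other, unknown) scores 5."""
    if technology is None:
        return 5
    return TECHNOLOGY_LEVELS.get(technology.strip().lower(), 5)


def time_level(year: Optional[int]) -> int:
    """Score the reference year of the data; unknown or pre-1990 data score 5."""
    if year is None or year < 1990:
        return 5
    for first, last, level in YEAR_RANGES:
        if first <= year <= last:
            return level
    # Assumption: the example table stops at 2012, so later years are not scored here.
    raise ValueError("years after 2012 are not covered by the example table")


print(technology_level("Exhaust dyeing"), time_level(2007))
```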
Example
• Data quality requirements in PEFCR for intermediate paper products*
Quality rating and level | Technological rep. | Geographical rep. | Time-related rep.
1 – Very good | E.g. process is same | Country-specific data | ≤ 3 years old data
2 – Good | E.g. average technology as country-specific consumption mix | Central Europe, North Europe, or representative EU 27 mix | 3-5 years old data
3 – Fair | E.g. average technology as country-specific production mix or average technology as average EU consumption mix | EU-27 countries, other European country | 5-10 years old data
4 – Poor | E.g. average technology as country-specific consumption mix of a group of similar products | Middle East, North America, Japan etc. | 10-15 years old data
5 – Very poor | E.g. other process or unknown | Global data or unknown | ≥ 15 years old data
* This is taken from the draft document “Product Footprint Category Rules (PFCR) for Intermediate Paper Products” (2011) by the Confederation of European Paper Industries (CEPI), which was based on a draft version of the PEF Guide
Example
• Example of data quality requirements in PEFCR for fertilizers*
Quality rating and level | Technological rep. | Geographical rep. | Time-related rep.
1 – Very good | E.g. data from enterprises, processes and materials under study | Data from area under study | < 3 years old data
2 – Good | Data from processes and materials under study (i.e. identical technology) but from different enterprises | Average data from larger area in which the area under study is included | < 6 years old data
3 – Fair | Data from processes and materials under study from different technology | Data from area with similar production conditions | < 10 years old data
4 – Poor | Data on related processes or materials | Data from area with slightly similar production conditions | < 15 years old data
5 – Very poor | Data on related processes on laboratory scale or from different technology or unknown | Data from unknown or distinctly different area (e.g. Russia instead of Europe) | Age of data unknown or ≥ 15 years old data
* Hypothetical example taken and modified from Ciroth, A., S. Muller, et al. (2013). "Empirically based uncertainty factors for the pedigree matrix in ecoinvent." The International Journal of Life Cycle Assessment: Retrieved from http://dx.doi.org/10.1007/s11367-013-0670-5
Example
• Example for determining data quality rating
Data quality criteria | Achieved quality level | Achieved quality rating
Technological representativeness (TeR) | Good | 2
Geographical representativeness (GR) | Good | 2
Time-related representativeness (TiR) | Fair | 3
Completeness (C) | Good | 2
Parameter uncertainty (P) | Good | 2
Methodological appropriateness and consistency (M) | Good | 2
$$DQR = \frac{2 + 2 + 3 + 2 + 2 + 2}{6} \approx 2.2$$
Overall quality level = Good quality
Guidance for assessment of data quality
• The data quality assessment of generic data shall be conducted at the level of the input flows.
– Example: purchased paper used in a printing office
• The data quality assessment of specific data shall be conducted at the level of an individual process or aggregated process, or at the level of individual input flows.
Guidance for data quality assessment in PEF/OEF screening
1. Model “business as usual”
2. Rank the data for each EF impact category according to the impact contribution
3. Identify the data contributing to at least 90% of the contributions to each EF impact category
Dataset | Climate change
Heat from natural gas | 29.4%
Yarn production, from cotton fibres | 22.0%
Laundry detergent ingredient | 14.4%
Electricity, MV, European production | 6.9%
Electricity, MV, Dutch grid | 6.4%
Electricity, MV, Chinese grid | 6.0%
Truck transport | 5.1%
Total | 90.3%
Dataset | Land use
Harvested cotton | 86.9%
Laundry detergent ingredient | 4.5%
Total | 91.4%
Dataset | Particulate matter formation
Yarn production, from cotton fibres | 43.9%
Laundry detergent ingredient | 16.0%
Electricity, MV, Chinese grid | 15.4%
Harvested cotton | 9.6%
Electricity, MV, European production | 3.7%
Truck transport | 2.7%
Total | 91.3%
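Steps 2 and 3 amount to sorting the datasets by contribution and accumulating until the 90% mark is reached. A minimal sketch, using the particulate matter formation figures from this slide, is given below. The function name `select_top_contributors` is hypothetical. If the listed datasets never reach the threshold, the sketch returns all of them together with the lower total.

```python
from typing import Dict, List, Tuple


def select_top_contributors(contributions: Dict[str, float],
                            threshold: float = 90.0) -> Tuple[List[str], float]:
    """Rank datasets by contribution (%) and keep adding them until the threshold is reached."""
    ranked = sorted(contributions.items(), key=lambda item: item[1], reverse=True)
    selected: List[str] = []
    cumulative = 0.0
    for name, share in ranked:
        if cumulative >= threshold:
            break
        selected.append(name)
        cumulative += share
    return selected, cumulative


particulate_matter = {
    "Yarn production, from cotton fibres": 43.9,
    "Laundry detergent ingredient": 16.0,
    "Electricity, MV, Chinese grid": 15.4,
    "Harvested cotton": 9.6,
    "Electricity, MV, European production": 3.7,
    "Truck transport": 2.7,
}
selected, total = select_top_contributors(particulate_matter)
print(len(selected), round(total, 1))
```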
Guidance for data quality assessment in PEF/OEF screening
4. Assess the data quality of the datasets identified
a. If each is at least “fair” quality then data quality requirements for the PEF/OEF screening are met
b. If not,
i. Refine the data collection to meet the “fair” quality level OR
ii. Identify for each EF impact category the next datasets with a large contribution (to complete at least 90% of the impact) and repeat the exercise
Dataset | Land use | DQR
Harvested cotton | 86.9% | Good
Laundry detergent ingredient | 4.5% | Poor
Total | 91.4%
Replace the dataset used for the laundry detergent ingredient with one of better data quality – at least “fair”
OR
Select other datasets to complete at least 90% of the impact
Dataset | Land use | DQR
Harvested cotton | 86.9% | Good
Heat from natural gas | 2.2% | Fair
Truck transport | 1.6% | Fair
Total | 90.7%
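Option ii can be sketched as the same ranking exercise, but skipping any dataset whose quality is below “fair”. The example uses the land use figures from this slide. The function name and the set of acceptable labels are assumptions made for illustration. If the returned total stays below 90%, the data collection would have to be refined instead (option i).

```python
from typing import List, Tuple

# Assumption: these labels count as at least "fair" quality.
ACCEPTABLE = {"Excellent", "Very good", "Good", "Fair"}

# (dataset name, contribution in %, quality label)
Entry = Tuple[str, float, str]


def select_with_minimum_quality(datasets: List[Entry],
                                threshold: float = 90.0) -> Tuple[List[str], float]:
    """Rank by contribution and accumulate only datasets of at least "fair" quality."""
    ranked = sorted(datasets, key=lambda entry: entry[1], reverse=True)
    selected: List[str] = []
    cumulative = 0.0
    for name, share, quality in ranked:
        if cumulative >= threshold:
            break
        if quality not in ACCEPTABLE:
            continue  # skip and move on to the next largest contributor
        selected.append(name)
        cumulative += share
    return selected, cumulative


land_use = [
    ("Harvested cotton", 86.9, "Good"),
    ("Laundry detergent ingredient", 4.5, "Poor"),
    ("Heat from natural gas", 2.2, "Fair"),
    ("Truck transport", 1.6, "Fair"),
]
chosen, covered = select_with_minimum_quality(land_use)
print(chosen, round(covered, 1))
```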
Marisa Vieira | Ellen Brilhuis-Meijer | Annemarie Kerkhof