
Exploring Consumer Expenditure And Environmental Impacts …kth.diva-portal.org/smash/get/diva2:1373931/FULLTEXT01.pdf · 2019. 11. 28. · SECOND CYCLE, 30 CREDITS STOCKHOLM , SWEDEN


DEGREE PROJECT IN ENVIRONMENTAL ENGINEERING, SECOND CYCLE, 30 CREDITS

STOCKHOLM, SWEDEN 2019

Exploring Consumer Expenditure And Environmental Impacts Across European Nations

A Data-Mining Approach

SJOERD HERLAAR

KTH ROYAL INSTITUTE OF TECHNOLOGY
SCHOOL OF ARCHITECTURE AND THE BUILT ENVIRONMENT


Abstract

As the pressures humanity places on the environment increase, understanding how products influence a person’s overall impact becomes more important for making informed choices about consumption. Recent literature shows that household consumption is responsible for 51% to 81% of a nation’s total emissions (Ivanova, Stadler, et al. 2016). This study investigates whether expenditure is a viable proxy measurement for consumer impacts in Europe. Without knowledge about the relative impacts of consumer expenditure on the environment, European citizens cannot make conscious choices regarding their expenses and how they relate to environmental impact, while policy-makers have no basis to develop tailored environmental policy mechanisms. This study combined Input-Output Analysis, Data Mining, and Regression Analysis to check whether a correlation between expenditure and impact exists. The results are contextualised in consumption categories. Results from Input-Output Analysis suggest that Housing, Food, and Transport are the largest categories of expense throughout Europe. While expenses vary significantly throughout Europe, common trends emerge. Pattern Recognition and Cluster Analysis algorithms show that expenditure habits differ especially between north-west, east, south, and coastal Europe. Each of the four groups consists of between six and eleven nations. In general, lower economic development indicates higher expenditure in Housing and Food, while higher economic development indicates higher expenditure in Recreation & Culture, and Goods & Services. Coastal Europe spends more on Restaurants & Hotels, and Education. The expenditures were translated to four environmental impacts: Global Warming Potential, Land Use, Material Use, and Blue Water Consumption. Next, the correlation between expenditure and environmental impact was checked using Regression Analysis. The analysis showed that, out of the twelve expenditure categories, Clothing & Footwear and Furnishing & Household showed a significant correlation between expenditure and all four impact categories. Food, Alcohol & Tobacco, and Recreation & Culture showed significance in two impact categories, and Transport showed significance in one category. In total, 15 out of the 48 (31%) tested combinations of expenditure and impact category showed significance. Within the identified groups, the share of impact categories that show correlation with expenditure grows to between 44% and 68%. Unfortunately, given the size of these groups, these results are not statistically significant. That said, the method shows promise, and further research with a larger scope could produce statistically significant results.


Summary (Sammanfattning)

As the pressure humanity places on the environment increases, it becomes ever more important to understand how products affect a person’s total impact in order to make choices about how that person consumes. Recent literature shows that household consumption is responsible for 51% to 81% of a nation’s total emissions (Ivanova, Stadler, et al. 2016). This study investigates whether expenditure is a viable proxy measurement for consumer impacts in Europe, in the hope of giving consumers more knowledge about their consumption habits and how these affect the environment. This is done by combining Input-Output Analysis, Data Mining, and Regression Analysis to check whether a significant correlation exists between expenditure and impact. The results are contextualised in COICOP categories. Results from the Input-Output Analysis suggest that Housing, Food, and Transport are the largest expenditure categories throughout Europe. The amount spent varies considerably across Europe, yet common trends emerge. Pattern Recognition and Cluster Analysis algorithms show that expenditure habits differ especially between north-west, east, south, and coastal Europe. Each of the four groups consists of between six and eleven nations. In general, lower economic development indicates higher expenditure on Housing and Food, while higher economic development indicates higher expenditure on Recreation & Culture and Goods & Services. Coastal Europe spends more on Restaurants & Hotels and Education. The environmental impact of all expenditure was calculated for four impact categories: Global Warming Potential, Land Use, Material Use, and Blue Water Consumption. Next, the relationship between expenditure and environmental impact was checked using Regression Analysis. The analysis showed that, of the twelve expenditure categories, Clothing & Footwear and Furnishing & Household had significant relationships between expenditure and the four impact categories. Food, Alcohol & Tobacco, and Recreation & Culture were significant in two impact categories, and Transport was significant in one category. In total, 15 of the 48 tested impact categories were significant, which is 31%. Using the identified groups, the share of impact categories that correlate with expenditure increases to between 44% and 68%. It is important to note that, given the size of these groups, these results are not statistically significant. Nevertheless, the method appears promising. Further research with a larger scope could produce statistically significant results.


Acknowledgements

This study is the final fruit of my labour as a student in Sustainable Technology at the Royal Institute of Technology. Throughout this program, I have had the opportunity to work with many talented and inspiring people. It would be a surprise to no one that this is also the case for this final product, and I want to thank everyone who offered their time, assistance, or emotional support. First, I would like to thank Rafael Laurenti and Michael Martin for supporting and guiding me through the entire process as my supervisors. Second, I would like to thank Tobias, with whom I worked closely together, and with whom I overcame many obstacles along the way. Third, I would like to thank everyone at IVL, thesis students included, for their openness and energy. Fourth, I want to thank Rebecca and Malin for their help with translating. Last, I would like to thank my family, as well as Sander and Amelia, for their support in getting through the setbacks that come with a project of this scope.


Glossary

EQ Quantisation Error

ET Topographic Error

ANN Artificial Neural Network

BMU Best Matching Unit

CBEI Consumption-based Emissions Inventory

DM Data Mining

EEMRIOT Environmentally Extended Multi-Region Input-Output Table

FD Final Demand

IOA Input-Output Analysis

IOT Input-Output Table

IV Input Vector

NACE Nomenclature statistique des activités économiques dans la Communauté européenne

NF Neighbourhood Function

PPS Purchasing Power Standard

RO Research Objective

SOM Self-Organising Map

SSE Sum of Squared-Error

ULN Unsupervised Learning Network

WV Weight-Vector


Contents

1 Introduction
  1.1 Aim & Objectives
  1.2 Research Approach
    1.2.1 Objective 1 - Final Demands in Europe
    1.2.2 Objective 2 - Pattern Recognition
    1.2.3 Objective 3 - Cluster Analysis

2 Theoretical Background
  2.1 Consumption-based Emissions
  2.2 Input-Output Analysis
    2.2.1 Exiobase
  2.3 The COICOP system
  2.4 Impact Categories & Characterisation
  2.5 Data mining
    2.5.1 Artificial Neural Networks
    2.5.2 Cluster Analysis
  2.6 Regression Analysis
    2.6.1 Measuring Regression Quality

3 Method
  3.1 Calculating the Final Demand per capita of European Nations in COICOP categories
    3.1.1 Calculating the Final Demand
    3.1.2 Aligning Exiobase & COICOP
    3.1.3 Calculating Final Demand (FD) per Capita
  3.2 Investigating Expenditure Habits
    3.2.1 Pre-processing the Data
    3.2.2 Applying the SOM Algorithm
    3.2.3 Applying K-Means Clustering
  3.3 Applying Regression Analysis

4 Results
  4.1 FD per Capita of European Nations
  4.2 Investigating Expenditure Habits
  4.3 Clustering the data
  4.4 Applying Regression Analysis
    4.4.1 Comparing the r²-values
    4.4.2 Identifying Outliers
  4.5 Sensitivity Analysis

5 Discussion
  5.1 The Methodology
  5.2 The Data
  5.3 The Results
  5.4 Limitations

6 Conclusion
  6.1 Future Work

A Appendices
  A.1 Software Packages used
  A.2 Nation Codes and Population
  A.3 The SOM Algorithm
  A.4 Boxplot of expenditure
  A.5 Distribution of Expenditure per COICOP Category
  A.6 Regression Models
  A.7 Statistical Values


List of Figures

2.1 Venn Diagram of Consumption-based and Production-based Inventories
2.2 Representation of an Environmentally Extended Multi-Region Input-Output Table (EEMRIOT), derived from Tukker et al. (2014)

3.1 Flow Diagram presenting an overview of the Method. The legend on the top right side provides more information on how to read this flow diagram.

4.1 Boxplot of expenditure per COICOP Category in €
4.2 Clusters on the SOM grid and a map
4.3 Heat map of each division in % of total expenditure per capita in a nation
4.4 Plot of sum of squared error for the K-Means algorithm over various cluster amounts
4.5 Clusters on the SOM grid and a map
4.6 Boxplot of clusters per COICOP category in % of total spending
4.7 Boxplot of clusters per COICOP category in €
4.8 Regression of three largest expenditure categories
4.9 Regressions for Transport in Impact Categories

A.1 Boxplot of expenditure per COICOP Category in %
A.2 Regressions for Food in Impact Categories
A.3 Regressions for Alcohol & Tobacco in Impact Categories
A.4 Regressions for Clothing & Footwear in Impact Categories
A.5 Regressions for Housing in Impact Categories
A.6 Regressions for Furnishings & Housings in Impact Categories
A.7 Regressions for Health in Impact Categories
A.8 Regressions for Transport in Impact Categories
A.9 Regressions for Communication in Impact Categories
A.10 Regressions for Recreation & Culture in Impact Categories
A.11 Regressions for Education in Impact Categories
A.12 Regressions for Restaurants & Hotels in Impact Categories
A.13 Regressions for Goods & Services in Impact Categories


List of Tables

2.1 Overview of Exiobase 3 stressors

4.1 Sum of Squared-Error (SSE) and its decrease per amount of clusters used
4.2 Overview of the four clusters and their nations
4.3 Overview of the r²-values of each regression model, each with expenditure in that category and one of the four impact categories as the independent and dependent variable respectively
4.4 Amount of significant r²-values per cluster

A.1 Software Packages used
A.2 Overview of all included nations, their population and PPS
A.3 Overview of Parameters, their suggested values, and their optimised values
A.4 Overview of the expenditure per capita in euro, showing the mean, standard deviation, and the spread of the first six categories
A.5 Overview of the expenditure per capita in euro, showing the mean, standard deviation, and the spread of the last six categories
A.6 R-values and P-values for all nations
A.7 R-values and P-values for Cluster 0
A.8 R-values and P-values for Cluster 1
A.9 R-values and P-values for Cluster 2
A.10 R-values and P-values for Cluster 3


Chapter 1

Introduction

Reducing consumption is a focal point of growing importance in regional, national, and international contexts. Among international bodies, the UN underlines the importance of sustainable consumption and production through goal twelve of the UN Sustainable Development Goals (United Nations 2018), while the EU does so through its Sustainable Consumption and Production Plan (COTEC 2008). These bodies highlight multiple areas in which global consumption and production can improve, ranging from energy use to food waste. On a national level, many governments have introduced plans tailored to national needs. Sweden introduced its strategy for sustainable consumption in its 2017 budget bill, prioritising food, transport, and housing. The Swedish “Strategy for sustainable consumption” highlights seven focus areas, some of which are encouraging sustainable ways of consuming, streamlining resource use, and focusing on food, transport, and housing (Ministry of Finance 2016).

There are different ways of estimating the environmental impact of a nation, and these can lead to different outcomes. In contrast to a production-based emissions inventory, a consumption-based emissions inventory focusses on the actual consumption of a nation instead of what that nation produces inside its borders. A consumption-based emissions inventory includes imports and excludes exports. A production-based emissions inventory, conversely, excludes imports and includes exports, which can skew the impacts attributed to a nation’s demand. An often-used production-based unit of measurement is GDP, which has proven to be unreliable and opens opportunities for Carbon Leakage (Peters and Hertwich 2008).

Using a consumption-based perspective, Ivanova, Stadler, et al. (2016) showed that, globally, over 60% of the yearly Greenhouse Gas emissions and between 48% and 80% of all land, water, and material use arise from household consumption. Therefore, stimulating households to change their consumption habits shows significant potential for lowering the overall climate impact humanity imposes on Earth. However, as also shown by Ivanova, Stadler, et al. (2016), households consume differently, and thereby impact the environment differently, across the globe. It is therefore important to distinguish between the nature of consumption in different regions.


The goal of this study is to identify different types of consumer expenditure habits within Europe by calculating the average per capita spending, and to identify the environmental impacts per capita for each type in order to find correlations between expenditure and environmental impacts. To achieve this goal, this report uses Input-Output Analysis (IOA) and various Data Mining (DM) techniques to calculate the consumption of each nation and to segment the nations into clusters according to their consumption habits. Next, the data is subjected to regression analysis to explore the correlation between consumer expenditure and environmental impact. To contextualise the expenditure and the resulting environmental impacts, the expenditures are expressed in COICOP categories (Department of Economic and Social Affairs 2018a), while the environmental impacts are characterised according to the Environmental Footprint (Steinmann et al. 2018).

This report employed the Exiobase 3 database for its Input-Output tables (Stadler et al. 2018), which have been used for consumer and industry consumption analysis for different nations and regions, and which enabled extensive comparison between those regions (Ivanova, Stadler, et al. 2016; Beylot et al. 2019). DM has proven to be useful for analysing different types of consumption patterns, ranging from energy and home appliance use (Gajowniczek and Zabkowski 2015) to food expenditure (Fan et al. 2007) and the yearly expenditure of households (Froemelt, Durrenmatt, and Hellweg 2018a). The two DM techniques applied are the Self-Organising Map (SOM) and the K-Means algorithm; further details follow in the subsequent chapters.

1.1 Aim & Objectives

The aim of this report is to explore potential correlations between expenditure and environmental impacts for thirty European nations. Identifying these correlations could give both consumers and policy-makers more insight into how consumption shapes their interaction with the environment. The report attempts to answer the following research question to support the aim:

Is consumer expenditure a valid proxy measurement for environmental impacts?

To support this Research Question, the report uses the following Research Objectives (ROs):

RO 1: Develop and apply a framework for calculating the FD per capita of European nations using Exiobase products expressed in consumption categories.

RO 2: Investigate expenditure patterns throughout European nations using DM and group the nations according to these habits.

RO 3: Explore the correlation between environmental impact and consumer expenditure for the identified clusters using regression analysis.


1.2 Research Approach

The ROs form the basis of the research approach, and are partially based on the articles by Ivanova, Stadler, et al. (2016), Froemelt, Durrenmatt, and Hellweg (2018a), and Beylot et al. (2019). The aforementioned ROs are tailored to the scope and time constraints of this research and are described in more detail below.

1.2.1 Objective 1 - Final Demands in Europe

Popularised by Wassily Leontief, Input-Output analysis is a method for systematically quantifying how different sectors in a complex economy relate to each other (Leontief 1986). The Exiobase 3 input-output database contains an overview of all material and monetary streams of 200 products and 163 industries across 43 nations and 5 Rest-of-World regions (Stadler et al. 2018).

The Exiobase database has many capabilities, but the first objective focuses solely on the FD per capita in thirty European nations. The goal of this RO is to calculate, isolate, and adjust each nation’s FD so that it represents the expenditure of each nation per capita. To that end, the FDs are calculated using Exiobase, and the nations and regions excluded from this research are aggregated. Next, the 200 Exiobase products need to be aligned with the COICOP system for contextualisation.
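The aggregation and per capita steps described above can be illustrated with a minimal sketch. The product names, the product-to-category mapping, the monetary values, and the population figure below are all hypothetical and chosen only for illustration; they are not taken from Exiobase or from the concordance used in this report.

```python
# Hypothetical final demand of one nation, in million EUR per product.
final_demand_meur = {
    "Wheat": 1200.0,
    "Dairy products": 800.0,
    "Wearing apparel": 500.0,
    "Motor vehicles": 1500.0,
}

# Hypothetical product-to-category mapping (not the concordance used in this report).
product_to_category = {
    "Wheat": "Food",
    "Dairy products": "Food",
    "Wearing apparel": "Clothing & Footwear",
    "Motor vehicles": "Transport",
}


def fd_per_capita(final_demand, mapping, population):
    """Sum product-level final demand per category, then divide by population."""
    totals = {}
    for product, value in final_demand.items():
        category = mapping[product]
        totals[category] = totals.get(category, 0.0) + value
    # Convert million EUR to EUR before dividing by the number of inhabitants.
    return {category: total * 1e6 / population for category, total in totals.items()}


# Hypothetical population of ten million inhabitants.
per_capita = fd_per_capita(final_demand_meur, product_to_category, population=10_000_000)
print(per_capita)
```

In this toy case two products fall into the same category, which is the situation the alignment step has to handle when 200 products are mapped onto twelve consumption categories.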

1.2.2 Objective 2 - Pattern Recognition

To identify different types of expenditure among the FDs of the included nations, this report relies on Data Mining, specifically on Pattern Recognition and Cluster Analysis techniques. In short, DM, also known as Knowledge Discovery, is the process of extracting interesting information from large amounts of data (Han and Kamber 2006, p.5). Pattern Recognition and Cluster Analysis can sort and identify patterns in large data sets and highlight relationships between data points in the set. As such, these tools can also be used to segment different spending habits and detect trends that go further than socio-economic pre-definition and standard regression methods (Girod and de Haan 2009).

The FDs calculated in RO 1 serve as input data for the DM techniques. First, a SOM enables pattern recognition in data sets with high dimensionality. Second, K-Means finds clusters within the identified patterns in order to distinguish the different spending habits between nations. The Theoretical Background and Method chapters discuss the concepts and the application of these techniques, respectively.
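As a rough illustration of the clustering step only, the sketch below runs a plain K-Means (Lloyd's algorithm) on hypothetical spending shares and reports the sum of squared error. It skips the SOM step entirely, seeds the centroids with the first k points for reproducibility, and uses invented data, so it should be read as an assumption-laden toy example and not as the procedure applied in this report.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equally long vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def k_means(points, k, iterations=20):
    """Plain Lloyd's algorithm; the first k points seed the centroids (an assumption)."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    sse = sum(squared_distance(p, centroids[label]) for p, label in zip(points, labels))
    return labels, centroids, sse


# Hypothetical (Housing, Food) shares of total expenditure for six nations.
points = [(0.30, 0.20), (0.18, 0.10), (0.31, 0.22),
          (0.17, 0.11), (0.29, 0.21), (0.19, 0.09)]

labels, centroids, sse = k_means(points, k=2)
_, _, sse_one_cluster = k_means(points, k=1)
print(labels, sse, sse_one_cluster)
```

Comparing the sum of squared error for different values of k, as done here for one and two clusters, is the basic idea behind choosing the number of clusters from an SSE plot.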

1.2.3 Objective 3 - Cluster Analysis

The third RO aims to investigate whether the FDs and clusters identified in RO 1 and 2 show correlations between spending and impact. To do so, the FDs per capita are first expressed in environmental impact, based on the environmental footprint as introduced by Steinmann et al. (2018). Next, regression techniques show whether there is a significant correlation between spending and impact.
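A minimal sketch of such a regression is given below: an ordinary least-squares line with expenditure as the independent variable and impact as the dependent variable, together with the r²-value. The four data points are hypothetical, and the sketch does not compute p-values, so it says nothing about statistical significance.

```python
def linear_regression(x, y):
    """Ordinary least squares for one predictor; returns slope, intercept, and r-squared."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # For a single predictor, r-squared equals the squared correlation coefficient.
    r_squared = sxy ** 2 / (sxx * syy)
    return slope, intercept, r_squared


# Hypothetical per capita expenditure (EUR) and impact (t CO2-eq) for four nations.
expenditure = [1000.0, 2000.0, 3000.0, 4000.0]
impact = [2.1, 3.9, 6.2, 7.8]

slope, intercept, r_squared = linear_regression(expenditure, impact)
print(slope, intercept, r_squared)
```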


Chapter 2

Theoretical Background

As mentioned in the Introduction, this research relies heavily on IOA, DM, and Regression Analysis. This chapter therefore explains these concepts in more detail. More detailed information on the Self-Organising Map can be found in Appendix A.3. The information in the appendix allows for a deeper understanding of the theory, but is not necessary for understanding the method and results of the study.

2.1 Consumption-based Emissions

The consumption-based emissions inventory (CBEI) framework is used to calculate all emissions associated with the production, transportation, use, and disposal of products and services that are consumed by a community or entity (Broekhoff, Erickson, and Piggot 2019, p.5). This research does not focus on individual communities or entities, but on nations. Furthermore, it does not look at a single nation, but at multiple nations and the interactions between these nations in fulfilling each other’s consumption demands.

Regardless, the concept of CBEI stays the same: the emissions arising within a pre-defined area or community due to production, excluding exports to other areas or communities, and including imports from other areas or communities (C40 Cities 2018, p.4). Peters and Hertwich (2008) define a similar concept, GHG Inventory, and compare production-based and consumption-based GHG inventory as follows:

Production-based GHG Inventory = Production

Consumption-based GHG Inventory = Production - Exports + Imports

Here, Production covers all emissions generated within the nation’s borders, including those embodied in its exports.
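The consumption-based relation can be made concrete with a small worked example. The figures below are hypothetical and only serve to show how a nation that imports more embodied emissions than it exports ends up with a consumption-based inventory above the emissions generated inside its borders.

```python
def consumption_based_inventory(production, exports, imports):
    """Consumption-based GHG inventory = Production - Exports + Imports."""
    return production - exports + imports


# Hypothetical values in Mt CO2-eq for one nation.
production = 50.0  # emissions generated inside the nation's borders
exports = 15.0     # emissions embodied in exported goods
imports = 30.0     # emissions embodied in imported goods

consumption = consumption_based_inventory(production, exports, imports)
net_imported_emissions = consumption - production
print(consumption, net_imported_emissions)
```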

CBEI can be used to calculate the consumption of a nation instead of relying on production-related data such as GDP, which has proven to be unreliable and opens opportunities for Carbon Leakage (Peters and Hertwich 2008). Figure 2.1 below shows the differences between production-based and consumption-based GHG Inventory.


Figure 2.1: Venn Diagram of Consumption-based and Production-based Inventories

As shown in the definition above, it is necessary to know what part of the domestic production is exported, and what part of the domestic consumption is imported. IOA is a tool that makes it possible to track what share of a consumed product comes from where, and what share of domestic production is exported to where.

2.2 Input-Output Analysis

First introduced by Leontief (1936), IOA is an industry analysis method that utilises information on the flow of goods to create an overview of an industrial system. IOA systematically quantifies the relationships that different sectors of an economic system share (Leontief 1986, p. 19). The size of the system is irrelevant for the application of IOA: it can range from the world economy to a single organisation, and the methodology is the same regardless of size (Ibid.).

IOA works using Input-Output Tables (IOTs). An IOT houses a series of vectors that describe how much a specific sector takes as input, and how much it delivers to other sectors as output. The vectors show the dependency of one sector on the other sectors that supply it with goods. This interdependency is expressed through a series of linear equations that balance the total input of each sector against the aggregate output of products and services that the sectors in the economy in question produce in a given time period. As such, the flows of goods and services, or the corresponding monetary flows, can be presented as a matrix of these vectors. The flows shown in an IOT ultimately stem from the Supply and Use Tables of an economy, the building blocks of an IOT. In short, Supply Tables show how products are brought into an economy, while Use Tables show how these products are used. Therefore, an IOT can be seen as a combination of these two tables (Department of Economic and Social Affairs 2018b, p.3).
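The linear equations mentioned above can be illustrated with a two-sector toy economy. The sketch assumes the standard input-output formulation, in which total output equals interindustry sales plus final demand, and in which the output needed to satisfy a given final demand follows from solving (I - A) x = y. All flows and final demand values are hypothetical.

```python
# Hypothetical interindustry flows: flows[i][j] is what sector i sells to sector j.
flows = [[10.0, 20.0],
         [30.0, 10.0]]
# Hypothetical final demand for the products of each sector.
final_demand = [70.0, 60.0]
n = len(flows)

# Row balance: total output = sales to all sectors + sales to final demand.
total_output = [sum(flows[i]) + final_demand[i] for i in range(n)]

# Technical coefficients: input from sector i needed per unit of output of sector j.
coefficients = [[flows[i][j] / total_output[j] for j in range(n)] for i in range(n)]


def solve_2x2(matrix, rhs):
    """Solve a 2x2 linear system with Cramer's rule."""
    det = matrix[0][0] * matrix[1][1] - matrix[0][1] * matrix[1][0]
    return [
        (matrix[1][1] * rhs[0] - matrix[0][1] * rhs[1]) / det,
        (matrix[0][0] * rhs[1] - matrix[1][0] * rhs[0]) / det,
    ]


# Solving (I - A) x = y recovers the total output that the final demand requires.
identity_minus_a = [
    [(1.0 if i == j else 0.0) - coefficients[i][j] for j in range(n)]
    for i in range(n)
]
recovered_output = solve_2x2(identity_minus_a, final_demand)
print(total_output, recovered_output)
```

The recovered output matches the row totals of the table, which shows that the coefficient form and the flow form describe the same system.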


The IOTs proposed by Leontief (1986, p. 19-22) show the flows of products orservices between different sectors and the sector itself. In this overview, a sector isboth a producer for other sectors and for itself, as well as a consumer of products fromother sectors. An interindustry IOT shows the output of one sector to other sectorsand itself in each row, while columns show the input necessary for a sector to createits output. On the right side of the industry matrix with all in- and outflows fromindustry to industry, are additional columns; the Final Demand Columns. Thesecolumns present the sales by each industry to the final markets of the scope of thetable. In most cases, these columns are for consumers, the government, and invest-ment (Miller and Blair 2009, p.2). Aside from IOTs presenting quantities of goods,most IOTs show sector exchanges in monetary terms, and is, at a national level,sometimes also referred to as a ”system of national accounts” (Leontief 1986, p. 21).

The exchanges of goods that IOTs show are in monetary terms because one industry or sector can produce and sell different products that use different resources and sell for different prices, and recording which products are sold to which sectors is often deemed too complicated. Price fluctuation is a problem connected to using monetary values as the basis for IOTs and IOA. In practice, this means that a change in payment from one time period to another does not automatically mean a change in physical input to a sector (Miller and Blair 2009, p. 11).

Using IOA, the imports and exports of an industry or nation can be identified and split. As a result, one can derive the actual consumption of said industry or nation. IOTs, as introduced by Leontief (1936), show the total consumption in monetary terms. As the size of the economy in question is irrelevant for the application of IOA, analysis on an international level allows for the identification and isolation of imports, exports, and domestic production flows between nations.

Multi-Region IOTs

IOTs that span multiple regions or nations, and thus can hold several instances of the same industry active in different regions or nations, are called Multi-Region IOTs. As mentioned earlier, expanding IOA to an international scale is possible within the framework, and was both proposed and implemented by Leontief (1974). In practice, Multi-Region IOTs differ from regular IOTs in that they include a row and column for every industry in each included country, and are therefore considerably larger than national IOTs, but they effectively work in the same way.

Environmentally Extended IOTs

Where the original IOTs and IOA allow for analysing monetary flows, Environmentally Extended IOTs focus on the environmental impact associated with the flow and use of these products and services. The IOTs are expanded by integrating information on associated impacts from physical accounting, such as material consumption or pollution. This way, the impacts associated with the consumption of a good or service can be traced back to the production of said product.
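
A minimal sketch of how such an extension can be used is given below. It follows a formulation that is common in the input-output literature (stressor intensities multiplied by the output required to satisfy a final demand) and is not necessarily the exact calculation used in this study. The two sectors, the matrix names A, S, and L, and every number are assumptions made for illustration.

```python
import numpy as np

# Hypothetical 2-sector example; every number is illustrative.
A = np.array([[0.1, 0.2],
              [0.3, 0.1]])        # direct input requirements per unit of output
y = np.array([50.0, 100.0])       # final demand (monetary units)

# Stressor intensities: rows = stressors, columns = sectors,
# in physical units per monetary unit of output.
S = np.array([[0.5, 0.2],         # hypothetical CO2 intensity
              [2.0, 0.1]])        # hypothetical water-use intensity

L = np.linalg.inv(np.eye(2) - A)  # total requirements (Leontief inverse)
x = L @ y                         # output needed to satisfy y
impacts_by_sector = S * x         # stressor totals per producing sector
total_impacts = impacts_by_sector.sum(axis=1)

print(total_impacts)
```

The physical extension is linked to monetary output through intensities, which is where the unit mismatch between monetary and physical accounts has to be bridged by assumption.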

As mentioned earlier, IOTs are normally expressed in monetary terms instead of physical terms, while the associated impacts found in Environmentally Extended IOTs are expressed in various physical units such as joules and kilograms. Since these units do not line up, various assumptions have to be made that can significantly impact the result (Schaffartzik et al. 2014). Figure 2.2 below represents an Environmentally Extended Multi-Region IOT (EEMRIOT), with the flows between industries and countries, their Final Demand (FD), associated stressors, and total impacts.

Figure 2.2: Representation of an EEMRIOT, derived from Tukker et al. (2014)

In the last decades, several different EEMRIOT databases have been developed, spanning different periods, regions, and environmental impacts (Lenzen et al. 2013; Timmer et al. 2015). Since these tables include information for industries across different regions, they also allow for including the different impacts associated with the different production methods used in those regions. For example, beef might be associated with different environmental impacts in Brazil than in France, due to different regulations among other things. Therefore, EEMRIOTs allow a more precise calculation of environmental impact, depending on where the product is produced.

On the other hand, one of the challenges that limits the precise calculation of environmental impacts is the need for aggregation and simplification. As precise data regarding industrial activities around the globe are not always available, IOTs often use somewhat simplified and aggregated data. The assumptions made for these simplifications and aggregations, as well as the use of different data sources, can lead to different outcomes. As a result, the different databases available all deliver somewhat different outcomes (Moran and Wood 2014; Dawkins et al. 2018).

While data sources might influence the outcomes of an IOT, the general assumptions made in constructing the IOT are also important. An important fork in the road of IOT design is the transformation model on which the IOT is based. Using Supply & Use Tables, two assumptions can lead to four different models of IOT: model A, B, C, or D. The first assumption focuses on the way a product is produced. Model A assumes that each product is produced individually and can therefore be counted as a single product, while model B assumes that each industry produces products in its own specific way.

The second assumption concerns sales structure: either each industry has its own sales structure, or each product has its own sales structure. Two models derive from this assumption: model C assumes the former and model D the latter. Models A and D are used most often (Department of Economic and Social Affairs 2018b; Eurostat 2018, p.321-349). The database used in this research, Exiobase, is an EEMRIOT database available in both product-by-product and industry-by-industry form.

2.2.1 Exiobase

The Exiobase project includes input and output data for 200 products and 163 industries in 43 countries and 5 Rest-of-World regions, in both physical and monetary terms, and is compatible with the System of Environmental-Economic Accounting. The 200 products are based on the Nomenclature statistique des activités économiques dans la Communauté européenne (NACE). The environmental impact of all products is expressed in 1104 different stressors, ranging from CO2 emissions to land use change. The most recent version of the database is Exiobase 3, which includes yearly overviews from 1995 to 2011 (Stadler et al. 2018). Since 2011 is the most recent year available, this year forms the basis of the first objective. The type of IOT used is product by product, representing the flow of products between nations in millions of euros.

2.3 The COICOP system

The COICOP classification system is a framework for grouping different consumable goods or services into categories and subcategories. It was developed by the United Nations Statistics Division to provide homogeneity in the collection of household expenditure data. The COICOP system is used in consumer price and GDP comparisons between nations, and also for household budget surveys. The COICOP 1999 system divides goods and services into fourteen "Divisions" (first tier), which are then categorised into "Groups" (second tier) and "Classes" (third tier). In 2018, the UN revised the COICOP system, adding a Division, multiple Groups and Classes, and a fourth level called "Subclasses". To contextualise the 200 products from Exiobase, this report uses the COICOP 1999 system, specifically its first twelve Divisions (Department of Economic and Social Affairs 2018a).

2.4 Impact Categories & Characterisation

The Exiobase database includes 1104 different environmental stressors to calculate the impact of different products and activities, and these stressors have to be categorised into environmental indicators to make comparing and communicating the results effective. While dividing the stressors over a larger number of smaller indicator categories allows for better coverage of all impacts, limiting the number of indicator categories makes them easier to comprehend, and thus more suitable for communicating results and building environmental policy. Steinmann et al. (2018) present an overview of the stressors found in Exiobase 3, which can be found in Table 2.1 below.

Table 2.1: Overview of Exiobase 3 stressors

Environmental Stressors

Accounts             Amount    Of which
Water Accounts       194       194 Blue water & Green water per source, including final demand
Material Accounts    513       69 Energy products, including final demand
                               222 Used Extractions
                               222 Unused Extractions
Land Accounts        14        14 including built-up land for final demand
Emissions            57        27 from combustion, including final demand
                               27 non-combustion
                               3 HFC, PFC, SF6

The four categories presented in Table 2.1 are defined as follows:

• The Water Accounts measure the total consumption of blue water (fresh surface water and groundwater) in the Exiobase products, in cubic metres (m³).

• The Material Accounts measure the impacts due to the extraction of natural resources such as metals, minerals, and fossil fuels, among others, measured in kilotonnes (kt).

• The Land Accounts measure the land-use change impact due to agriculture, among others, measured in square kilometres (km², i.e. 1 × 10^6 m²).

• The Emissions measure the greenhouse gas emissions produced as a result of product consumption, and include all greenhouse gases weighted by their global warming potential (kg CO2 eq).

As mentioned by Steinmann et al. (2018), these four categories represent 60% of all environmental effects. Including more impact categories would allow for a better representation of environmental effects, but would reduce simplicity and understandability.
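
As an illustration of what characterisation means for the Emissions category, the sketch below converts individual greenhouse gas stressors into a single kg CO2 eq value. The emission amounts are hypothetical, and the characterisation factors are GWP100 values as commonly quoted from the IPCC Fifth Assessment Report; both should be read as assumptions, not as the factors used in this study.

```python
# Illustrative characterisation factors (kg CO2 eq per kg of gas).
# GWP100 values commonly quoted from IPCC AR5; treated here as assumptions.
GWP100 = {"CO2": 1.0, "CH4": 28.0, "N2O": 265.0}

# Hypothetical emissions of one product, in kg of each gas.
emissions_kg = {"CO2": 1000.0, "CH4": 10.0, "N2O": 1.0}

# Characterised impact: sum of (mass of gas x its characterisation factor).
co2_eq = sum(emissions_kg[gas] * GWP100[gas] for gas in emissions_kg)

print(f"{co2_eq:.0f} kg CO2 eq")
```

Collapsing many stressors into one indicator in this way is what makes the results easier to communicate, at the cost of hiding the contribution of each individual stressor.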

2.5 Data mining

The internet ushered in a new era of data creation, and with it new desires to utilise some of its valuable aspects. Data Mining (DM) is one of the emerging tools that can help unravel interesting trends that would otherwise be difficult to identify. In short, DM, also known as Knowledge Discovery, is the process of extracting useful information from large amounts of data, and takes contributions from many different fields, ranging from statistics and information science to visualisation (Han and Kamber 2006, p.5, 29).

While there are many different definitions of DM, most focus on the identification of previously unknown patterns, relationships, or structures in large data sets (Hassani, Saporta, and Silva 2014). As an umbrella term, DM covers a multitude of techniques, which can be of either a descriptive or a predictive nature: descriptive DM aims to identify the characteristics of the data, while predictive DM aims to make predictions based on the data (Han and Kamber 2006, p. 21).

The aforementioned tools used in this research, Self-Organising Maps (SOMs) and K-means clustering, are both considered part of DM, and both are descriptive (Han and Kamber 2006, p. 5, p. 21). The tools are used within the Knowledge Discovery framework as introduced by Han and Kamber (2006, p.7). A SOM, however, is a model-based categorisation method based on Artificial Neural Networks (ANNs), while K-means clustering is a centroid-based partitioning method (Han and Kamber 2006, p. 398-404). How ANNs, SOMs, and K-means work is discussed further below.

2.5.1 Artificial Neural Networks

Inherently non-linear, ANNs were originally pioneered by psychologists and neurobiologists, and inspired by the workings of the brain. Like the brains of living organisms, an ANN is made up of neurons, connected to each other through links. An ANN is not an algorithm in itself, but a framework used by different algorithms to process complex data. The structure of the ANN, its neurons, and its links depend on unknown parameters, also known as hidden variables or hidden units (Hassani, Saporta, and Silva 2014; Kuhn and Johnson 2013, p.141-145).

For predictive algorithms, links are created through a learning phase in which known data is presented to the algorithm, which it can use to classify the data given, creating links and clusters. There are many different types of ANNs, the most common being Feed-Forward Networks, Recurrent Networks, and Unsupervised Learning Networks (ULNs). ULNs differ from most other types in that they learn from unlabelled data. Instead of relying on labels, ULNs are designed to identify common traits in data points and to structure the data based on the presence or absence of said traits. The SOM is a ULN.

Self-Organising Maps

First introduced by Teuvo Kohonen (1990), the SOM falls under the banner of what are often called competitive, self-learning, or unsupervised learning networks. This means that neighbouring neurons compete with each other and evolve to resemble the specific characteristics of patterns found in unlabelled data (Ibid.). As a result of the neurons adapting to resemble the data, SOMs are used for identifying and presenting similarities between high-dimensional, non-linear input data in a one-, two-, or three-dimensional space, called a feature map. On this feature map, the neurons represent data points with certain features, and neurons that are closer together represent data points with relatively similar features. In other words, a SOM makes large datasets with high dimensionality visual through dimensionality reduction.

While many ANNs include the aforementioned hidden variables or a hidden layer, SOMs do not. Instead, the input layer containing the data set maps directly onto the competition layer containing the network of neurons (Kind and Brunner 2014). Teuvo Kohonen (2001, p.106) himself formally describes the SOM as

"a non-linear, ordered, smooth mapping of high-dimensional input data manifolds onto the elements of a regular, low-dimensional array".

In the process, the SOM compresses some of the information of each data point into weights and thereby also creates some form of abstraction. The compression and abstraction reduce noise from the data set and allow for pattern recognition, either visually or through other algorithms (Froemelt, Durrenmatt, and Hellweg 2018b).

The algorithm works by creating a two-dimensional space, or map, in which each neuron is given a Weight Vector (WV) with the same dimensionality as the input data; i.e. if each data point in the dataset contains 25 values, so will the WV of each neuron. The WV acts as the coordinate of the neuron and thus determines where in the data space the neuron lies. Each neuron initially receives a random WV, meaning the neurons usually start out roughly evenly distributed. Each neuron on the map is then connected to its direct neighbours through a Neighbourhood Function (NF). The NF describes the distance between neurons, and therefore dictates the topology of the SOM (Froemelt, Durrenmatt, and Hellweg 2018b).

After defining the space and where the neurons are placed in it, the SOM is trained to find patterns in the data. The algorithm does so by taking a data point from the dataset and converting its values to a vector with the same dimensionality as the WVs of the neurons. This vector represents the data point and is called the Input Vector (IV). Next, the algorithm calculates which neuron has the WV closest to the IV of the data point. This winning neuron, known as the Best Matching Unit (BMU), then moves closer to the position of the data point by changing its WV to make it more similar to the IV. Depending on design choices, the WVs of the neurons surrounding the BMU might also be updated, moving a section of the map towards the data point. By continuously feeding the SOM data, and thereby changing the WVs of the neurons and the structure of the map, the map and its neurons start to resemble the data.
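
The training loop described above can be sketched in a few lines of NumPy. This is a simplified sequential SOM written for illustration, not the implementation used in this study: the grid size, the linearly decaying learning rate, the Gaussian neighbourhood, and the synthetic data are all assumptions.

```python
import numpy as np

def best_matching_unit(weights, iv):
    """Grid position of the neuron whose WV is closest (Euclidean) to the IV."""
    dists = np.linalg.norm(weights - iv, axis=2)
    return tuple(int(i) for i in np.unravel_index(np.argmin(dists), dists.shape))

def train_som(data, rows=4, cols=4, n_iter=500, lr0=0.5, sigma0=2.0, seed=0):
    """Sequentially train a small rectangular SOM; returns WVs of shape (rows, cols, dim)."""
    rng = np.random.default_rng(seed)
    weights = rng.random((rows, cols, data.shape[1]))   # random initial WVs
    # Grid coordinates of every neuron, used by the neighbourhood function.
    grid = np.stack(np.meshgrid(np.arange(rows), np.arange(cols), indexing="ij"), axis=-1)

    for t in range(n_iter):
        frac = t / n_iter
        lr = lr0 * (1.0 - frac)                  # decaying learning rate
        sigma = sigma0 * (1.0 - frac) + 1e-3     # shrinking neighbourhood radius
        iv = data[rng.integers(len(data))]       # IV of a randomly drawn data point

        bmu = best_matching_unit(weights, iv)

        # Gaussian neighbourhood on the grid, centred on the BMU.
        grid_dist2 = ((grid - np.array(bmu)) ** 2).sum(axis=2)
        h = np.exp(-grid_dist2 / (2.0 * sigma ** 2))

        # Move the BMU and, to a lesser degree, its neighbours towards the IV.
        weights += lr * h[..., None] * (iv - weights)
    return weights

# Synthetic data: two hypothetical groups of 3-dimensional points.
rng = np.random.default_rng(1)
data = np.vstack([rng.normal(0.2, 0.05, (50, 3)), rng.normal(0.8, 0.05, (50, 3))])
som = train_som(data)
print(best_matching_unit(som, data[0]), best_matching_unit(som, data[-1]))
```

After training, points from the two groups should land on different neurons of the map, which is the property the subsequent cluster analysis builds on.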

There are two main forms of the algorithm, which mainly differ in how the SOM is trained: Sequential training takes an iterative approach, going over each data point individually, while Batch training processes the data points in batches. Both versions of the algorithm compare the values of data points from the data set with the WVs of all neurons in the network of the SOM for similarities. The SOM algorithm outputs a feature map that allows for cluster analysis to identify different clusters of data points that share features. Appendix A.3 presents a more detailed explanation of the algorithm, though this information is not necessary for understanding the results.

2.5.2 Cluster Analysis

In short, cluster analysis involves grouping data, thereby creating clusters of data that are similar (Hennig et al. 2016, p.2). The objects are grouped based on the principle of maximising the intraclass similarity and minimising the interclass similarity. That is, clusters of data points are formed so that the points within a cluster are highly similar to one another, but very dissimilar to points in the other clusters. As a result of high-quality clustering, a data point in a cluster is more similar to other data points within its cluster than it is to data from other clusters. In their book, Hennig et al. (2016, p.3-13) present several dichotomies to clarify how cluster analysis can be applied, what types of clustering there are, and what data formats and approaches are used. Three important dichotomies are described below. It is important to note that, while Hennig et al. (2016) present these types as contrasting characteristics in clustering typologies, they mention that the different typologies relate to each other more as a continuum.

First, there is a distinction between Hard and Soft types of clustering. Hard clustering, also named categorical clustering, allows a data point to be part of only a single cluster. Soft clustering allows data points to have a degree of membership to different clusters, meaning that a data point can be part of different clusters at the same time, albeit to different degrees. Soft clustering is also called fuzzy clustering.

Second, Flat and Hierarchical clustering allow for different ways to structure the clusters. Flat clustering assumes that all clusters exist on the same level, i.e. all clusters are equal in level. Hierarchical clustering involves the inclusion of sub-clusters, where some clusters include sub-categories of clusters that add levels.

Third, the Data Type or Format can influence how data can be interpreted, and what information can be extracted from the results. The type of information desired determines what type of clustering is most effective for extracting that information.

While these distinctions allow for framing different types of cluster analysis, Hennig et al. (2016, p.4) stress that not every type of cluster analysis fits neatly within these dimensions; some share characteristics of several dimensions.

Clustering Types

Using the different clustering characteristics described above, types of clustering can be described. Most clustering methods focus on partitioning a data set, where a data point can be part of only a single cluster, so that

$$C_a \cap C_b = \emptyset \quad \text{for } a \neq b$$

Alternatively, overlapping clustering allows data points to be part of multiple clusters as a result of a degree of membership, where a data point is given a value between 0 and 1 to show how strongly it correlates with the cluster in question, where for a data set $D$ and $K$ clusters:

$$\bigcup_{b=1}^{K} C_b = D$$

Other overlapping clustering types are ones that include hierarchical clusters. In clusterings with hierarchical features, data points can be part of multiple clusters, where one is a subset of the other. In this "tree" of clusters, $\mathcal{C}$ is the set of all clusters, $\mathcal{C} = \bigcup_{a=1}^{m} \mathcal{C}_a$, in which $\mathcal{C}_1, \ldots, \mathcal{C}_m$ are partitions with $K_1 = |\mathcal{C}_1| > \ldots > K_m = |\mathcal{C}_m|$, so that for $C_j \in \mathcal{C}_j$ and $C_k \in \mathcal{C}_k$ with $j < k$ either

$$C_j \cap C_k = C_j \quad \text{or} \quad C_j \cap C_k = \emptyset.$$

This particular type of hierarchical clustering is hard in the sense that while a data point can be part of a subset and its superset, it cannot be part of two separate subsets. If data points are allowed to be part of multiple clusters, the clustering technique is a soft or fuzzy clustering technique. These techniques give data points a collection of values between 0 and 1, indicating their degree of membership to all the clusters. Probabilistic clustering is considered a fuzzy clustering technique as well (Hennig et al. 2016, p.5-6).
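
The difference between a hard partition and a soft clustering can be made concrete with a small sketch. The six data points, the clusters, and the membership values are hypothetical; letting the membership degrees of a point sum to 1 is a common convention and is assumed here.

```python
# Hypothetical data set of six points, identified by index.
D = {0, 1, 2, 3, 4, 5}

# Hard (partitioning) clustering: clusters are disjoint and together cover D.
hard = [{0, 1}, {2, 3, 4}, {5}]
is_disjoint = all(a.isdisjoint(b) for i, a in enumerate(hard) for b in hard[i + 1:])
is_exhaustive = set().union(*hard) == D

# Soft (fuzzy) clustering: each point has a degree of membership in [0, 1]
# for every cluster (shown for three of the points).
soft = {
    0: [0.9, 0.1, 0.0],
    2: [0.2, 0.7, 0.1],
    5: [0.0, 0.4, 0.6],
}

# A hard assignment can be recovered by taking the strongest membership.
hard_from_soft = {p: max(range(3), key=lambda c: m[c]) for p, m in soft.items()}

print(is_disjoint, is_exhaustive, hard_from_soft)
```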

K-means Clustering

When the number of clusters K is fixed and known, K-means clustering is an efficient and effective algorithm. Many different versions of the K-means algorithm exist, but their general purpose is to split a set of n objects into K clusters so that the similarity of objects within a cluster is high, while the similarity between clusters is low. Cluster similarity is measured with respect to the mean value of all objects in a cluster, which is represented by the centroid. Often, the measure used is the Euclidean distance from each object to the centroid of the cluster it belongs to. The algorithm aims to find the distribution of the K clusters for which the distances from the centroid of each cluster to all its objects are minimised. This Squared-Error Criterion is defined as:

$$E = \sum_{i=1}^{K} \sum_{p \in C_i} |p - m_i|^2, \tag{2.1}$$

where $E$ is the sum of squared errors (SSE) for all objects in the data set, $p$ is the point in space representing an object, and $m_i$ is the mean of the cluster $C_i$.

Using the Euclidean distance, the algorithm works as follows:

1 A predefined number of K clusters is created by randomly placing K centroids among the objects.

2 The distance between each object and each centroid is calculated.

3 Each object is assigned to the cluster of the centroid that is closest to it.

4 The centroid of each cluster moves to minimise the distance between it and the objects it represents.

5 If the centroids do not move, or move very little, the algorithm has converged. If not, go back to step 2.
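
The five steps above can be sketched as follows. This is a basic NumPy version written for illustration, not the implementation used in this study; the function name, the choice to initialise centroids on randomly chosen objects, the tolerance, and the synthetic data are assumptions.

```python
import numpy as np

def k_means(points, k, n_iter=100, tol=1e-6, seed=0):
    """Basic K-means; returns labels, centroids, and the SSE of Equation 2.1."""
    rng = np.random.default_rng(seed)
    # Step 1: place k centroids on randomly chosen, distinct objects.
    centroids = points[rng.choice(len(points), size=k, replace=False)].copy()
    for _ in range(n_iter):
        # Step 2: Euclidean distance from every object to every centroid.
        dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
        # Step 3: assign each object to its closest centroid.
        labels = dists.argmin(axis=1)
        # Step 4: move each centroid to the mean of its objects
        # (an empty cluster keeps its previous centroid).
        new_centroids = centroids.copy()
        for j in range(k):
            members = points[labels == j]
            if len(members) > 0:
                new_centroids[j] = members.mean(axis=0)
        # Step 5: stop once the centroids barely move.
        shift = np.linalg.norm(new_centroids - centroids)
        centroids = new_centroids
        if shift < tol:
            break
    sse = float(((points - centroids[labels]) ** 2).sum())
    return labels, centroids, sse

# Synthetic data: two hypothetical groups of 2-dimensional points.
rng = np.random.default_rng(42)
points = np.vstack([rng.normal(0.0, 0.3, (30, 2)), rng.normal(5.0, 0.3, (30, 2))])
labels, centroids, sse = k_means(points, k=2)
print(centroids, sse)
```

Because the initial centroids are random, different seeds can end in different local optima, which is why K-means is often run several times and the run with the lowest SSE is kept.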

Finding the Optimal Amount of Clusters

As stated earlier, the K-means algorithm clusters data based on a predefined number of clusters. Therefore, when the number of clusters is unknown beforehand, different numbers of clusters must be checked and compared to find the number that best dissects the data set. A common method for identifying the optimal number of clusters is the "Elbow Method", which compares the sum of squared errors (SSE, Eq. 2.1) within clusters for different numbers of clusters by visualising them on a scatter plot (Liu et al. 2018).

As the number of clusters K increases, the SSE decreases, since fewer points that lie closer together are grouped in each cluster; the similarity within clusters increases. Meanwhile, the similarity between clusters also increases, since an increase in the number of clusters leads to clusters that differ less from one another. The Elbow Method visualises the point beyond which an increase in the number of clusters no longer results in a significant decrease in the SSE within the clusters (Eq. 2.1).
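
A minimal sketch of the SSE values behind an elbow plot is given below. It uses a compact K-means written for this example and synthetic data with three hypothetical groups; the helper name kmeans_sse, the number of restarts, and the range of K are assumptions. Plotting is left out, so only the numbers that would be plotted are printed.

```python
import numpy as np

def kmeans_sse(points, k, n_iter=50, seed=0):
    """Run a basic K-means and return the resulting SSE (Equation 2.1)."""
    rng = np.random.default_rng(seed)
    centroids = points[rng.choice(len(points), size=k, replace=False)].copy()
    for _ in range(n_iter):
        dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centroids[j] = points[labels == j].mean(axis=0)
    return float(((points - centroids[labels]) ** 2).sum())

rng = np.random.default_rng(7)
# Three hypothetical, well-separated groups of points.
points = np.vstack([rng.normal(c, 0.2, (40, 2)) for c in (0.0, 3.0, 6.0)])

ks = range(1, 7)
# Several restarts per K reduce the influence of an unlucky initialisation.
sse = [min(kmeans_sse(points, k, seed=s) for s in range(5)) for k in ks]

for k, e in zip(ks, sse):
    print(k, round(e, 2))
```

For data like this, one would expect the SSE to drop sharply up to K = 3 and to flatten afterwards; the bend in that curve is the "elbow".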

2.6 Regression Analysis

Regression analysis is a collection of statistical methods for estimating correlations between different variables. There are different types of regression analysis, focusing on different correlations between variables, but the most common form is linear regression. Regardless of type, all forms of regression analysis focus on the relationship between a dependent variable and one or multiple independent variables. The core of the analysis is predicting the value of the dependent variable as a function of the independent variable or variables.

The result of regression analysis is a regression function. This function expresses the dependent variable in terms of the independent variable and a constant. In the case of linear regression, this function involves one dependent variable y, an independent variable x with a coefficient a, and sometimes a constant b. Equation 2.2 below represents a linear regression model.

$$y = ax + b, \tag{2.2}$$

In this model, the variance of y is assumed to be constant, and both a and b are regression coefficients. The coefficient a specifies the slope of the line that represents the correlation, while the coefficient b specifies the Y-intercept. In linear regression, both coefficients can be solved for using the least-squares method. This method aims to minimise the error between the actual data and the regression line. Equation 2.3 below gives the least-squares estimate of the slope, written here as $w_1$ (the coefficient a in Equation 2.2), for a dataset $D$ containing a dependent variable $y$ and an independent variable $x$, where $\bar{x}$ and $\bar{y}$ are the mean values of their respective sets.

$$w_1 = \frac{\sum_{i=1}^{|D|} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{|D|} (x_i - \bar{x})^2} \tag{2.3}$$
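
As a worked illustration of the least-squares slope, the sketch below applies it to five hypothetical observations. The data are invented, and the intercept is obtained from the standard result that the fitted line passes through the point of means, which is not spelled out in the text above.

```python
# Hypothetical observations: x could be expenditure, y an associated impact.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.0]

x_mean = sum(x) / len(x)
y_mean = sum(y) / len(y)

# Slope: sum of cross-deviations divided by sum of squared x-deviations.
slope = (sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
         / sum((xi - x_mean) ** 2 for xi in x))

# The fitted line passes through (x_mean, y_mean), which fixes the intercept.
intercept = y_mean - slope * x_mean

print(f"y = {slope:.2f}x + {intercept:.2f}")
```

For these numbers the cross-deviations sum to 19.7 and the squared deviations to 10, giving a slope of 1.97 and an intercept of 0.09.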

Other types of regression include multiple linear regression and non-linear regression. Multiple linear regression allows for the use of multiple independent variables, and as a result expresses the dependent variable as a function of these. Non-linear regression focuses on finding correlations in data that are not linear, using polynomials for example.

2.6.1 Measuring Regression Quality

Different measurements exist to check a regression model for the quality of its fit. The most common error measurements are the absolute error and the squared error, which both come in different forms. In its simplest form, the absolute error is the difference between the value $y'_i$ predicted by the model and the original value $y_i$:

$$|y_i - y'_i| \tag{2.4}$$

The squared error squares the error, resulting in

$$(y_i - y'_i)^2 \tag{2.5}$$

Both error measurements also exist in normalised form, as relative errors, and as a mean error, giving the average error over the entire set.

Equation 2.6 below presents the mean squared error, the error used in this study, though other forms of these error measurements are useful in their own right.

$$\frac{\sum_{i=1}^{d} (y_i - y'_i)^2}{d} \tag{2.6}$$

It is important to note that, since Equation 2.6 squares the errors, outliers influence the error more than data points that fit the model (Han and Kamber 2006, p.355-363).
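
The effect of squaring on outliers can be seen in a small example. The observed and predicted values below are hypothetical; the last prediction is deliberately far off.

```python
# Hypothetical observed values and model predictions.
y_true = [3.0, 5.0, 7.0, 9.0]
y_pred = [2.5, 5.0, 8.0, 12.0]

abs_errors = [abs(t - p) for t, p in zip(y_true, y_pred)]   # absolute errors
sq_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]  # squared errors

mae = sum(abs_errors) / len(abs_errors)   # mean absolute error
mse = sum(sq_errors) / len(sq_errors)     # mean squared error

# Share of the total error caused by the last (worst) prediction.
outlier_share_abs = abs_errors[-1] / sum(abs_errors)
outlier_share_sq = sq_errors[-1] / sum(sq_errors)

print(mae, mse, round(outlier_share_abs, 2), round(outlier_share_sq, 2))
```

The worst prediction accounts for about two thirds of the total absolute error, but for close to nine tenths of the total squared error.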

Chapter 3

Method

This chapter describes the methods for each RO on a per-section basis, discussing each step taken within each RO. Figure 3.1 presents a flow diagram of the method.

Figure 3.1: Flow Diagram presenting an overview of the Method. The legend on the top right side provides more information on how to read this flow diagram.

The different tables found in an Input-Output database are name-coded so they can be expressed in mathematical form. Unfortunately, there are many different conventions for naming these tables. Therefore, this study mainly uses the convention from Exiobase 3, expanded upon with conventions introduced by Miller and Blair (2009) and Beylot et al. (2019) where necessary. The use of each table code is described underneath the equation in which it appears.

Tools

Python is the main tool for all ROs. Table A.1 in the Appendix shows an overview of all packages used, as well as their versions. This study relied on Excel for minor tasks related to processing small amounts of manually collected information. Appendix A.1 further describes why Python was chosen over other options.

3.1 Calculating the Final Demand per capita of European Nations in COICOP categories

Completing this RO involves three main steps. First, the IOTs need to be calculated, and the household FD of each nation needs to be isolated and extracted. Second, the products in which the FDs are expressed need to be converted to COICOP categories. Third, the FDs need to be expressed per capita by dividing by the population size of each nation.

3.1.1 Calculating the Final Demand

The gross output of an economic system is defined by Equation 3.1 below

$$x = Ax + y, \tag{3.1}$$

where $x$ is the gross output, $A$ is the technological requirement matrix, which represents the necessary direct inputs, and $y$ is the final demand. A certain demand $y$ requires a certain output $x$, which can be calculated through Equation 3.2 below

$$x = By, \tag{3.2}$$

where $B$ is the total requirement matrix, which is given by the Leontief Inverse expressed in Equation 3.3 below

$$B = (I - A)^{-1}, \tag{3.3}$$

where $I$ is the identity matrix with the same dimensions as $A$. Combining Equations 3.2 and 3.3 results in a more elaborate expression for the output $x$, given in Equation 3.4 below.

$$x = (I - A)^{-1} y \tag{3.4}$$
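
Equations 3.1 to 3.4 can be checked numerically with a small sketch. The three-sector requirement matrix and the final demand below are invented for illustration; only the calculation itself follows the equations above.

```python
import numpy as np

# Hypothetical technological requirement matrix and final demand.
A = np.array([[0.10, 0.20, 0.05],
              [0.15, 0.05, 0.30],
              [0.05, 0.25, 0.10]])
y = np.array([100.0, 50.0, 80.0])

I = np.eye(A.shape[0])
B = np.linalg.inv(I - A)   # total requirement matrix (Equation 3.3)
x = B @ y                  # gross output (Equations 3.2 and 3.4)

# Solving the linear system directly avoids forming the inverse explicitly.
x_solve = np.linalg.solve(I - A, y)

# The result satisfies the balance of Equation 3.1: x = Ax + y.
print(np.allclose(x, A @ x + y))
```

For a table the size of Exiobase, solving the system is usually preferable to inverting when only the output for a single demand vector is needed, while the explicit inverse is convenient when many demand vectors are analysed.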

The calculations above are valid for single-region IOTs. Since Exiobase is a multi-region IOT, the equations need to be expanded to support multiple regions, each with their own set of sectors. This leads to a representation of the calculations above akin to Equation 3.5 below.

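$$
\begin{pmatrix} x_1 \\ \vdots \\ x_i \\ \vdots \\ x_z \end{pmatrix}
=
\begin{pmatrix}
A_{11} & \cdots & A_{1j} & \cdots & A_{1z} \\
\vdots & \ddots & \vdots & \ddots & \vdots \\
A_{i1} & \cdots & A_{ij} & \cdots & A_{iz} \\
\vdots & \ddots & \vdots & \ddots & \vdots \\
A_{z1} & \cdots & A_{zj} & \cdots & A_{zz}
\end{pmatrix}
\begin{pmatrix} x_1 \\ \vdots \\ x_i \\ \vdots \\ x_z \end{pmatrix}
+
\sum_{i=1}^{z}
\begin{pmatrix} y_{1i} \\ \vdots \\ y_{ji} \\ \vdots \\ y_{zi} \end{pmatrix}
\tag{3.5}
$$

where $x_i$ represents the vector of output in region $i$, $A_{ij}$ represents the technological requirement matrix from region $i$ to region $j$, $y_{ij}$ represents the final demand from region $i$ to region $j$, and $z$ is the number of regions.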
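
The block structure of Equation 3.5 can be sketched with NumPy. The three regions, two sectors per region, and the random values below are assumptions made for illustration; the point is only how the regional blocks and final demand vectors are stacked.

```python
import numpy as np

n_regions, n_sectors = 3, 2   # z = 3 hypothetical regions, 2 sectors each
rng = np.random.default_rng(0)

# A_blocks[i][j]: hypothetical requirement block between regions i and j.
A_blocks = [[rng.random((n_sectors, n_sectors)) * 0.1 for _ in range(n_regions)]
            for _ in range(n_regions)]
A = np.block(A_blocks)        # full (z*s) x (z*s) requirement matrix

# Y[:, i]: final demand of region i for the products of every region, stacked.
Y = rng.random((n_regions * n_sectors, n_regions)) * 100.0
y = Y.sum(axis=1)             # sum over the consuming regions

# Same calculation as in the single-region case, on the stacked system.
x = np.linalg.solve(np.eye(A.shape[0]) - A, y)
x_by_region = x.reshape(n_regions, n_sectors)   # rows correspond to x_1 ... x_z

print(x_by_region)
```

Keeping the columns of Y separate, instead of summing them, is what later allows the output and impacts to be attributed to the final demand of a single nation.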

Extracting the FDs

After calculating these matrices, the FD y can be extracted. In Exiobase, the FD of each country is expressed as the FD for each of the 200 products from each of the available regions. To shorten these FDs, all nations excluded from this study were aggregated into one region, "RoW". This effectively leaves 30 nations and a RoW region. The RoW region is later dropped and excluded from the analysis. Appendix A.2 shows an overview of these nations, excluding the RoW region.

Furthermore, the FDs consist of three main categories:

• Final consumption expenditure by households

• Final consumption expenditure by non-profit organisations serving households

• Final consumption expenditure by government

This study focuses only on household consumption, and thus only on the first of the three categories. As shown by Beylot et al. (2019), the other two categories are significantly smaller in their demand, and isolating consumer demand allows for a clearer understanding of the impacts that are associated with direct expenditure, as opposed to indirect expenditure such as through paid taxes.
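
The selection and aggregation described above can be sketched as follows. The data structure is a simplification invented for this example (one final demand vector per region and category, with four products instead of 200), and the region codes and category labels are hypothetical; it does not reflect how Exiobase itself stores its tables.

```python
import numpy as np

n_products = 4                          # Exiobase has 200; 4 keeps the sketch readable
study_regions = ["NL", "SE"]            # hypothetical subset of the studied nations
all_regions = ["NL", "SE", "US", "CN"]
categories = ["households", "npish", "government"]

rng = np.random.default_rng(0)
# fd[(region, category)]: hypothetical final demand vector over products.
fd = {(r, c): rng.random(n_products) * 10 for r in all_regions for c in categories}

# Keep household expenditure only for the studied nations.
household_fd = {r: fd[(r, "households")] for r in study_regions}

# Pool every excluded region into a single "RoW" aggregate.
household_fd["RoW"] = sum(fd[(r, "households")]
                          for r in all_regions if r not in study_regions)

# The RoW aggregate is dropped before the analysis.
analysis_fd = {r: v for r, v in household_fd.items() if r != "RoW"}

print(sorted(analysis_fd))
```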

3.1.2 Aligning Exiobase & COICOP

After dropping the nations excluded from this study, as well as the excluded categories of FD, the resulting list of FDs per Exiobase product per country needs to be sorted into COICOP categories. Since Exiobase bases its products on NACE, the EU overview of NACE provided more information on the products during the contextualisation process (EUROSTAT 2008; European Commission 2010).

Aligning COICOP and Exiobase is not new, though it is more usual to express COICOP products in Exiobase products. Studies like the ones performed by Froemelt, Durrenmatt, and Hellweg (2018a) and Ivanova, Vita, et al. (2017) focus on expressing the results of the Household Budget Survey, a survey used to investigate household expenditure, in Exiobase products to quantify environmental impact based on expenditure habits. Steen-Olsen, Wood, and Hertwich (2016) present a set of steps necessary to align the Household Budget Survey and Exiobase. Though many of the steps taken by Steen-Olsen, Wood, and Hertwich (2016) are not necessary in this methodology, they provide insight into possible issues and limitations of the methodology used here.

As stated earlier, this study includes the first twelve of the fourteen COICOP categories, and aims to include all 200 products from Exiobase. To allow this, each Exiobase product needs to be classified in one of the COICOP categories. The prerequisites for the classification process are as follows:


• Each Exiobase product can only be classified in one COICOP category

• If the Exiobase product is not clear from its title, the NACE system provides background information

• Different chemical products were checked for product use in the database of the US Department of Health and Human Services (2019)

• If a product does not fit in any of the COICOP categories, it is classified in category CPXX: Rest and excluded from the study.
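The classification rules above amount to a one-to-one lookup from product to category. A minimal sketch follows; the product names and their assignments are illustrative assumptions, not the concordance used in this study, and the CP01-style codes are chosen only to match the CPXX label in the text.

```python
REST = "CPXX"  # products that fit no COICOP category; excluded from the study

# Illustrative mapping only (hypothetical assignments).
# A dict enforces the rule that a product belongs to exactly one category.
PRODUCT_TO_COICOP = {
    "Wheat": "CP01",
    "Vegetables, fruit, nuts": "CP01",
    "Motor Gasoline": "CP07",
    "Education services": "CP10",
    "Uranium and thorium ores": REST,
}

def aggregate_to_coicop(fd_by_product, mapping):
    """Sum final demand per COICOP category, skipping the Rest category."""
    totals = {}
    for product, value in fd_by_product.items():
        category = mapping[product]  # a KeyError flags an unclassified product
        if category == REST:
            continue
        totals[category] = totals.get(category, 0.0) + value
    return totals

fd_by_product = {
    "Wheat": 120.0,
    "Vegetables, fruit, nuts": 30.0,
    "Motor Gasoline": 80.0,
    "Uranium and thorium ores": 0.5,
}
coicop_totals = aggregate_to_coicop(fd_by_product, PRODUCT_TO_COICOP)
```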

3.1.3 Calculating FD per Capita

The classified Exiobase products form an overview of the total spending per COICOP category for a nation. As the sizes of nations differ greatly throughout Europe, dividing this total expenditure per nation by the number of inhabitants allows for better comparison between said nations. The latest data from Exiobase is for 2011, and so dividing the expenditure of a nation by the number of inhabitants in that year results in the average expenditure per capita. Eurostat (2019b) publishes an overview of the population for all EU countries, as well as some other European countries; this data set is the source of all population data used in this study. Dividing the FD for each product per country by the population results in an overview of the FD per capita, in millions of €. Therefore, the FDs are also multiplied by 1,000,000 to express them in €. Next, aggregating to the COICOP categories results in the FD per capita in each nation in COICOP categories.
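As a worked illustration of the unit conversion, the sketch below turns a national total in millions of € into € per inhabitant. The function name and the example figures are hypothetical.

```python
def fd_per_capita_eur(fd_million_eur, population):
    """Convert a national total in million EUR to EUR per inhabitant."""
    return fd_million_eur * 1_000_000 / population

# Hypothetical nation: 50,000 million EUR spent by 10 million inhabitants.
per_capita = fd_per_capita_eur(50_000.0, 10_000_000)
```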

3.2 Investigating Expenditure Habits

The FDs per category per nation from RO 1 serve as the source for uncovering potential expenditure habits in household consumption using DM. Using a SOM, underlying trends in the FDs of the nations can be identified and visualised. The results of the SOM will form the basis for cluster analysis using K-Means clustering.

Han and Kamber (2006, p. 7) present a Knowledge-Discovery framework for preparing data and applying DM techniques:

1 Cleaning the data removes both noise and inconsistencies within the data.

2 Data Integration combines the different data sources needed.

3 The relevant data are selected from the combined database.

4 The data is transformed into workable structures for the planned actions during Data Transformation.

5 The DM tools are applied to unveil patterns in the data.

6 The patterns found are evaluated to see which of the identified patterns are interesting.

7 The most interesting findings are visualised and presented.


Steps 1-4 fall under the banner of data pre-processing, and focus on moulding the raw data into a workable form. Steps 5-7 focus on working the data to find trends. The data for this RO is the result of RO 1, and the first three steps, as well as part of step four, will already be applied there. Cleaning the data is achieved by calculating and isolating the FDs of the included nations. Since all data originates from Exiobase, data integration (step 2) is not necessary, and step 3, data selection, is part of step 1. Finally, part of step 4 is classifying the 200 Exiobase products in COICOP products.

Like RO 1, RO 2 is divided in three main parts. First, finishing step four in the Knowledge-Discovery process ends the pre-processing stage and allows for the later steps of the framework. The second part involves applying steps five, six, and seven of the framework with the SOM algorithm, while part three involves the same steps for the K-Means algorithm.

3.2.1 Pre-processing the Data

The result of RO 1 is an overview of the FDs per capita per nation in COICOP categories. The FDs of these nations will differ greatly in their absolute amounts, simply because some nations possess a more developed economy than others. For this reason, it is important that the expenditures are transformed to be comparable. Two methods were explored to achieve this:

1 Transforming the data using the Purchasing Power Standard (PPS)

2 Taking the expenditures per COICOP category as percentages of the total expenditure in said nation

Using Purchasing Power Standard

The PPS is an artificial currency unit that allows for comparison between different economies. One unit of PPS can purchase the same amount of goods and services throughout different economies, and is derived by dividing the economic aggregate of a country by its Purchasing Power Parity (Eurostat 2019c). By converting the FDs of each nation using the PPS of 2011, the FDs will be represented in a comparable unit. More information on how Purchasing Power Parities are calculated can be found in OECD and Statistical Office of the European Communities (2007, p. 235-241).

The PPSs were taken from Eurostat (2019a), and include the data for all included nations. A higher PPS indicates that a nation is relatively more expensive. The EU28 average is taken as 100, meaning that the PPS indicates the costs of goods and services relative to the EU28 average. The PPS of Norway is highest (155.5), while the PPS of Bulgaria is lowest (55.1). Dividing each FD by its respective PPS and multiplying by 100 results in FDs in a comparable unit. Appendix A.2 presents an overview of the PPS of each nation.
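The conversion can be illustrated with the two index values quoted above. The expenditure figures in this sketch are hypothetical and chosen so that both nations end up at roughly the same level after conversion.

```python
# Index values quoted in the text (EU28 = 100).
PPS_INDEX = {"Norway": 155.5, "Bulgaria": 55.1}

def to_pps(fd_eur, price_index):
    """Divide by the national index and multiply by 100."""
    return fd_eur / price_index * 100

# Hypothetical per-capita expenditures in EUR for one category.
norway = to_pps(3110.0, PPS_INDEX["Norway"])
bulgaria = to_pps(1102.0, PPS_INDEX["Bulgaria"])
```

In this example an expenditure that is almost three times larger in € ends up at about the same value in PPS, which is the effect the transformation is meant to achieve.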


Using Percentages

An alternative to transforming the data to a common and comparable monetary unit is comparing the expenditures based on their percentage of their respective total expenditure. To achieve this, each FD in the result of RO 1 would have its categories divided by the total of that FD. The result would be all FDs expressed as the percentage of the total expenditure for each category.
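A minimal sketch of the percentage transformation for a single nation is given below. The category codes and amounts are hypothetical.

```python
def to_shares(fd_by_category):
    """Express each category as a percentage of the nation's total expenditure."""
    total = sum(fd_by_category.values())
    return {cat: 100 * value / total for cat, value in fd_by_category.items()}

# Hypothetical per-capita expenditure in EUR for three categories.
shares = to_shares({"CP01": 1500.0, "CP04": 2500.0, "CP07": 1000.0})
```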

Normalisation

As with many DM techniques, normalisation of the data improves the results, and is especially useful for ANNs like the SOM (Han and Kamber 2006, p. 70-72). Therefore, the data is normalised using the SOMPY package for Python (Moosavi and Packman 2019).
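The text does not state which normalisation setting was used. The sketch below assumes a common choice, scaling every variable to zero mean and unit variance, and implements it with the standard library instead of calling SOMPY.

```python
from statistics import mean, pstdev

def normalise_columns(rows):
    """Scale every column (one COICOP category) to zero mean and unit variance."""
    columns = list(zip(*rows))
    stats = [(mean(col), pstdev(col)) for col in columns]
    return [
        # A constant column has no spread; it is mapped to 0.0.
        [(value - mu) / sigma if sigma else 0.0 for value, (mu, sigma) in zip(row, stats)]
        for row in rows
    ]

# Hypothetical data: three nations (rows), two categories (columns).
rows = [[10.0, 30.0], [20.0, 30.0], [30.0, 30.0]]
scaled = normalise_columns(rows)
```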

3.2.2 Applying the SOM Algorithm

Since there is no clear convergence statement for the SOM algorithm, two error values serve as the main indicators of the quality of the output. Tan and George (2005) present an overview of these error values and how the other parameters should be chosen. First, the Topographic Error (E_T) checks whether the topography of the map is in order. Second, the Quantisation Error (E_Q) measures how well the inputs are represented in the output space. For both error values, lower values are better. Tan and George (2005) stress that when comparing maps, the map with the lowest Topographic Error is generally better. The Quantisation Error can be checked when maps have very similar Topographic Errors.
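To make the two error values concrete, the sketch below computes them for a toy map under common definitions: the Quantisation Error as the mean distance between each input and its best-matching unit (BMU), and the Topographic Error as the share of inputs whose best and second-best units are not neighbours on the grid. The neighbourhood rule and the example maps are assumptions for illustration, not the SOMPY implementation.

```python
import math

def som_errors(codebook, samples):
    """Return (quantisation_error, topographic_error) for a trained map.

    codebook maps grid coordinates (row, col) to weight vectors.
    """
    total_distance = 0.0
    topographic_faults = 0
    for sample in samples:
        ranked = sorted(codebook, key=lambda node: math.dist(codebook[node], sample))
        best, second = ranked[0], ranked[1]
        total_distance += math.dist(codebook[best], sample)
        # Assumption: two nodes are neighbours when they differ by at most
        # one step in each grid direction (rectangular lattice).
        adjacent = max(abs(best[0] - second[0]), abs(best[1] - second[1])) <= 1
        if not adjacent:
            topographic_faults += 1
    n = len(samples)
    return total_distance / n, topographic_faults / n

# Two hypothetical 1 x 3 maps with the same weights in a different grid order.
ordered = {(0, 0): [0.0], (0, 1): [1.0], (0, 2): [2.0]}
folded = {(0, 0): [0.0], (0, 1): [2.0], (0, 2): [1.0]}
samples = [[0.4], [1.6]]
eq_ordered, et_ordered = som_errors(ordered, samples)
eq_folded, et_folded = som_errors(folded, samples)
```

Both maps represent the inputs equally well, so their Quantisation Errors are identical; only the Topographic Error reveals that the second map is folded.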

The different starting parameters are taken from Haykin (2009, p. 425-436), Tan and George (2005), and Vesanto (2000). These parameters will be optimised according to their importance as stated in Froemelt, Durrenmatt, and Hellweg (2018b):

1 The Neighbourhood radii

2 Amount of training sessions

3 Number of Neurons

4 Type of Neighbourhood Function

5 Initialisation

6 Map ratio

Based on the size and depth of the data set, Table A.3 in the appendix presents the different parameters according to literature suggestions, where available, for a data set with a depth of n = 30, as is used in this study.


Using these starting parameters, all parameters are fine-tuned in order of importance. For each parameter, the SOM will be run multiple times with various random values around the proposed value, after which the error values are compared to determine which value produces the best results. After fine-tuning all parameters, the SOMs for both data sets (PPS and percentages) are run ten more times, and the three maps with the best error values are visually inspected. The two best models for each data set are then compared directly to investigate which data set produces the best results.
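The comparison rule for candidate maps (lowest Topographic Error first, Quantisation Error only among maps with very similar Topographic Errors) can be sketched as below. The tolerance that defines "very similar" and the example runs are hypothetical; the text does not specify a numerical threshold.

```python
def pick_best_map(runs, et_tolerance=0.01):
    """Pick one run from a list of (label, topographic_error, quantisation_error)."""
    lowest_et = min(et for _, et, _ in runs)
    # Maps whose Topographic Error is within the tolerance count as very similar.
    shortlist = [run for run in runs if run[1] - lowest_et <= et_tolerance]
    # Among those, the lowest Quantisation Error decides.
    return min(shortlist, key=lambda run: run[2])

runs = [
    ("run-a", 0.100, 0.30),
    ("run-b", 0.105, 0.20),
    ("run-c", 0.300, 0.05),
]
best = pick_best_map(runs)
```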

3.2.3 Applying K-Means Clustering

The SOM model with the best results is subjected to K-Means, using the elbow method to find the number of clusters K that produces the best results. As the number of clusters rises, the similarity between the clusters lowers, and finding the correct number of clusters means finding a balance between minimising interclass and maximising intraclass similarity whilst keeping the number of clusters as low as possible. Selecting the best clusters is therefore subjective, and which clustering is perceived as better differs per person.
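The elbow method compares the sum of squared errors (SSE) across values of K. The sketch below runs a plain K-Means (Lloyd's algorithm) on hypothetical two-dimensional points instead of the BMUs used in this study; the deterministic starting rule is an assumption made so that the example is repeatable.

```python
def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans_sse(points, k, iterations=100):
    """Run Lloyd's algorithm and return the sum of squared errors."""
    # Deterministic farthest-point start (assumption, for repeatability).
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(squared_distance(p, c) for c in centroids))
        )
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: squared_distance(p, centroids[i]))
            clusters[nearest].append(p)
        updated = [
            tuple(sum(dim) / len(members) for dim in zip(*members)) if members else centroids[i]
            for i, members in enumerate(clusters)
        ]
        if updated == centroids:
            break  # assignments no longer change
        centroids = updated
    return sum(min(squared_distance(p, c) for c in centroids) for p in points)

# Three well-separated hypothetical groups of two points each.
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0), (20.0, 0.0), (20.0, 1.0)]
sse_by_k = {k: kmeans_sse(points, k) for k in range(1, 5)}
```

In this toy data the SSE drops sharply up to K = 3 and barely changes afterwards, which is the "elbow" the method looks for.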

3.3 Applying Regression Analysis

After identifying the different clusters of nations with similar spending habits, these clusters can be visualised in a regression analysis for each variable and each impact category. In these regressions, expenditure is the independent variable, while the impact category is the dependent variable. With twelve COICOP categories and four impact categories, this results in 48 regression models. Each model will be checked for its R²-value and its P-value. This research sets the thresholds for significant correlation at R² ≥ 0.90 and α ≤ 0.001. However, since the depth of the data set is rather shallow (n = 30), a regression analysis of each cluster (n < 10) will not be statistically significant. For this reason, the full data set is used for each COICOP and each impact category. A suggestive regression line will be shown for each cluster, however, in an attempt to show the difference in correlation between the total data set and each cluster.

Given the size of the data set, not much can be said about the shape of the correlation. As mentioned earlier, smaller data sets are more prone to being influenced by outliers, and these outliers can also suggest a logarithmic or other type of correlation while this might not be the case at all. Therefore, all correlations are considered to be linear. For each regression model, a correlation with and without a constant will be computed and compared using their r²-value, where the model with the highest value is accepted. The same method will be applied for each cluster.
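The comparison between a model with and without a constant can be sketched with ordinary least squares as below. The data are hypothetical. The r² of the model without a constant is computed against the uncentred total sum of squares, which is the convention Statsmodels follows for such models; that this matches the exact computation in the study is an assumption.

```python
def fit_with_constant(x, y):
    """Least-squares line y = intercept + slope * x with centred r-squared."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    slope = (sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
             / sum((a - mean_x) ** 2 for a in x))
    intercept = mean_y - slope * mean_x
    ssr = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    centred_tss = sum((b - mean_y) ** 2 for b in y)
    return slope, intercept, 1 - ssr / centred_tss

def fit_without_constant(x, y):
    """Least-squares line through the origin with uncentred r-squared."""
    slope = sum(a * b for a, b in zip(x, y)) / sum(a * a for a in x)
    ssr = sum((b - slope * a) ** 2 for a, b in zip(x, y))
    uncentred_tss = sum(b * b for b in y)  # no mean subtracted
    return slope, 0.0, 1 - ssr / uncentred_tss

# Hypothetical expenditure (x) and impact (y) following y = 2x + 1.
x = [1.0, 2.0, 3.0, 4.0]
y = [3.0, 5.0, 7.0, 9.0]
with_const = fit_with_constant(x, y)
without_const = fit_without_constant(x, y)
chosen = max(
    [("with constant", with_const), ("without constant", without_const)],
    key=lambda model: model[1][2],
)
```

Because the two r² values are measured against different baselines, the value for a model without a constant tends to look high, which is worth keeping in mind when the higher of the two is accepted.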


Chapter 4

Results

The following chapter describes the results by order of RO. The results are validated and contextualised by means of a comparison with other literature in the Discussion further down.

4.1 FD per Capita of European Nations

The first RO aims to calculate the FD of all European nations included in this study. Using the method as described in the Method section, the FDs were calculated, isolated, and contextualised in the 12 COICOP divisions. Of the 200 Exiobase products, 177 products were classified in the twelve COICOP categories. The remaining 27 products that could not be classified represent less than 1% of the total, and are therefore ignored. Figure 4.1 below shows the results of this method.

Figure 4.1: Boxplot of expenditure per COICOP category in €

As Figure 4.1 shows, expenditure for housing is larger than any other category, but also shows a larger spread than the other categories. According to Figure 4.1, the five categories with the largest expenditure are, in order of size:


1 Housing, Electricity, and other fuels

2 Food and Non-alcoholic Beverages

3 Transport

4 Miscellaneous goods and Services

5 Restaurants and Hotels

Appendix A.5 shows the mean, minimum, maximum, and standard deviation of each category, which serve as the data for Figure 4.1. When expressing the expenditure per category in percentages, the order of categories from large to small does not change. Figure A.1 in the appendix presents a boxplot of the expenditures in percentages.

4.2 Investigating Expenditure Habits

Using the calculated FDs from RO 1, the next step examined expenditure habits that differentiate types of nations from one another. First, the parameters taken from literature were optimised to ensure the best results. Table A.3 shows the starting parameters and the parameters used to create the finalised result.

As Table A.3 shows, the starting parameters were not far off the optimal parameters, the rough and fine training cycles being the exception. The parameters were optimised using the steps outlined in the Method section, and over 100 models were created in the process. The reasoning behind the parameter choices, especially for the training cycles, is discussed further in the Limitations chapter.

The final model shows an E_T of 0.51 and an E_Q of 0.1. To frame these values, the earlier values for the E_T were double that amount. The resulting SOM shows some clear grouping of nations, as is shown in Figure 4.2a below.

Each square in Figure 4.2a represents a neuron, with the number in each square showing the number of nations that neuron represents. The colours correspond to the numbers and show the hits for each neuron as a heat map, red being high and blue being low. It is interesting to see that the upper-left and lower-right neurons represent 4 nations each; only one other neuron represents this many nations. The distance between these two hotspots is the largest it can be, suggesting that these sets of nations differ the most when it comes to expenditure habits.

It must be noted that Figure 4.2a does not give a representation of the actual distance between the different neurons. Figure 4.2b below presents the U-Matrix of the SOM, which is a visual representation of the sum of distances in the dimensional space between the neurons. As such, it shows which neurons are close together and should be grouped, and which are further away from each other. Figure 4.2b can be read as a heat map, where redder areas show that neurons are closer, and bluer areas show that neurons are further away from each other.


(a) BMU hits of the SOM (b) U-Matrix of the SOM

Figure 4.2: BMU hits and U-Matrix of the SOM

Given the small size of the dataset, the squares representing each neuron are rather large, and therefore fail to give an easily readable image. What can be said is that each corner shows a large spread due to its blue colour, meaning that there are still rather large distances between the neurons in the areas where the neurons represent more nations. One can therefore assume that if these neurons are to be clustered, the spread for the different variables, in this case the COICOP categories, can still be rather large since the neurons in the corners still show significant distance between them.

To further contextualise the SOM, Figure 4.3 below presents a heat map for each COICOP category. As shown in the legend in Figure 4.3, a red colour indicates a relatively high percentage of expenditure, and a blue colour indicates a relatively small percentage of expenditure for that category. The heat maps each have separate legends, each with different minimum and maximum values. As with all other SOM visualisations, they are expressed in % of total expenditure.

As Figure 4.3 indicates, the COICOP divisions with the clearest differences are Food, Housing, Education, and Recreation and Culture. While the other categories also show differences, they are more spread out and show a less clear hotspot. Examples of these categories are Restaurants & Hotels, Transport, and Miscellaneous Goods & Services. The categories that show a clearer separation between nations that spend more or less can be used to visually check the clustering quality.

4.3 Clustering the data

Now that the SOM algorithm has reduced the dimensionality of the dataset and arranged the data according to their expenditure, applying K-Means on the BMUs representing the nations results in clusters of nations that spend similarly. Dividing the data into more clusters results in higher intrasimilarity between points in a cluster while lowering intersimilarity between the clusters themselves. Dividing the data into fewer clusters lowers the intrasimilarity between points in a cluster while elevating the intersimilarity between clusters.


Figure 4.3: Heat map of each division in % of total expenditure per capita in a nation.

The Elbow-Method allows for comparison of error values between the different cluster amounts. Figure 4.4 below shows the SSE as a function of the number of clusters. Figure 4.4 shows a decrease in error at four, five, and six clusters, where moving to seven clusters yields a markedly smaller decrease in error. Keeping in mind that the total data set is 30 nations, six clusters would mean an average of 5 nations per cluster, where four clusters would mean an average of 7.5 nations per cluster.

To determine what cluster amount produces the best result, Table 4.1 below presents the SSE and its decrease as more clusters are used for the K-Means algorithm. Moving from three to four clusters decreases the SSE by 15.9%, while moving from four to five clusters decreases the SSE by 10.2%. Increasing the number of clusters further lowers the SSE even less. Boxplots of the clusters per COICOP category served as a tool for visual comparison between the clustering using four and five clusters. Using four clusters proved to give the best results.


Figure 4.4: Plot of sum of squared error for the K-Means algorithm over various cluster amounts

Table 4.1: SSE and its decrease per amount of clusters used

Cluster Amount   SSE           SSE Decrease (%)
1                2.506424446   -
2                1.886025909   -24.8
3                1.422236695   -18.5
4                1.023873577   -15.9
5                0.767461389   -10.2
6                0.551068694   -8.6
7                0.448999597   -4.1
8                0.382131829   -2.7
9                0.32          -2.5
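The percentages in Table 4.1 are reproduced when each drop in SSE is divided by the SSE at one cluster, not by the SSE of the preceding row. A minimal check using the tabulated values:

```python
# SSE per cluster amount (1 to 9), copied from Table 4.1.
sse = [2.506424446, 1.886025909, 1.422236695, 1.023873577, 0.767461389,
       0.551068694, 0.448999597, 0.382131829, 0.32]

# Drop in SSE when adding one cluster, as a percentage of the SSE at K = 1.
decrease_pct = [
    round(100 * (sse[i - 1] - sse[i]) / sse[0], 1)
    for i in range(1, len(sse))
]
```

Read this way, the decreases are percentage points of the initial error, so they can be added up: four clusters remove roughly 59% of the SSE measured at one cluster.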

Visualising the clusters

Figure 4.5a presents a visual representation of these clusters on the SOM grid. It becomes clear that Cluster 3 is significantly larger on the grid than the other clusters, while it represents relatively fewer nations than the other clusters. This is an indication that this cluster is less homogeneous than the other clusters. Figure 4.5b shows the clustering on a map for geographical contextualisation, and an overview of the nations per cluster is shown in Table 4.2.

Table 4.2 shows that one cluster, Cluster 1, is significantly larger than the others. This cluster represents Western Europe, as well as Scandinavia. Cluster 0 mainly includes Eastern-European nations, with Malta being the exception. Cluster 2 is a mixed bag, but mainly consists of Southern-European nations, except for Ireland. It is interesting to note that all nations in this cluster have a coastline or are islands. Last, Cluster 3 is the least cohesive cluster of the four, and represents more developed Eastern-European nations, as well as Italy. Figure 4.6 below shows the nature of their expenditure per COICOP category in percentages.


(a) Clusters shown on the SOM grid. (b) Clusters shown on a map.

Figure 4.5: Clusters on the SOM grid and a map

Table 4.2: Overview of the four clusters and their nations

Cluster 0    Cluster 1        Cluster 2    Cluster 3
Bulgaria     Belgium          Ireland      Czech Republic
Estonia      Denmark          Greece       Italy
Lithuania    Germany          Estonia      Latvia
Malta        France           Croatia      Hungary
Poland       Luxembourg       Cyprus       Slovenia
Romania      Netherlands      Portugal     Slovakia
Turkey       Austria
             Finland
             Sweden
             United Kingdom
             Norway

Figure 4.6 shows that the clusters do not differ significantly in every category, but that some categories are more defining than others. First, Cluster 0 spends a higher percentage of its total expenditure on Food, Alcohol and Tobacco, and Communication, while spending less on Health and Recreation & Culture. Second, Cluster 1 spends most on Health, Housing & Energy, and Goods & Services. In contrast, Cluster 1 spends less on Food, Education, and Communication. Third, Cluster 2 spends more on Education, and Restaurants and Hotels, but relatively little on Furnishing & Household, and Recreation & Culture. Last, Cluster 3 spends more on Furnishing & Household, and Recreation & Culture. Interestingly, the spread of Cluster 3 is smaller than that of the other clusters in most categories, and its expenditure is not significantly lower than the others in any of the categories.

It must be said that many clusters still show a significant spread in some categories, and that some categories prove to be a better indicator of clustering than others. Furthermore, given the small size of the dataset, one nation can significantly influence the shape of the boxes in Figure 4.6.


Figure 4.6: Boxplot of clusters per COICOP category in % of total spending

The shape and position of the boxes as shown in Figure 4.6 change when the boxes are expressed in € spent. Figure 4.7 below shows the same clusters and categories as Figure 4.6, but substitutes % for € spent.

Figure 4.7: Boxplot of clusters per COICOP category in €

Comparing Figure 4.6 and Figure 4.7 shows that Cluster 1, while spending roughly the same on some categories in %, spends significantly more in €. This is especially true for Furnishing & Household, Recreation & Culture, and Transport. It also shows that an important category like Food & Beverages shows higher expenditure for Cluster 0 and 3 in both % and in €. Most of these nations have a relatively low PPS, which indicates a weaker economy and thus a lower average income.


Other categories, like Education, show very little difference in the relative spread of the boxes. Where Cluster 2 shows the highest expenses in Education in % of total expenditure, the same is true for its expenditure in €. Last, the Clothing & Footwear category shows very little change from expressing expenditure in % to €, an indication that the product prices in this category are very well adjusted for the different economies.

In short, given that Cluster 1 consists of the strongest economies in Europe, the difference in expenditure becomes apparent when expressing expenditure in € instead of in %. Cluster 0 and Cluster 3 in particular show a relative decrease from % to €, while Cluster 2 stays relatively the same in comparison with the other clusters.

4.4 Applying Regression Analysis

The outcomes of RO 1 and 2 can be combined and subjected to Regression Analysis to investigate whether there are correlations between the expenditure and the impacts associated with the expenditure. Furthermore, by applying regression to the different clusters, different correlations can be visualised. Using Statsmodels (Seabold and Perktold 2018), the expenditure from each category served as the independent variable, with each of the four impact categories as the dependent variable in an attempt to identify these correlations. Since there are twelve COICOP categories and four impact categories, not all regression models can be shown. Therefore, the three largest categories, Food, Housing, and Transport, are shown in the results. Appendix A.6 holds the visualisation of the regression models for all other categories.

Food and Non-alcoholic Beverages

Figure 4.8a below presents the regression for the four impact categories for Food. Of the four impact categories, the correlations between food expenditure and GWP, as well as between food expenditure and Material Footprint, are significant, as shown by the R²-value. The Land Footprint falls just below the threshold, and the Water Footprint is far from it. It is interesting to see that four of the six points in Cluster 2 are above the regression lines, and lower the R²-value significantly. Furthermore, an outlier from Cluster 1 has the highest impact in three of the four categories.

Housing, Water, Electricity, gas and other fuels

Figure 4.8b below presents the regression models for the four impact categories for Housing. None of the four impact categories show a correlation; in fact, the R²-values are all 0.5 or below. The regression model of Cluster 2 for Land Footprint is the only regression line that shows a decline as the independent variable grows. This is due to the one large outlier around € 1000 in Cluster 2. Given the low number of points in each cluster, nothing can be said about those correlations, and so whether this model shows the actual correlation between expenditure and Land Footprint in the Housing category cannot be stated with certainty.


Transport

Figure 4.9 below presents the regression models for the four impact categories for Transport. One of the four impact categories shows a significant correlation, Material Footprint in this case, with Land Footprint and Blue Water Consumption being close to the threshold. There are some clear outliers: one nation from Cluster 2 in GWP, and one nation from Cluster 1 in Land and Material Footprint. The regression line for Cluster 0 is far above the average regression line in three of the four impact categories, implying that the impact per € is higher in these nations than in the nations of the other clusters.

Figure 4.8: Regression of three largest expenditure categories

(a) Regressions for Food in Impact Categories

(b) Regressions for Housing in Impact Categories


Figure 4.9: Regressions for Transport in Impact Categories

4.4.1 Comparing the r2-values

Applying regression produced four models for each of the twelve categories, or 48 models for the entire data set. As stated in the Method section, all r²-values of 0.9 or higher are considered significant, meaning that there is a correlation between expenditure and impact in a given category, and that the expenditure in € explains at least 90% of the variance for that impact category. Of the 48 regression models, 15 showed a significant correlation. This is equal to 31% of all models. Table 4.3 below shows an overview of all r²-values, where the significant values are highlighted.

Table 4.3: Overview of the r²-values of each regression model, each with expenditure in that category and one of the four impact categories as the independent and dependent variable respectively

Category                 GWP        Land Footprint   Material Footprint   Blue Water Consumption
                         (CO2-eq)   (m²)             (kg)                 (m³)
Food & Beverages         0.961*     0.88             0.957*               0.785
Alcohol & Tobacco        0.955*     0.881            0.918*               0.713
Clothing & Footwear      0.964*     0.925*           0.956*               0.944*
Housing & Energy         0.416      0.5              0.444                0.446
Furnishing & Household   0.924*     0.923*           0.953*               0.933*
Health                   0.878      0.865            0.801                0.731
Transport                0.729      0.892            0.942*               0.886
Communication            0.81       0.796            0.881                0.898
Recreation & Culture     0.779      0.813            0.942*               0.917*
Education                0.588      0.662            0.703                0.754
Restaurants & Hotels     0.898      0.81             0.814                0.708
Goods & Services         0.892      0.852            0.859                0.825

* r² ≥ 0.9
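The count of significant models can be checked directly against the values in Table 4.3. The sketch below applies the 0.9 threshold to every value; the variable names are chosen for illustration.

```python
# r-squared values from Table 4.3.
# Column order: GWP, Land Footprint, Material Footprint, Blue Water Consumption.
R2 = {
    "Food & Beverages": (0.961, 0.88, 0.957, 0.785),
    "Alcohol & Tobacco": (0.955, 0.881, 0.918, 0.713),
    "Clothing & Footwear": (0.964, 0.925, 0.956, 0.944),
    "Housing & Energy": (0.416, 0.5, 0.444, 0.446),
    "Furnishing & Household": (0.924, 0.923, 0.953, 0.933),
    "Health": (0.878, 0.865, 0.801, 0.731),
    "Transport": (0.729, 0.892, 0.942, 0.886),
    "Communication": (0.81, 0.796, 0.881, 0.898),
    "Recreation & Culture": (0.779, 0.813, 0.942, 0.917),
    "Education": (0.588, 0.662, 0.703, 0.754),
    "Restaurants & Hotels": (0.898, 0.81, 0.814, 0.708),
    "Goods & Services": (0.892, 0.852, 0.859, 0.825),
}
THRESHOLD = 0.9

# Number of impact categories at or above the threshold, per COICOP category.
significant = {cat: sum(v >= THRESHOLD for v in vals) for cat, vals in R2.items()}
total_models = sum(len(vals) for vals in R2.values())
total_significant = sum(significant.values())
share_pct = round(100 * total_significant / total_models)
```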


Of the twelve categories, two show a significant correlation between expenditure and all four impact categories: Clothing & Footwear, and Furnishing & Household. Three other categories show significant correlation in two of the four impact categories: Food & Beverages, Alcohol & Tobacco, and Recreation & Culture. Last, the Transport category shows significant correlation for one impact category, Material Footprint. The other six categories do not show any correlation between expenditure and one of the four impact categories when a minimum of 0.9 is considered. All P-values are well below the threshold of 0.001.

The r²-values shown in Table 4.3 are for the entire set of nations. The dotted regression lines shown in Figures 4.8a, 4.8b, and 4.9 are for the nations in each respective cluster. As such, the r²-values of these regressions can also be checked for significance between the variables. There are twelve categories, each with four impact categories, as well as four clusters, resulting in 192 r²-values. Appendix A.7 shows overviews of the R²-values and P-values for all nations combined, as well as per cluster. Table 4.4 below shows an overview of the rate of significance for each cluster. It is important to add that since the size of each cluster is too small to have statistical meaning, no claims can be made regarding the actual truth of the significance. However, what can be said is that where 31% of the regression models for the entire data set show a significant correlation, that percentage grows when looking at the different clusters.

Table 4.4: Amount of significant r2-values per cluster

                         All Clusters   Cluster 0   Cluster 1   Cluster 2   Cluster 3
Significant Categories   15/48          22/48       24/48       26/48       37/48
% of total               31%            44%         47%         50%         69%

As Table 4.2 shows, Cluster 1 is by far the largest cluster, and Cluster 2 and 3 are the smallest. Table 4.4 shows that the smallest clusters have the highest rate of significance, which is understandable because making a model fit with fewer points is easier, since an outlier is harder to determine. Therefore it is interesting to see that Cluster 1 also shows an improved rate of significance over taking all clusters together. While all clusters show improvement in their rate of significance over a combination of all clusters, there are still a few outliers that are interesting to discuss.

4.4.2 Identifying Outliers

As the regression models show, there are some significant differences between the impacts per € of the different impact categories. For instance, the higher impact per € for nations in Cluster 0 for Alcohol & Tobacco is due to higher consumption of tobacco in those areas.

The unreliable results in the Housing category are mostly due to the different energy mixes that each nation provides for its citizens. The data extracted from Exiobase show that Eastern-European nations especially produce larger portions of their energy through fossil fuels, and therefore show significantly higher footprints per € spent.

The Recreation & Culture category shows nations from Cluster 1 to spend the most. The outlier in GWP is Luxembourg, while the outlier for Land Footprint is Finland. Last, GWP shows another outlier for this category: Greece, a nation from Cluster 2, which produces the second-most GWP for this category.

The GWP model for Transport shows another interesting outlier: Cyprus, from Cluster 2, has an expense of €1665.29 and a GWP of 5634.67 kg. This GWP is more than double that of the second-largest GWP producer in this category, Norway, while its expenditure is less than half of Norway's. The reason for this is that Cyprus is an island, and makes more use of naval transport and air transport to get off the island.

However, there are also things that are harder to explain. The regression models for Education show significant differences between the different systems, especially for Land and Material Footprint. The Education category only holds one Exiobase product: Education Services. Therefore, it is difficult to assess why these impacts are so different per cluster and per country. Assumptions can be made regarding the use of electronic material versus paper material, and the education system's reliance on the energy mix of the nation in which it is active.

4.5 Sensitivity Analysis

Some of the discussed limitations are directly connected to how sensitive the results of this study are to the data and how it is used. The two most important issues are aggregation in Exiobase, and how the Exiobase products were divided into COICOP categories. These issues are connected in the sense that more aggregated products will result in more linear correlations between impacts and expenditure.

To contextualise this, Appendix A.4 shows that the largest COICOP category is Housing, with an average spending of €2826 and a standard deviation of €1065, which is 37% of the average. All other categories except Food show a similar ratio of standard deviation to mean. However, when looking at the r2-values of all the regressions in table 4.3, Housing is the lowest of all categories. Judging from the expenditures and the standard deviation, one would assume that Housing would have similar error-values as the other categories in the regression, but it does not. This is likely due to the fact that the Housing category houses the most Exiobase products, and is thus dependent on a large variety of impacts. The Housing category includes energy, waste, and general housing, and all these products or services vary greatly in how they are handled per nation.

As shown in Figure 4.8b, all Eastern European nations spend relatively little compared to the other nations, while their impacts are far higher than most. This has to do with the energy mix in those nations, which is very different and more reliant on coal than that of the nations with stronger, more developed economies.

This is one issue where a more disaggregated breakdown of the Exiobase products into lower-level COICOP categories would show that it is energy specifically that influences the impacts the most, and it shows one of the drawbacks of the method this study applies. Categories that already are quite specific, like Health and Recreation & Culture, show a more significant linear correlation between impacts and expenditure. This indicates that COICOP categories that house fewer Exiobase products are more linear, and therefore show higher r2-values.


Chapter 5

Discussion

Choices made during the production of the results shown in Chapter 4 influenced the results in many different ways. This chapter aims to highlight some of these choices, some points for improvement and the reliability of some of the results in three parts: the method, the quality of the data, and the results and how these results were influenced by the first two parts.

5.1 The Methodology

Since the Methodology contains different techniques to produce the results, these techniques are discussed one by one. First, using the Exiobase EEMRIOT in combination with Pymrio enabled the creation of a custom set of data, based on numbers taken from various sources with reliable data. On the other hand, as discussed previously in the Theory, EEMRIOTs significantly lag behind in terms of time-frame. The latest data in Exiobase are for 2011, and this version was released in 2018. Seven years can significantly change the data reliability, but this is discussed in the section covering the data.

The implementation of the SOM algorithm proved to be computationally heavy, and this limited the level of experimentation possible on the machine used. Especially the number of training cycles had to be lowered significantly in order to produce results within time. Therefore it is not certain whether the parameters used in this research offer the best results. This is especially true since the SOM algorithm does not contain a clear convergence statement. When further expanding this research with larger data-sets, the necessary computational power must be taken into account in order to produce satisfactory results.

Furthermore, given the time constraints for this research, investigating different software packages was out of the question, although gains in computational efficiency are possible there. Not only does Python support multiple SOM software packages, other programming languages such as R also offer well-documented software options. These packages could better utilise the computational power at hand or serve specific visualisation desires.

For clustering, this report used the K-means clustering algorithm, mainly since it is a simple, yet effective algorithm for smaller and simpler datasets. However, a wide array of clustering techniques exists in the same package this research used, none of which this research explored. These clustering techniques each possess different characteristics and could improve the level of clustering. That said, given the size of the dataset, K-means has proven to be adequate. Larger data-sets offer the depth necessary to investigate different clustering techniques.

5.2 The Data

All the data used in this research come from one source, which is both a strength and a weakness. First, all the data is carefully selected and covers the same range of products and nations, meaning there are no gaps between different countries and products. Second, since Exiobase utilises many different sources for its data collection, what is available is highly reliable for its timeframe. However, the time gap between the represented data and the time of publication raises some questions regarding the reliability of the data; in some areas seven years can mean significant changes, leading to results that do not represent the current situation properly.

Furthermore, the data used in Exiobase varies in its aggregation, which dictates how COICOP categories are represented through the Exiobase products. This creates some categories with simple linear relations between expenditure and impact, since only a few Exiobase products represent the COICOP category, while other categories are represented by far more Exiobase products. This is especially true for Health and Education, where both COICOP categories are represented by only one product, while COICOP categories like Transport and Housing contain far more products.

The representation of the data is further complicated by the fact that it only represents household consumption. As mentioned in the Methodology, there are multiple Final Demand categories in Exiobase, and this research only focuses on Household Consumption. Categories like Health and Education are financed differently throughout Europe; where some nations offer free education financed through taxes, others do not. Therefore, household consumption of Health and Education can differ significantly, not solely because these categories are cheaper in some areas, but also because the financing structure differs significantly. Including the FD of governments could change the results for these categories, and even out the field more.

5.3 The Results

Food, Housing, and Transport are the most important categories for expenditure and environmental impact. This is in line with other literature (Ivanova, Stadler, et al. 2016). Within these categories, levels of meat consumption, the energy mix, and how transport is undertaken make the biggest differences. Tukker et al. (2014) already showed how significant the impact per € of coal is, and Beylot et al. (2019) show that meat consumption and transport contribute significantly to the environmental impacts of households. It must be said that Beylot et al. (2019) include governmental FD, not only household consumption.

As shown in the results, some areas of expenditure vary greatly between European nations. Especially in the Housing category, differences in expenditure are very large. However, regression analysis showed that nations where per capita expenditure was relatively lower did not necessarily show lower environmental impacts per capita. As mentioned before, the COICOP categories vary in the number of Exiobase products they represent. Furthermore, an EEMRIOT allows different nations to have different product ratios in their consumption.

For instance, energy demand for the house is included in the Housing category, and all nations have different energy mixes. These different types of energy production all have different prices per unit of energy produced, but also different environmental impacts associated with them. This is clearly shown in Figure 4.8b, where nations from Cluster 0 and Cluster 3 have a much lower expenditure for their energy demand, but the associated impacts are all far above the average that the regression line suggests. These results stem from the high percentage of coal used in the energy production of these nations. This is in line with Sommer and Kratena (2017), explaining the difference between lower and higher income as a small Kuznets Curve. The results of this study confirm this.

Evaluation of the number of Exiobase products per COICOP category showed that the categories with higher r2-values consisted of fewer Exiobase products. It also showed that categories that are dominated by international brands show less variation in impact per € than categories that are managed on a national level. To further contextualise this, COICOP categories like Clothing & Footwear and Furnishing & Household are much more even, since the products in these categories are more similar. Enterprises like H&M and IKEA dictate large portions of the product sales in these categories, and globalism allowed these enterprises to produce products efficiently so they can serve entire continents with the same products. In contrast, COICOP categories like Housing and Transport are managed more locally, and as such show more differences from nation to nation.

5.4 Limitations

The method for producing these results is subject to some issues that influence the results, as well as some issues that change how the results can be interpreted. This section discusses the limitations of the study, as well as uncertainties and a sensitivity analysis.

First, a limitation that influenced the entire structure of the study is the set of products in Exiobase. As mentioned in the Theory, Exiobase includes 200 products. These products are tailored more towards economic activity, and less towards consumer consumption. As such, some products are very specific, while others are more all-encompassing. For instance, there are various types of waste disposal, but there is only one product that represents all fruits, vegetables, and nuts. Because of this, and the way the COICOP system works, this study decided to only focus on the COICOP divisions instead of using sub-levels that further distinguish the exact product or service that consumers spend their money on.

Second, mapping the Exiobase products to COICOP products also posed a challenge, since some Exiobase products could be of use in different COICOP categories. While there are multiple sources available that aim to express consumer expenditure in COICOP categories in Exiobase products (Froemelt, Durrenmatt, and Hellweg 2018a; Ivanova, Stadler, et al. 2016; Steen-Olsen, Wood, and Hertwich 2016), information on how to divide all Exiobase products into COICOP categories was more scarce. Since the division of Exiobase products directly influences the impacts that the COICOP categories represent, these choices influence the outcomes of the study significantly. How the products influence the outcomes is described further in the Sensitivity Analysis section.

Third, the results of the SOM are also highly influenced by a lack of computational power and time. As shown in the Method chapter, the suggested number of cycles is far above the number of cycles used in the final SOM. Given the high demand for computational power, optimising the parameters involved building multiple models, each with different parameters, and doing this for the suggested number of cycles would simply take too much time. While Froemelt, Durrenmatt, and Hellweg (2018b) suggest that the number of cycles is not the most important parameter, it could have significantly changed the outcome. This is especially true since a 2D SOM does not have a convergence statement (Tan and George 2005). The outcome of the SOM directly influences how the nations are clustered, and could therefore have a ripple effect and also influence the regression results for the clusters.

Fourth, Exiobase expresses its demand in millions of € at basic price. This means that no taxes are applied to any of the products, which makes sense given that each nation has a different tax structure. However, because the FD of different nations from Exiobase is compared in basic price, one nation might purchase less of a product while actually paying more for it, because the VAT on that product is higher than in other countries.

Last, it must be stated that a certain amount of expenditure in a category does not correspond to a certain number of products within that category, and as such does not always correlate with impact. As an example, prices for a T-shirt can differ greatly among brands, but while the price difference between a cheap and an expensive T-shirt could be a factor of eight, the difference in impact is unlikely to be of the same magnitude. As a result, consumer product choices are poorly represented by the method of this study.

Adding VAT to the results of this study was explored but deemed too complicated, because the tax structures of nations differ greatly and because some Exiobase products group together consumer products that have different tax rates. An example of this is the Fruits, vegetables, and nuts product in Exiobase. Some nations have different taxes on vegetables and fruits but not on nuts, and thus it is impossible to apply the correct VAT to that product.

Apart from the lack of VAT, the prices presented in the results are in 2011 prices, and are therefore not representative of spending in 2019. According to SCB (2019), Sweden experienced 5.4% inflation from 2011 to 2018, meaning that all prices shown in this report are roughly 5.4% lower than 2018 prices, while also missing the VAT. The VAT in Sweden is between 0% and 25% for most products, where the rate for Food is usually lower. For other products, such as those included in Housing, the rate is closer to 25%, so their prices should be multiplied by 1.054 × 1.25 ≈ 1.32 to be closer to 2018 prices.
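
The adjustment described above can be written as a small sketch. The function name and default arguments are hypothetical and only reproduce the Swedish example; applying a single VAT rate is a simplifying assumption, since VAT rates differ per product and per nation.

```python
def adjust_basic_price(basic_price_2011, inflation=0.054, vat=0.25):
    """Scale a 2011 basic price towards a 2018 purchaser price.

    inflation: cumulative inflation between 2011 and 2018 (5.4% for Sweden).
    vat: assumed uniform VAT rate (25%, the upper Swedish rate).
    """
    return basic_price_2011 * (1.0 + inflation) * (1.0 + vat)


# Multiplication factor for a price of 1 in 2011 basic prices.
factor = adjust_basic_price(1.0)

# A category taxed at 0% VAT is only corrected for inflation.
factor_no_vat = adjust_basic_price(1.0, vat=0.0)
```

Rounded to two decimals, `factor` gives the multiplier of 1.32 used in the text.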


Chapter 6

Conclusion

Mapping the expenditure habits of 30 different nations within Europe showed that, while there are trends regarding expenditure that hold throughout all nations, large differences exist between nations in what is spent most on, and in how large these expenditures are as a ratio of the total expenditure. The largest categories are Food, Housing, and Transport, as other literature also suggests. Clustering these nations in groups of similarly behaving nations showed a difference between Western and Eastern Europe, as well as coastal Europe. These differences were visible not only in total expenditure but also in how these expenditures are divided among the different categories.

When taking the entire pool of nations, only two out of twelve categories showed a significant correlation between expenses and impacts for all four impact categories used. Four other categories showed a significant correlation between expenditure and impact in one or two impact categories. In total, expenditure explained over 90% of the variance in impact for 15 out of the 48 tested combinations of expenditure category and impact category, a total of 31%. These results show that, in general, consumer expenditure is not a valid proxy measurement for environmental impact.

The categories that did show a significant correlation in all impact categories are Clothing & Footwear, and Furnishing & Household. Food & Beverages, Alcohol & Tobacco, and Recreation & Culture showed significance in two impact categories, and Transport showed correlation in one category. This report shows that, while using Exiobase as the only source, there is a linear correlation between expenditure and environmental impact for these categories. If the nation of residence is known, expenditure is a better indicator of environmental impact when the regression model of the cluster that nation belongs to is used: depending on the cluster, up to 69% of all tested impact categories show a significant correlation. It is important to note that, due to the size of the clusters, these numbers are not statistically significant.

The results of the clustering also show that some clusters of nations show significantly higher impact per € than other clusters do for certain categories. This is especially true for Housing, where a difference in energy mix between clusters shows a stark difference in impact per € between nations from Cluster 0 as opposed to the other clusters. The differences in slope between the regression lines in Figure 4.8b indicate that the differences in impact per € between clusters are largest here. The slopes of the regression lines per cluster are an indication of which categories are relative impact hotspots for certain nations, and therefore show governing bodies what consumption category demands attention.

These results are hampered by certain caveats in the methodology, as mentioned in the Discussion. First, a significant time-lag exists between the time of this research and the time represented by the data. Second, the results expressed in monetary terms are in basic price, while also experiencing time-lag. Therefore, the prices shown in this report do not represent the correct impact per € for either 2011, the year the data stem from, or the present day. Third, some of the products in Exiobase are more elaborate and disaggregated than others, and the results therefore lack precision in some areas. Where some areas are divided over different products, like waste management, other products are very all-encompassing and thus less focussed. Nonetheless, the results of this study show that Exiobase is an interesting tool, with various capabilities. Combining IOT and DM allows for the unearthing of interesting details and shows promise for further exploration.

6.1 Future Work

The scope of this research is limited given the timeframe and manpower. This shows in the results, and leaves room for expansion and specialisation. First, this research focuses only on 2011 as a timeframe, while the Exiobase data spans from 1995 to 2011. Implementing a time-series would allow the tracking of the expenditure per capita of different nations over time, allowing for better predictions of future consumption habits. As shown by Barreto (2007), combining the SOM algorithm and time-series is possible and accurate.

Second, the choice of impact characterisation method could improve the results as well. The four impact categories used in this research are better suited for communicating the results to people who have less knowledge of environmental impact, but as discussed in Impact Categories & Characterisation, this characterisation model can be improved on. As mentioned by Steinmann et al. (2018), including other impact categories would broaden the range of coverage, but adding categories would decrease the ease of communication.

Appendix A

Appendices

A.1 Software Packages used

Table A.1 below presents the Python version as well as the packages used and their version. At the start of the project, Python and R were compared to find the best tools for this project. While R included more elaborate documentation on SOM-related packages, the Pymrio package was written specifically for use with Python and Exiobase. Furthermore, while the author is experienced with both languages, the author has more and more recent experience with Python. Since all used data originates from Exiobase, the author opted for using Python to ensure smooth data manipulation across the different ROs.

Table A.1: Software Packages used

Name          Version   Source
Python        3.7.3     Van Rossum and Drake (2011)
Matplotlib    3.1.0     Hunter (2007)
Numpy         1.16.4    Walt, Colbert, and Varoquaux (2011)
Pandas        0.24.2    McKinney (2010)
Pymrio        0.3.8     Stadler (2014)
Seaborn       0.9.0     Michael Waskom et al. (2018)
SOMPY         1.0       Moosavi and Packman (2019)
Statsmodels   0.90      Seabold and Perktold (2018)

A.2 Nation Codes and Population

Table A.2: Overview of all included nations, their population and PPS.

Country Code   Country Name     Population in 2011   PPS
AT             Austria          8,375,164            105.4
BE             Belgium          11,000,638           109.1
BG             Bulgaria         7,369,431            51.1
CY             Cyprus           839,751              94.5
CZ             Czech Republic   10,486,731           73.5
DE             Germany          80,222,065           102.9
DK             Denmark          5,560,628            139.0
EE             Estonia          1,329,660            72.8
GR             Greece           11,123,392           95.7
ES             Spain            46,667,174           98.1
FI             Finland          5,375,276            120.7
FR             France           64,978,721           109.4
HR             Croatia          4,289,857            71.5
HU             Hungary          9,985,722            61.9
IE             Ireland          4,570,881            118.9
IT             Italy            59,364,690           101.5
LT             Lithuania        3,052,588            64.1
LU             Luxembourg       511,840              120.1
LV             Latvia           2,074,605            71.2
MT             Malta            414,989              79.2
NL             Netherlands      16,655,799           108.9
NO             Norway           4,920,305            155.5
PL             Poland           38,062,718           57.7
PT             Portugal         10,572,721           86.0
RO             Romania          20,199,059           54.0
SE             Sweden           9,415,570            123.8
SI             Slovenia         2,050,189            84.4
SK             Slovakia         5,392,446            70.0
TR             Turkey           73,722,988           60.9
GB             United Kingdom   63,022,532           111.0

A.3 The SOM Algorithm

For the sake of clarity, and since the original algorithm applied sequential training, this method is explained first to show how the algorithm works. How Batch training differs is explained afterwards.

Sequential Training

As mentioned earlier, with sequential training the algorithm iterates over all data points one by one, randomly selecting them to compare their IVs with the WVs of every node in the SOM. First, the WVs of the nodes are compared to the IV of the data point. The comparison takes place by calculating the distance between every node and the IV in question; the Euclidean distance is often used here. For each data point, the node whose WV is most similar to the IV, and which is thus closest to the data point, is considered the BMU. Equation A.1 below shows the denotation of the BMU c with an IV x (Vesanto 2000, p.7).

$$\| x - m_c \| = \min_i \{ \| x - m_i \| \} \tag{A.1}$$
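
The search for the BMU can be sketched in a few lines of plain Python. This is an illustration under assumptions, not the SOMPY implementation used in this study: the function names and the example weight vectors are hypothetical.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equally long vectors."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


def best_matching_unit(x, weight_vectors):
    """Index c of the node whose WV lies closest to the IV x."""
    return min(
        range(len(weight_vectors)),
        key=lambda i: euclidean(x, weight_vectors[i]),
    )


# Three hypothetical nodes with two-dimensional WVs.
weights = [[0.0, 0.0], [1.0, 1.0], [0.2, 0.9]]
c = best_matching_unit([0.1, 0.8], weights)
```

Here the third node (index 2) has the WV closest to the IV and is therefore selected as the BMU.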

Second, the BMU is moved closer to the data point by changing its WV using an update rule, so that it more closely resembles the IV of the data point. The design of the update rule allows for some modification of how the SOM is updated. The basic SOM update rule for the BMU and its neighbours is displayed below as Equation A.2.

$$m_i(t+1) = m_i(t) + \alpha(t)\, h_{ci}(t)\, [x(t) - m_i(t)] \tag{A.2}$$

with:

mi(t+1) : the new WV
mi(t) : the original WV
t : time
x(t) : IV of the data point drawn at time t
α : learning rate
hci : the neighbourhood kernel surrounding BMU c

The learning rate α is a factor that determines how strongly the new WV of the BMU is influenced by the IV, and it usually decreases with each step in time t. As shown in equation A.2, hci is the neighbourhood kernel for the surrounding nodes, and decides how far away a node can be from the BMU and still be updated, and how influential that update would be. A special feature of the original SOM algorithm is that as the algorithm iterates over the data, the size of the neighbourhood kernel shrinks and influences fewer and fewer neighbouring nodes as a result. The update kernel can be smooth in nature, meaning that it decreases at a set rate as the distance between the node in question and the BMU grows larger, often Gaussian in shape, or it can be pre-defined. Equation A.3 shows an often-used expression for a Gaussian kernel based on the distance between the BMU and the neighbouring nodes.

$$h_{j,i(x)} = \exp\left( -\frac{d_{j,i}^{2}}{2\sigma^{2}} \right), \quad j \in D \tag{A.3}$$

where $d_{j,i} = \| r_j - r_i \|_2$ is the Euclidean distance between rj and ri, the position vectors of the excited neighbour j and the BMU i respectively. The topological parameter σ is the width of the neighbourhood kernel that decides whether a neuron will be included or not. Over time, the value of σ decreases per time-step of the algorithm, making the neighbourhood kernel shrink as a result. Equation A.4 shows the decay function for the topological parameter σ, where it decays over each time-step t.

$$\sigma(t) = \sigma_0 \exp\left( -\frac{t}{\tau_1} \right), \quad t = 0, 1, 2, \ldots \tag{A.4}$$

where t is time, and τ1 is a time constant, expressed as a number of iterations and chosen by the researcher. More information about how to set τ1 can be found in Haykin (2009, p.432-435).

As such, combining Equation A.3 and Equation A.4 leads to the final equation A.5 shown below. Equation A.5 is the neighbourhood function that describes the size of the neighbourhood kernel, and thus whether a node neighbouring the BMU receives an update on its WV or not.

$$h_{j,i(x)} = \exp\left( -\frac{d_{j,i}^{2}}{2\sigma^{2}(t)} \right), \quad t = 1, 2, \ldots \tag{A.5}$$

Where t is the number of iterations. In Equation A.5, the size of the neighbourhood hj,i shrinks as the number of iterations t grows larger, because σ(t) decays at an exponential rate, as shown in Equation A.4.
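
A minimal sketch of the time-dependent Gaussian neighbourhood function may clarify how the kernel narrows. The function names and the default values for the initial width and the time constant are assumptions chosen for illustration, not the parameters used in this study.

```python
import math


def sigma(t, sigma0, tau1):
    """Width of the neighbourhood kernel at iteration t (exponential decay)."""
    return sigma0 * math.exp(-t / tau1)


def neighbourhood(r_j, r_i, t, sigma0=3.0, tau1=1000.0):
    """Gaussian neighbourhood value for node j given BMU i at iteration t.

    r_j, r_i: map positions (grid coordinates) of node j and the BMU i.
    sigma0, tau1: hypothetical initial width and time constant.
    """
    d_squared = sum((a - b) ** 2 for a, b in zip(r_j, r_i))
    return math.exp(-d_squared / (2.0 * sigma(t, sigma0, tau1) ** 2))


# A node two grid steps away from the BMU, early and late in training.
early = neighbourhood((0, 2), (0, 0), t=0)
late = neighbourhood((0, 2), (0, 0), t=2000)
```

The BMU itself always receives the full value of 1, while the value for the node two steps away drops from `early` to a much smaller `late` as the kernel shrinks.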

The neighbourhood kernel can also be based on pre-defined neighbourhood areas. For example, there can be three areas: one housing the BMU, with the largest value for hci, one with its direct neighbours, and one with its secondary neighbours. How large the update kernel for these neighbourhoods is, is up to whoever implements the algorithm (Teuvo Kohonen 2001).

As shown in Equation A.2, the neighbourhood kernel is multiplied by the learning rate α. Like the neighbourhood kernel, the learning rate can be either static or dynamic in nature, though a dynamic rate is advised. The value of the learning rate α should be between 0.1 and 0.01 at all times. Equation A.6 shows a formula for a dynamic learning rate α (Haykin 2009; Teuvo Kohonen 1990; T. Kohonen 1997, p.434-435).

$$\alpha(t) = \alpha_0 \exp\left( -\frac{t}{\tau_2} \right), \quad t = 0, 1, 2, \ldots \tag{A.6}$$

Where t is discrete time and τ2 is a time constant. To accommodate the size constraint of the learning rate α, the time constant τ2 needs to be decided on carefully.

The BMU and the nodes that fall within its neighbourhood as defined by equation A.5 are moved closer to the position of the IV. Depending on the design of the aforementioned NF, the BMU might move a larger distance, while the movement of the neighbours is determined by the distance they have between themselves and the IV. The process is then repeated for the next data point (Vesanto 2000, p.7-9).
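
One sequential training step, combining the BMU search, the Gaussian neighbourhood and the decaying learning rate, could look like the sketch below. It is a simplified illustration and not the SOMPY implementation; the function name, the one-dimensional map and all default parameter values are assumptions.

```python
import math


def train_step(weights, positions, x, t,
               alpha0=0.1, tau2=1000.0, sigma0=2.0, tau1=1000.0):
    """Apply one sequential SOM update for the IV x, in place.

    weights: list of WVs, one per node.
    positions: map coordinates of each node (same order as weights).
    Returns the index of the BMU.
    """
    alpha = alpha0 * math.exp(-t / tau2)   # decaying learning rate
    width = sigma0 * math.exp(-t / tau1)   # decaying kernel width

    # BMU: node whose WV has the smallest squared distance to x.
    c = min(
        range(len(weights)),
        key=lambda i: sum((xi - wi) ** 2 for xi, wi in zip(x, weights[i])),
    )

    for i, w in enumerate(weights):
        # Squared map distance between node i and the BMU.
        d_squared = sum((a - b) ** 2 for a, b in zip(positions[i], positions[c]))
        h = math.exp(-d_squared / (2.0 * width ** 2))
        # Move the WV towards x, scaled by learning rate and kernel.
        weights[i] = [wi + alpha * h * (xi - wi) for xi, wi in zip(x, w)]
    return c


# Three hypothetical nodes on a one-dimensional map.
positions = [(0,), (1,), (2,)]
weights = [[0.0, 0.0], [0.5, 0.5], [1.0, 1.0]]
bmu = train_step(weights, positions, x=[0.9, 0.9], t=0)
```

After this step the BMU has moved a tenth of the way towards the IV, and the other two nodes have moved by smaller fractions, in proportion to their kernel values.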

As a result, nodes that are more similar are grouped closer together as the algorithm iterates over the data, creating clusters of nodes with similar WVs that emulate data with similar properties. Over time, the SOM changes shape to accommodate the moving nodes, and organises itself to represent the data it was fed.

Batch Training

Similar to Sequential training, Batch training involves iterating over the data in multiple epochs to train the SOM. Where Batch training differs is in how it iterates over the points in the data set. Instead of iterating over each point one by one, Batch training presents the entire data set and its IVs to the SOM as a batch before any changes to the WVs are made. To identify which IV is closest to which node, the data set is divided into Voronoi regions. The IV that is used to update the nodes' WV is the weighted average of all IVs that fall within the region of the node in question. Therefore, a node can be the BMU of several different IVs per epoch at once. The BMU c for a data point xj in Batch training is denoted by equation A.7 below.


$$c = \arg\min_k \lVert x_j - m_k \rVert \tag{A.7}$$

The new WVs are updated using equation A.8 below.

$$m_i(t+1) = \frac{\sum_{j=1}^{n} h_{ic}(t)\, x_j}{\sum_{j=1}^{n} h_{ic}(t)} \tag{A.8}$$

The updated WV is the weighted average of all data points, where the weight of each data point depends on the distance between the node and the BMU of that data point. Since the weight of the points is based on this distance, no learning rate α is necessary (Froemelt, Durrenmatt, and Hellweg 2018b).
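Equations A.7 and A.8 can be combined into one Batch-training epoch, sketched below. The sketch assumes a Gaussian neighbourhood kernel on the map grid and a fixed radius `sigma`; the function name `batch_epoch` and the random example data are hypothetical and do not reproduce the SOMPY implementation used in this study.

```python
import numpy as np


def batch_epoch(weights, grid, data, sigma):
    """One Batch-training epoch following equations A.7 and A.8."""
    # Equation A.7: index c of the BMU for every data point x_j.
    dists = np.linalg.norm(data[:, None, :] - weights[None, :, :], axis=2)
    bmus = np.argmin(dists, axis=1)                       # shape (n_points,)
    # Neighbourhood weight h_ic between each node i and each point's BMU c
    # (Gaussian kernel on the map grid, an assumed choice).
    bmu_coords = grid[bmus]                               # shape (n_points, 2)
    grid_dist_sq = np.sum((grid[:, None, :] - bmu_coords[None, :, :]) ** 2, axis=2)
    h = np.exp(-grid_dist_sq / (2.0 * sigma ** 2))        # shape (n_nodes, n_points)
    # Equation A.8: weighted average of all data points; no learning rate.
    new_weights = (h @ data) / h.sum(axis=1, keepdims=True)
    return new_weights, bmus


# Hypothetical 2x2 map trained on 20 random three-dimensional points.
rng = np.random.default_rng(1)
grid = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
data = rng.random((20, 3))
weights = rng.random((4, 3))
new_weights, bmus = batch_epoch(weights, grid, data, sigma=1.0)
```

Each new WV is a convex combination of the data points, which is why the update needs no separate step size.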

Training Phases

Training the SOM is either a one-stage or a two-stage process. In the one-stage process, the neighbourhood size shrinks within a single training session to deliver the SOM. In the two-stage process, training is split between a session with a large neighbourhood surrounding the BMU and a fine-tuning phase that uses a smaller neighbourhood with a smaller impact on neighbouring nodes. Haykin (2009, p.434-435) defines these two phases as the "Self-organizing or ordering phase" and the "Convergence phase".

An iteration over all data points in the data set is called an epoch, and the training of a SOM usually involves many epochs. During each new epoch, the learning rate α is lowered so that the positions of the BMU and its neighbours are affected less and less until the algorithm converges (Kind and Brunner 2014; Ehsani, Quiel, and Malekian 2010; Titterington 2010; Teuvo Kohonen 1990).

The Self-organising Phase

During the first phase, the size of the neighbourhood should be very large, close to the size of the map itself. The learning rate is also larger during this first phase than the rate used in the second phase. According to Kohonen, this first phase can take 1,000 iterations or more, depending on the size of the map and the data set. As the algorithm iterates over the data, the size of the neighbourhood kernel influencing the surrounding nodes shrinks quickly to only a small number of nodes surrounding the BMU, or only the BMU itself. Due to the large neighbourhood function and learning rate, the adjustments are more significant during the first iterations.

The Convergence Phase

The second phase is used for fine-tuning the map, and therefore aims to make smaller adjustments to the features of the SOM. As such, both the neighbourhood function and the learning rate are smaller during this second phase. As the name of this phase suggests, the algorithm aims to reach convergence during this phase. Tan and George (2005), nonetheless, raise an issue with the SOM algorithm: its lack of a proof of convergence.


However, the same authors mention that enough training cycles can lead to high-quality maps regardless of convergence. Haykin (2009, p. 435) states that the number of iterations in the convergence phase should be at least 500 times the number of nodes in the network. While the learning rate should be lower than in the first phase, it is important that the rate does not reach zero, as this might leave the SOM in a meta-stable state, where the map has some topological error. Furthermore, the neighbourhood function should also be smaller than during the first phase, including only the closest neighbours or only the BMU. It is important to note that the Batch Training variant of the algorithm does not use a learning rate.

Parameters

Since the SOM algorithm lacks a clear convergence statement, the best way to form high-quality maps is through fine-tuning the parameters used in the algorithm. Several important parameters require fine-tuning, as was mentioned in the Method section in the main body of the text. Less important but still significant parameters are the type of neighbourhood, the initialisation, and the map ratio. Many of these parameters depend on one or more of the others. Another important factor for finding starting parameters is the size of the data set N. Often, the number of nodes depends on this, and literature has shown that the number of nodes should be between 4 and 10, where 8 and 10 are often used as a starting parameter (Froemelt, Durrenmatt, and Hellweg 2018a; Tan and George 2005; Vesanto 2000).

Table A.3: Overview of parameters, their suggested values, and their optimised values

| Parameter | Starting Value | Optimised Value |
|---|---|---|
| Map ratio | 5*6 | 5*6 |
| Number of Neurons | 5 * √n (28) | 30 |
| Coop Radius in | 6 | 6 |
| Coop Radius out | 4 | 2 |
| Conv Radius in | 4 | 2 |
| Conv Radius out | 1 | 1 |
| Coop Sessions | > 1000 | 150 |
| Conv Sessions | 500*n | 300 |
| Neighbourhood Type | Gaussian | Gaussian |
| Initialisation | PCA | PCA |
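The optimised values in Table A.3 can be turned into a neighbourhood radius per training session, as sketched below. The sketch rests on several assumptions: "Coop" and "Conv" are read as the self-organising and convergence phases, "in" and "out" as the initial and final radius of a phase, and the radius is assumed to shrink linearly within each phase. The helper `linear_schedule` is hypothetical.

```python
def linear_schedule(start, end, n_sessions):
    """Linearly interpolated neighbourhood radius, one value per session."""
    if n_sessions == 1:
        return [float(start)]
    step = (end - start) / (n_sessions - 1)
    return [start + i * step for i in range(n_sessions)]


# Optimised values from Table A.3; linear shrinking is an assumption.
coop_radius = linear_schedule(6, 2, 150)   # first (self-organising) phase
conv_radius = linear_schedule(2, 1, 300)   # second (convergence) phase
radius_per_session = coop_radius + conv_radius
```

Under these assumptions the radius falls from 6 to 2 over the first 150 sessions and from 2 to 1 over the remaining 300, so the second phase only adjusts the closest neighbours of each BMU.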


A.4 Boxplot of expenditure

Figure A.1: Boxplot of expenditure per COICOP Category in %


A.5 Distribution of Expenditure per COICOP Category

Table A.4: Overview of the expenditure per capita in euro, showing the mean, standard deviation, and the spread of the first six categories

| Statistic | CP01 | CP02 | CP03 | CP04 | CP05 | CP06 |
|---|---|---|---|---|---|---|
| mean | 1656.99898 | 105.09477 | 391.86542 | 2826.01903 | 468.81101 | 364.50184 |
| std | 265.51731 | 54.70232 | 174.36345 | 1064.71280 | 184.60905 | 205.69442 |
| min | 1210.53948 | 26.28424 | 193.78362 | 1190.91633 | 203.92681 | 16.96608 |
| 25% | 1491.32991 | 66.23622 | 334.05428 | 1939.08091 | 318.35659 | 193.33813 |
| 50% | 1644.66279 | 107.38174 | 351.51537 | 2693.57392 | 460.34637 | 295.48853 |
| 75% | 1821.05249 | 133.00709 | 405.24024 | 3857.45031 | 607.69740 | 542.71601 |
| max | 2300.62074 | 241.34982 | 981.13282 | 4822.49351 | 856.10867 | 814.29159 |

Table A.5: Overview of the expenditure per capita in euro, showing the mean, standard deviation, and the spread of the last six categories

| Statistic | CP07 | CP08 | CP09 | CP10 | CP11 | CP12 |
|---|---|---|---|---|---|---|
| mean | 1425.04561 | 442.53546 | 803.62685 | 152.38294 | 1004.86320 | 1288.98942 |
| std | 519.62551 | 114.25500 | 292.44774 | 101.54351 | 601.96588 | 538.58343 |
| min | 584.53485 | 234.16658 | 219.86486 | 38.49032 | 175.74956 | 388.37667 |
| 25% | 957.47332 | 363.90883 | 589.83725 | 90.91173 | 539.87510 | 986.35126 |
| 50% | 1413.89236 | 427.35644 | 756.80665 | 123.22674 | 840.21015 | 1205.99917 |
| 75% | 1760.95777 | 483.55290 | 991.60033 | 178.03869 | 1440.25806 | 1581.91483 |
| max | 2541.35303 | 712.10777 | 1360.36539 | 452.55053 | 2441.86493 | 2669.78691 |
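The rows of Tables A.4 and A.5 (mean, standard deviation, minimum, quartiles, maximum) can be computed per category with the Python standard library, as sketched below. The sketch assumes a sample standard deviation and linearly interpolated quartiles; the thesis does not state which conventions were used. The function name `describe` and the five input values are hypothetical and are not data from this study.

```python
import statistics


def describe(values):
    """Return the summary rows used in Tables A.4 and A.5 for one column."""
    q1, q2, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "mean": statistics.mean(values),
        "std": statistics.stdev(values),   # sample standard deviation (assumed)
        "min": min(values),
        "25%": q1,
        "50%": q2,
        "75%": q3,
        "max": max(values),
    }


# Hypothetical per-capita expenditures in euro for five nations (not thesis data).
example = [1200.0, 1500.0, 1650.0, 1800.0, 2300.0]
summary = describe(example)
```

Comparing the 50% row with the mean gives a quick indication of skew; in Table A.5, for instance, the mean of CP11 lies well above its median.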


A.6 Regression Models

Figure A.2: Regressions for Food in Impact Categories

Figure A.3: Regressions for Alcohol & Tobacco in Impact Categories


Figure A.4: Regressions for Clothing & Footwear in Impact Categories

Figure A.5: Regressions for Housing in Impact Categories


Figure A.6: Regressions for Furnishings & Household in Impact Categories

Figure A.7: Regressions for Health in Impact Categories


Figure A.8: Regressions for Transport in Impact Categories

Figure A.9: Regressions for Communication in Impact Categories


Figure A.10: Regressions for Recreation & Culture in Impact Categories

Figure A.11: Regressions for Education in Impact Categories


Figure A.12: Regressions for Restaurants & Hotels in Impact Categories

Figure A.13: Regressions for Goods & Services in Impact Categories


A.7 Statistical Values

The tables below show the R²-values and P-values for all nations, as well as for each cluster separately. As stated in the report, the R²-value threshold is set at 0.9, while the P-value threshold is set at 0.01.

Table A.6: R-values and P-values for all nations

| Division | GWP (CO2-eq) R-Value | GWP (CO2-eq) P-Value | Land Footprint (m²) R-Value | Land Footprint (m²) P-Value | Material Footprint (kg) R-Value | Material Footprint (kg) P-Value | Blue Water Consumption (m³) R-Value | Blue Water Consumption (m³) P-Value |
|---|---|---|---|---|---|---|---|---|
| Food & Beverages | 0.961142 | 5.28E-22 | 0.880245 | 6.73E-15 | 0.957984 | 1.64E-21 | 0.78532 | 3.36E-11 |
| Alcohol & Tobacco | 0.955403 | 3.90E-21 | 0.881487 | 5.78E-15 | 0.918104 | 2.67E-17 | 0.731492 | 8.91E-10 |
| Clothing & Footwear | 0.96426 | 1.57E-22 | 0.925888 | 6.25E-18 | 0.956791 | 2.46E-21 | 0.944526 | 9.29E-20 |
| Housing & Energy | 0.416438 | 8.86E-05 | 0.500502 | 8.57E-06 | 0.44493 | 4.17E-05 | 0.446855 | 3.95E-05 |
| Furnishing & Household | 0.924237 | 8.61E-18 | 0.923931 | 9.13E-18 | 0.953469 | 7.23E-21 | 0.933892 | 1.19E-18 |
| Health | 0.8782 | 8.61E-15 | 0.865782 | 3.54E-14 | 0.801951 | 1.03E-11 | 0.731747 | 8.78E-10 |
| Transport | 0.729098 | 1.01E-09 | 0.892039 | 1.49E-15 | 0.942093 | 1.73E-19 | 0.886609 | 3.04E-15 |
| Communication | 0.810277 | 5.52E-12 | 0.796279 | 1.56E-11 | 0.881522 | 5.76E-15 | 0.898589 | 5.98E-16 |
| Recreation & Culture | 0.779507 | 4.97E-11 | 0.813448 | 4.32E-12 | 0.942704 | 1.49E-19 | 0.917671 | 2.88E-17 |
| Education | 0.58892 | 4.73E-07 | 0.662799 | 2.53E-08 | 0.703907 | 3.74E-09 | 0.754859 | 2.35E-10 |
| Restaurants & Hotels | 0.898936 | 5.69E-16 | 0.810478 | 5.44E-12 | 0.814085 | 4.11E-12 | 0.708111 | 3.03E-09 |
| Goods & Services | 0.892192 | 1.46E-15 | 0.852441 | 1.41E-13 | 0.859181 | 7.13E-14 | 0.825628 | 1.61E-12 |
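The thresholds stated at the start of this section can be applied to the tabulated values with a simple filter, sketched below for a few GWP entries copied from Table A.6. The sketch assumes that the tabulated value is compared directly against 0.9 and that the comparisons are "at least 0.9" and "below 0.01"; the text does not specify whether the bounds are inclusive. The function name `meets_thresholds` is hypothetical.

```python
R_THRESHOLD = 0.9    # threshold on the tabulated R-value, as stated in the text
P_THRESHOLD = 0.01   # threshold on the P-value, as stated in the text


def meets_thresholds(r_value, p_value):
    """True if a regression clears both thresholds (inclusiveness is assumed)."""
    return r_value >= R_THRESHOLD and p_value < P_THRESHOLD


# (division, R-value, P-value) for GWP, copied from Table A.6.
gwp_all_nations = [
    ("Food & Beverages", 0.961142, 5.28e-22),
    ("Housing & Energy", 0.416438, 8.86e-05),
    ("Health", 0.8782, 8.61e-15),
    ("Furnishing & Household", 0.924237, 8.61e-18),
]
accepted = [name for name, r, p in gwp_all_nations if meets_thresholds(r, p)]
```

Of these four rows, only Food & Beverages and Furnishing & Household clear both thresholds; Health and Housing & Energy have very small P-values but fall short on the first criterion.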


Table A.7: R-values and P-values for Cluster 0

| Division | GWP (CO2-eq) R-Value | GWP (CO2-eq) P-Value | Land Footprint (m²) R-Value | Land Footprint (m²) P-Value | Material Footprint (kg) R-Value | Material Footprint (kg) P-Value | Blue Water Consumption (m³) R-Value | Blue Water Consumption (m³) P-Value |
|---|---|---|---|---|---|---|---|---|
| Food & Beverages | 0.966395 | 1.20E-05 | 0.931497 | 0.000103 | 0.962319 | 1.70E-05 | 0.654972 | 0.014952 |
| Alcohol & Tobacco | 0.964267 | 1.45E-05 | 0.86807 | 0.000756 | 0.955358 | 2.83E-05 | 0.874196 | 0.000654 |
| Clothing & Footwear | 0.970114 | 8.44E-06 | 0.953706 | 3.16E-05 | 0.964993 | 1.36E-05 | 0.923309 | 0.000145 |
| Housing & Energy | 0.638905 | 0.017287 | 0.601904 | 0.023642 | 0.51864 | 0.043928 | 0.706152 | 0.008997 |
| Furnishing & Household | 0.959078 | 2.18E-05 | 0.860514 | 0.000897 | 0.8953 | 0.000374 | 0.921478 | 0.000156 |
| Health | 0.852232 | 0.00107 | 0.598686 | 0.024264 | 0.859717 | 0.000912 | 0.829342 | 0.001664 |
| Transport | 0.914285 | 0.000203 | 0.871751 | 0.000694 | 0.884079 | 0.00051 | 0.903063 | 0.000296 |
| Communication | 0.841856 | 0.001317 | 0.623822 | 0.019706 | 0.859152 | 0.000924 | 0.753051 | 0.00522 |
| Recreation & Culture | 0.910713 | 0.00023 | 0.940324 | 6.80E-05 | 0.95913 | 2.17E-05 | 0.960608 | 1.94E-05 |
| Education | 0.370752 | 0.109123 | 0.67266 | 0.012651 | 0.726182 | 0.007209 | 0.652593 | 0.015282 |
| Restaurants & Hotels | 0.790031 | 0.003154 | 0.639581 | 0.017184 | 0.851225 | 0.001092 | 0.446191 | 0.070233 |
| Goods & Services | 0.963335 | 1.56E-05 | 0.965932 | 1.25E-05 | 0.978573 | 3.10E-06 | 0.939844 | 6.96E-05 |


Table A.8: R-values and P-values for Cluster 1

| Division | GWP (CO2-eq) R-Value | GWP (CO2-eq) P-Value | Land Footprint (m²) R-Value | Land Footprint (m²) P-Value | Material Footprint (kg) R-Value | Material Footprint (kg) P-Value | Blue Water Consumption (m³) R-Value | Blue Water Consumption (m³) P-Value |
|---|---|---|---|---|---|---|---|---|
| Food & Beverages | 0.979308 | 9.42E-10 | 0.856167 | 1.62E-05 | 0.953987 | 5.18E-08 | 0.819966 | 5.05E-05 |
| Alcohol & Tobacco | 0.986296 | 1.20E-10 | 0.949435 | 8.31E-08 | 0.975513 | 2.19E-09 | 0.711544 | 0.000564 |
| Clothing & Footwear | 0.981704 | 5.08E-10 | 0.972772 | 3.73E-09 | 0.973338 | 3.35E-09 | 0.975954 | 2.00E-09 |
| Housing & Energy | 0.691634 | 0.000797 | 0.619239 | 0.002389 | 0.815513 | 5.72E-05 | 0.571144 | 0.004467 |
| Furnishing & Household | 0.961084 | 2.23E-08 | 0.939915 | 1.98E-07 | 0.967252 | 9.40E-09 | 0.928314 | 4.80E-07 |
| Health | 0.879431 | 6.61E-06 | 0.886291 | 4.92E-06 | 0.811045 | 6.46E-05 | 0.705599 | 0.000627 |
| Transport | 0.980366 | 7.24E-10 | 0.89304 | 3.61E-06 | 0.956601 | 3.86E-08 | 0.885607 | 5.07E-06 |
| Communication | 0.853908 | 1.75E-05 | 0.877001 | 7.31E-06 | 0.969806 | 6.26E-09 | 0.95869 | 3.01E-08 |
| Recreation & Culture | 0.802544 | 8.08E-05 | 0.838579 | 2.90E-05 | 0.971097 | 5.02E-09 | 0.919253 | 8.75E-07 |
| Education | 0.843672 | 2.46E-05 | 0.703339 | 0.000652 | 0.957429 | 3.50E-08 | 0.90922 | 1.58E-06 |
| Restaurants & Hotels | 0.916297 | 1.05E-06 | 0.721373 | 0.000472 | 0.81301 | 6.13E-05 | 0.59444 | 0.003327 |
| Goods & Services | 0.922857 | 6.95E-07 | 0.84841 | 2.11E-05 | 0.852325 | 1.85E-05 | 0.806945 | 7.21E-05 |


Table A.9: R-values and P-values for Cluster 2

| Division | GWP (CO2-eq) R-Value | GWP (CO2-eq) P-Value | Land Footprint (m²) R-Value | Land Footprint (m²) P-Value | Material Footprint (kg) R-Value | Material Footprint (kg) P-Value | Blue Water Consumption (m³) R-Value | Blue Water Consumption (m³) P-Value |
|---|---|---|---|---|---|---|---|---|
| Food & Beverages | 0.984823 | 9.69E-06 | 0.956279 | 0.000138 | 0.995578 | 4.42E-07 | 0.86419 | 0.00243 |
| Alcohol & Tobacco | 0.955716 | 0.000142 | 0.867069 | 0.0023 | 0.927705 | 0.00049 | 0.78706 | 0.007723 |
| Clothing & Footwear | 0.988698 | 4.63E-06 | 0.919737 | 0.000638 | 0.98173 | 1.54E-05 | 0.97583 | 3.11E-05 |
| Housing & Energy | 0.922544 | 0.000583 | 0.434638 | 0.064494 | 0.715822 | 0.016407 | 0.436779 | 0.106041 |
| Furnishing & Household | 0.822296 | 0.004841 | 0.937223 | 0.000343 | 0.939575 | 0.000312 | 0.93775 | 0.000336 |
| Health | 0.967224 | 6.68E-05 | 0.827997 | 0.004451 | 0.874461 | 0.001988 | 0.801694 | 0.006423 |
| Transport | 0.666283 | 0.025108 | 0.968608 | 6.00E-05 | 0.9918 | 2.07E-06 | 0.918249 | 0.000669 |
| Communication | 0.68414 | 0.021693 | 0.849533 | 0.003158 | 0.891952 | 0.001357 | 0.935882 | 0.000362 |
| Recreation & Culture | 0.617605 | 0.036173 | 0.655306 | 0.027373 | 0.834099 | 0.004056 | 0.865999 | 0.002348 |
| Education | 0.803809 | 0.006248 | 0.908498 | 0.00089 | 0.862156 | 0.002524 | 0.897697 | 0.001181 |
| Restaurants & Hotels | 0.945424 | 0.000241 | 0.960979 | 0.000104 | 0.82105 | 0.004928 | 0.884994 | 0.00159 |
| Goods & Services | 0.836345 | 0.003917 | 0.980581 | 1.80E-05 | 0.970967 | 4.93E-05 | 0.98063 | 1.79E-05 |


Table A.10: R-values and P-values for Cluster 3

| Division | GWP (CO2-eq) R-Value | GWP (CO2-eq) P-Value | Land Footprint (m²) R-Value | Land Footprint (m²) P-Value | Material Footprint (kg) R-Value | Material Footprint (kg) P-Value | Blue Water Consumption (m³) R-Value | Blue Water Consumption (m³) P-Value |
|---|---|---|---|---|---|---|---|---|
| Food & Beverages | 0.94641 | 0.00023 | 0.937771 | 0.000336 | 0.967989 | 6.30E-05 | 0.938522 | 0.000325 |
| Alcohol & Tobacco | 0.961792 | 9.82E-05 | 0.937242 | 0.000343 | 0.936918 | 0.000347 | 0.894415 | 0.001279 |
| Clothing & Footwear | 0.970911 | 4.95E-05 | 0.936649 | 0.000351 | 0.97104 | 4.90E-05 | 0.932791 | 0.000408 |
| Housing & Energy | 0.563444 | 0.051873 | 0.435715 | 0.106619 | 0.606772 | 0.039011 | 0.725477 | 0.014981 |
| Furnishing & Household | 0.97015 | 5.28E-05 | 0.920663 | 0.00062 | 0.944067 | 0.000256 | 0.985187 | 9.12E-06 |
| Health | 0.928976 | 0.000469 | 0.901707 | 0.001067 | 0.922522 | 0.000584 | 0.956262 | 0.000138 |
| Transport | 0.982975 | 1.29E-05 | 0.971581 | 4.67E-05 | 0.952222 | 0.000172 | 0.974943 | 3.40E-05 |
| Communication | 0.916319 | 0.000709 | 0.986029 | 7.87E-06 | 0.993847 | 1.01E-06 | 0.968083 | 6.25E-05 |
| Recreation & Culture | 0.937677 | 0.000337 | 0.893388 | 0.001311 | 0.949603 | 0.000197 | 0.962588 | 9.32E-05 |
| Education | 0.789265 | 0.007518 | 0.558653 | 0.053452 | 0.764076 | 0.01008 | 0.55426 | 0.054929 |
| Restaurants & Hotels | 0.905618 | 0.000962 | 0.865288 | 0.00238 | 0.893357 | 0.001312 | 0.971147 | 4.85E-05 |
| Goods & Services | 0.898838 | 0.001148 | 0.821708 | 0.004882 | 0.911268 | 0.000823 | 0.880247 | 0.001763 |


Bibliography

Barreto, Guilherme A. (2007). “Time Series Prediction with the Self-Organizing Map: A Review”. In: Perspectives of Neural-Symbolic Integration. Ed. by Barbara Hammer and Pascal Hitzler. Studies in Computational Intelligence. Berlin, Heidelberg: Springer Berlin Heidelberg, pp. 135–158. isbn: 978-3-540-73954-8. doi: 10.1007/978-3-540-73954-8_6. url: https://doi.org/10.1007/978-3-540-73954-8_6 (visited on 08/03/2019).

Beylot, Antoine et al. (Apr. 10, 2019). “Assessing the Environmental Impacts of EU Consumption at Macro-Scale”. In: Journal of Cleaner Production 216, pp. 382–393. issn: 0959-6526. doi: 10.1016/j.jclepro.2019.01.134.

Broekhoff, Derik, Peter Erickson, and Georgia Piggot (Feb. 2019). Estimating Consumption-Based Greenhouse Gas Emissions at the City Scale. url: https://www.sei.org/wp-content/uploads/2019/03/estimating-consumption-based-greenhouse-gas-emissions.pdf (visited on 06/25/2019).

C40 Cities (Mar. 2018). Consumption-Based GHG Emissions of C40 Cities. url: https://www.c40.org/researches/consumption-based-emissions (visited on 06/25/2019).

COTEC (July 16, 2008). Communication From the Commission To The European Parliament, The Council, The European Economic And Social Committee And The Committee Of The Regions on the Sustainable Consumption and Production and Sustainable Industrial Policy Action Plan. url: https://eur-lex.europa.eu/legal-content/EN/TXT/PDF/?uri=CELEX:52008DC0397&from=EN (visited on 01/28/2019).

Dawkins, E. et al. (2018). “The Swedish Footprint: A Multi-Model Comparison”. In: 209. issn: 0959-6526. doi: 10.1016/j.jclepro.2018.11.023.

Department of Economic and Social Affairs (2018a). Classification of Individual Consumption According to Purpose (COICOP) 2018. url: https://unstats.un.org/unsd/classifications/unsdclassifications/COICOP_2018_-_pre-edited_white_cover_version_-_2018-12-26.pdf (visited on 02/13/2019).

— (2018b). Handbook on Supply, Use, and Input-Output Tables with Extensions and Applications. New York, p. 610. url: https://unstats.un.org/unsd/nationalaccount/docs/SUT_IOT_HB_wc.pdf (visited on 05/07/2019).

Ehsani, Amir Houshang, Friedrich Quiel, and Arash Malekian (Oct. 1, 2010). “Effect of SRTM Resolution on Morphometric Feature Identification Using Neural Network—Self Organizing Map”. In: GeoInformatica 14.4, pp. 405–424. issn: 1573-7624. doi: 10.1007/s10707-009-0085-4.

European Commission (Mar. 25, 2010). EUROPA - Competition - List of NACE Codes. url: http://ec.europa.eu/competition/mergers/cases/index/nace_all.html (visited on 07/02/2019).


EUROSTAT (2008). NACE Rev. 2. OCLC: 933248017. Luxembourg: Office for Official Publications of the European Communities. isbn: 978-92-79-04741-1.

Eurostat (June 1, 2018). “Input-Output Tables”. Powerpoint. ESTP Course on National Accounts (Luxembourg). url: https://circabc.europa.eu/webdav/CircaBC/ESTAT/ESTP/Library/2018%20ESTP%20PROGRAMME/23.%20ESA%202010%20-%20National%20Accounts%2c%2028%20May%20-%2001%20June%202018%20-%20Organiser_%20EUROSTAT/19a%20day_4_item_19a_Symmetrical%20Input%20Output%20Tables.pdf (visited on 02/15/2019).

— (June 20, 2019a). Comparative Price Levels of Consumer Goods and Services - Statistics Explained. url: https://ec.europa.eu/eurostat/statistics-explained/index.php?title=Comparative_price_levels_of_consumer_goods_and_services (visited on 07/03/2019).

— (June 3, 2019b). Population on 1 January by Age and Sex (Demo pjan). url: https://ec.europa.eu/eurostat/web/population-demography-migration-projections/data/database#.

— (2019c). Glossary:Purchasing Power Standard (PPS) - Statistics Explained. url: https://ec.europa.eu/eurostat/statistics-explained/index.php/Glossary:Purchasing_power_standard_(PPS) (visited on 05/15/2019).

Fan, Jessie X. et al. (2007). “Household Food Expenditure Patterns: A Cluster Analysis”. In: Monthly Labor Review 130, p. 38. url: https://heinonline.org/HOL/Page?handle=hein.journals/month130&id=390&div=&collection=.

Froemelt, Andreas, David J. Durrenmatt, and Stefanie Hellweg (Aug. 7, 2018a). “Using Data Mining To Assess Environmental Impacts of Household Consumption Behaviors”. In: Environmental Science & Technology 52.15, pp. 8467–8478. issn: 0013-936X, 1520-5851. doi: 10.1021/acs.est.8b01452.

— (Aug. 7, 2018b). “Using Data Mining To Assess Environmental Impacts of Household Consumption Behaviors - Supporting Information”. In: Environmental Science & Technology 52.15, pp. 8467–8478. issn: 0013-936X, 1520-5851. doi: 10.1021/acs.est.8b01452.

Gajowniczek, Krzysztof and Tomasz Zabkowski (July 2015). “Data Mining Techniques for Detecting Household Characteristics Based on Smart Meter Data”. In: Energies 8.7, pp. 7407–7427. doi: 10.3390/en8077407.

Girod, Bastien and Peter de Haan (Dec. 1, 2009). “GHG Reduction Potential of Changes in Consumption Patterns and Higher Quality Levels: Evidence from Swiss Household Consumption Survey”. In: Energy Policy 37.12, pp. 5650–5661. issn: 0301-4215. doi: 10.1016/j.enpol.2009.08.026.

Han, Jiawei and Micheline Kamber (2006). Data Mining: Concepts and Techniques. 2nd ed. The Morgan Kaufmann Series in Data Management Systems. OCLC: ocm63401845. Amsterdam ; Boston : San Francisco, CA: Elsevier ; Morgan Kaufmann. 770 pp. isbn: 978-1-55860-901-3.

Hassani, Hossein, Gilbert Saporta, and Emmanuel Sirimal Silva (Mar. 2014). “Data Mining and Official Statistics: The Past, the Present and the Future”. In: Big Data 2.1, pp. 34–43. issn: 2167-6461, 2167-647X. doi: 10.1089/big.2013.0038.

Haykin, Simon S. (2009). Neural Networks and Learning Machines. 3. ed. OCLC: 857737780. New York: Pearson. 906 pp. isbn: 978-0-13-147139-9.

Hennig, Christian et al., eds. (2016). Handbook of Cluster Analysis. Chapman and Hall/CRC Handbooks of Modern Statistical Methods. OCLC: 936133883. Boca Raton London New York: CRC Press, a Chapman and Hall book. 753 pp. isbn: 978-1-4665-5189-3 978-1-4665-5188-6.

Hunter, J.D. (May 2007). “Matplotlib: A 2D Graphics Environment”. In: Computing in Science Engineering 9.3, pp. 90–95. doi: 10.1109/MCSE.2007.55.

Ivanova, Diana, Konstantin Stadler, et al. (June 2016). “Environmental Impact Assessment of Household Consumption: Environmental Impact Assessment of Household Consumption”. In: Journal of Industrial Ecology 20.3, pp. 526–536. issn: 10881980. doi: 10.1111/jiec.12371.

Ivanova, Diana, Gibran Vita, et al. (May 2017). “Mapping the Carbon Footprint of EU Regions”. In: Environmental Research Letters 12.5, p. 054013. issn: 1748-9326. doi: 10.1088/1748-9326/aa6da9.

Kind, M. Carrasco and R. J. Brunner (Mar. 11, 2014). “SOMz: Photometric Redshift PDFs with Self Organizing Maps and Random Atlas”. In: Monthly Notices of the Royal Astronomical Society 438.4, pp. 3409–3421. issn: 1365-2966, 0035-8711. doi: 10.1093/mnras/stt2456. arXiv: 1312.5753.

Kohonen, T. (June 1997). “Exploration of Very Large Databases by Self-Organizing Maps”. In: Proceedings of International Conference on Neural Networks (ICNN’97). Proceedings of International Conference on Neural Networks (ICNN’97). Vol. 1, PL1–PL6 vol.1. doi: 10.1109/ICNN.1997.611622.

Kohonen, Teuvo (Sept. 1990). “The Self-Organizing Map”. In: Proceedings of the IEEE 78.9, pp. 1464–1480. issn: 0018-9219. doi: 10.1109/5.58325.

— (2001). Self-Organizing Maps. 3rd ed. Springer Series in Information Sciences 30. Berlin ; New York: Springer. 501 pp. isbn: 978-3-540-67921-9.

Kuhn, Max and Kjell Johnson (2013). Applied Predictive Modeling. New York, NY: Springer New York. isbn: 978-1-4614-6848-6 978-1-4614-6849-3. doi: 10.1007/978-1-4614-6849-3. url: http://link.springer.com/10.1007/978-1-4614-6849-3 (visited on 02/18/2019).

Lenzen, Manfred et al. (Mar. 2013). “BUILDING EORA: A GLOBAL MULTI-REGION INPUT–OUTPUT DATABASE AT HIGH COUNTRY AND SECTOR RESOLUTION”. In: Economic Systems Research 25.1, pp. 20–49. issn: 0953-5314, 1469-5758. doi: 10.1080/09535314.2013.769938.

Leontief, Wassily (Aug. 1936). “Quantitative Input and Output Relations in the Economic Systems of the United States”. In: The Review of Economics and Statistics 18.3, p. 105. issn: 00346535. doi: 10.2307/1927837.

— (1974). “Structure of the World Economy: Outline of a Simple Input-Output Formulation”. In: The American Economic Review 64.6, pp. 823–834. issn: 0002-8282. JSTOR: 1815236.

— (1986). Input-Output Economics. 2nd ed. New York: Oxford University Press. 436 pp. isbn: 978-0-19-503525-4 978-0-19-503527-8.

Liu, Lingbo et al. (July 31, 2018). “Fast Identification of Urban Sprawl Based on K-Means Clustering with Population Density and Local Spatial Entropy”. In: Sustainability 10.8, p. 2683. issn: 2071-1050. doi: 10.3390/su10082683.

McKinney, Wes (2010). “Data Structures for Statistical Computing in Python”. In: Proceedings of the 9th Python in Science Conference, pp. 51–56. url: http://conference.scipy.org/proceedings/scipy2010/mckinney.html (visited on 07/23/2019).

Michael Waskom et al. (July 16, 2018). Mwaskom/Seaborn: V0.9.0 (July 2018). url: https://zenodo.org/record/1313201 (visited on 07/23/2019).


Miller, Ronald E and Peter D Blair (2009). Input-Output Analysis: Foundations and Extensions. OCLC: 495994136. Cambridge; New York: Cambridge University Press. isbn: 978-0-511-59492-2 978-0-511-59020-7 978-0-511-65103-8 978-0-511-62698-2. url: http://dx.doi.org/10.1017/CBO9780511626982 (visited on 02/14/2019).

Ministry of Finance (Oct. 5, 2016). Strategy for Sustainable Consumption. url: https://www.government.se/4a9932/globalassets/government/dokument/finansdepartementet/pdf/publikationer-infomtrl-rapporter/en-strategy-for-sustainable-consumption--tillganglighetsanpassadx.pdf (visited on 02/07/2019).

Moosavi, Vahid and Sebastian Packman (Apr. 3, 2019). SOMPY. url: https://github.com/sevamoo/SOMPY.

Moran, Daniel and Richard Wood (July 3, 2014). “Convergence Between the Eora, Wiod, Exiobase, and Openeu’s Consumption-Based Carbon Accounts”. In: Economic Systems Research 26.3, pp. 245–261. issn: 0953-5314. doi: 10.1080/09535314.2014.935298.

OECD and Statistical Office of the European Communities (May 24, 2007). Eurostat-OECD Methodological Manual on Purchasing Power Parities. OECD. isbn: 978-92-64-01132-8 978-92-64-01133-5. doi: 10.1787/9789264011335-en. url: https://www.oecd-ilibrary.org/economics/eurostat-oecd-methodological-manual-on-purchasing-power-parities_9789264011335-en (visited on 06/07/2019).

Peters, Glen P. and Edgar G. Hertwich (Jan. 1, 2008). “Post-Kyoto Greenhouse Gas Inventories: Production versus Consumption”. In: Climatic Change 86.1, pp. 51–66. issn: 1573-1480. doi: 10.1007/s10584-007-9280-1.

SCB (July 11, 2019). CPI, Fixed Index Numbers (1980=100). url: http://www.scb.se/en/finding-statistics/statistics-by-subject-area/prices-and-consumption/consumer-price-index/consumer-price-index-cpi/pong/tables-and-graphs/consumer-price-index-cpi/cpi-fixed-index-numbers-1980100/ (visited on 07/12/2019).

Schaffartzik, Anke et al. (2014). “Environmentally Extended Input-Output Analysis”. In: Social Ecology Working Paper 154.

Seabold, Skipper and Josef Perktold (2018). “Statsmodels: Econometric and Statistical Modeling with Python”. In: PROC. OF THE 9th PYTHON IN SCIENCE CONF. (SCIPY 2010), pp. 57–61. url: https://www.statsmodels.org/stable/index.html (visited on 08/07/2019).

Sommer, Mark and Kurt Kratena (June 2017). “The Carbon Footprint of European Households and Income Distribution”. In: Ecological Economics 136, pp. 62–72. issn: 09218009. doi: 10.1016/j.ecolecon.2016.12.008.

Stadler, Konstantin (Aug. 29, 2014). “Pymrio—Multi Regional Input Output Analysis in Python”. In: doi: 10.6084/m9.figshare.1209339.

Stadler, Konstantin et al. (2018). “EXIOBASE 3: Developing a Time Series of Detailed Environmentally Extended Multi-Regional Input-Output Tables”. In: Journal of Industrial Ecology 22.3, pp. 502–515. issn: 1530-9290. doi: 10.1111/jiec.12715.

Steen-Olsen, Kjartan, Richard Wood, and Edgar G. Hertwich (June 1, 2016). “The Carbon Footprint of Norwegian Household Consumption 1999–2012”. In: Journal of Industrial Ecology 20.3, pp. 582–592. issn: 1088-1980. doi: 10.1111/jiec.12405.

Steinmann, Zoran J. N. et al. (2018). “Headline Environmental Indicators Revisited with the Global Multi-Regional Input-Output Database EXIOBASE”. In: Journal of Industrial Ecology 22.3, pp. 565–573. issn: 1530-9290. doi: 10.1111/jiec.12694.

Tan, Hiong Sen and Susan E. George (2005). “Investigating Learning Parameters in a Standard 2-D SOM Model to Select Good Maps and Avoid Poor Ones”. In: AI 2004: Advances in Artificial Intelligence. Ed. by Geoffrey I. Webb and Xinghuo Yu. Lecture Notes in Computer Science. Springer Berlin Heidelberg, pp. 425–437. isbn: 978-3-540-30549-1.

Timmer, Marcel P. et al. (2015). “An Illustrated User Guide to the World Input–Output Database: The Case of Global Automotive Production”. In: Review of International Economics 23.3, pp. 575–605. issn: 1467-9396. doi: 10.1111/roie.12178.

Titterington, Michael (Jan. 2010). “Neural Networks: Neural Networks”. In: Wiley Interdisciplinary Reviews: Computational Statistics 2.1, pp. 1–8. issn: 19395108. doi: 10.1002/wics.50.

Tukker, Arnold et al. (2014). “The Global Resource Footprint of Nations”. In: p. 39.

United Nations (2018). Sustainable Consumption and Production. url: https://www.un.org/sustainabledevelopment/sustainable-consumption-production/ (visited on 01/28/2019).

US Department of Health and Human Services (Apr. 2019). Household Products Database - Health and Safety Information on Household Products. url: https://hpd.nlm.nih.gov/index.htm (visited on 07/02/2019).

Van Rossum, Guido and Fred L Drake (2011). The Python Language Reference Manual: For Python Version 3.2. OCLC: 800830321. Bristol: Network Theory Ltd. isbn: 978-1-906966-14-0.

Vesanto, Juha (2000). SOM Toolbox for Matlab 5. OCLC: 58272790. Espoo: Helsinki University of Technology. isbn: 978-951-22-4951-0. url: https://pdfs.semanticscholar.org/9b4f/6595ab9b851d851a440fe480f3b3bf7ad092.pdf?_ga=2.167058041.631764751.1552473829-521711893.1550669607.

Walt, S. van der, S. C. Colbert, and G. Varoquaux (Mar. 2011). “The NumPy Array: A Structure for Efficient Numerical Computation”. In: Computing in Science Engineering 13.2, pp. 22–30. issn: 1521-9615. doi: 10.1109/MCSE.2011.37.


TRITA-ITM-EX 2019:556

www.kth.se