Dr. Kimmo Soramäki Founder and CEO FNA, www.fna.fi Center for Financial Studies at the Goethe University PhD Mini-course Frankfurt, 25 January 2013 Financial Networks V. Inferring Links

Financial Networks V - Inferring Links

Jan 27, 2015


Kimmo Soramaki

Fifth lecture of a PhD-level course on "Financial Networks" at the Center for Financial Studies at Goethe University, Frankfurt.
Transcript
Page 1: Financial Networks V - Inferring Links

Dr. Kimmo Soramäki
Founder and CEO
FNA, www.fna.fi

Center for Financial Studies at the Goethe University
PhD Mini-course
Frankfurt, 25 January 2013

Financial Networks

V. Inferring Links

Page 2: Financial Networks V - Inferring Links

I. Mapping Systemic Risk

II. Mapping Financial Markets

Page 3: Financial Networks V - Inferring Links

Observing vs inferring

• Observing links
– Exposures, payment flows, trade, co-ownership, joint board membership, etc.
– Cause of link is known

• Inferring links
– Observing the effects and inferring a relationship
– Cause of link is unknown
– Time series on asset prices, trade volumes, balance sheet items

Page 4: Financial Networks V - Inferring Links

Agenda

V. Inferring Links
• Prices and Returns
• Controlling for common factors
• Correlation and dependence
• Significant correlations
• Multiple Comparisons

VI. Correlation Networks
• Distance and Hierarchical Clustering
• Minimum Spanning Tree & PMFG
• Filtering
• Visual Layouts

Page 5: Financial Networks V - Inferring Links

Prices and Returns

Arithmetic return: $r_i = (p_j - p_i) / p_i$

Logarithmic return: $r_i = \ln(p_j / p_i)$

at time i, where $p_i$ is the initial price and $p_j$ the future price
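As a minimal sketch (not part of the lecture material), the two return definitions can be computed in Python as follows. The function names are illustrative, and the two prices are ADS closing prices taken from the DAX price table shown later in this lecture.

```python
import math

def arithmetic_return(p_initial, p_future):
    """Arithmetic return: relative price change over the period."""
    return (p_future - p_initial) / p_initial

def log_return(p_initial, p_future):
    """Logarithmic return: natural log of the price ratio."""
    return math.log(p_future / p_initial)

# ADS closing prices on 2011-12-29 (initial) and 2011-12-30 (future)
p_initial, p_future = 49.94, 50.26
print(round(arithmetic_return(p_initial, p_future), 4))  # 0.0064
print(round(log_return(p_initial, p_future), 4))         # 0.0064
```

For small daily moves the two measures agree to several decimals; they diverge as price changes get larger.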

Page 6: Financial Networks V - Inferring Links

Prices vs Returns

• Benefits of using returns rather than prices for correlations:
– Normalization: all variables are denoted in the same units
– Required by some matrix algebra
– Returns (on asset prices) have low serial correlation
– Returns are what matter once an investment is made

• We'll use returns

Page 7: Financial Networks V - Inferring Links

Log vs Arithmetic Returns

• Most often logarithmic returns are used in finance. Why?

• Benefits of using log returns rather than arithmetic returns:
– Assumptions in statistical estimates (if prices are log-normally distributed, log returns are normally distributed)
– Easier calculations (integration, compounding) lead to smaller algorithmic complexity

• Normality assumptions favor log returns

• We'll use log returns
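The "easier compounding" point can be illustrated with a short sketch: log returns over consecutive periods simply add up, while arithmetic returns have to be compounded by multiplication. The prices are four consecutive ADS closes from the DAX price table in this lecture; the variable names are illustrative.

```python
import math

# ADS closing prices, 2011-12-19 to 2011-12-22 (ascending dates)
prices = [47.96, 49.56, 49.43, 49.98]

log_returns = [math.log(b / a) for a, b in zip(prices, prices[1:])]
arith_returns = [(b - a) / a for a, b in zip(prices, prices[1:])]

total_log = math.log(prices[-1] / prices[0])
total_arith = (prices[-1] - prices[0]) / prices[0]

# Log returns are additive over time
print(math.isclose(sum(log_returns), total_log))      # True

# Arithmetic returns are not additive; they compound multiplicatively
compounded = math.prod(1 + r for r in arith_returns) - 1
print(math.isclose(compounded, total_arith))           # True
print(math.isclose(sum(arith_returns), total_arith))   # False
```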

Page 8: Financial Networks V - Inferring Links

Data Issues

• Which prices
– daily (low, high, close)
– time-zone issues
– high-frequency

• Missing prices
– Errors in source data
– Partial holidays
– Weekends/holidays
– Exit and entry of assets

• Handling options
– Look up from other sources
– Edit values, e.g. replace with the previous value (prices) or with 0 (returns)
– Exclude days/series from analysis
– Decide on weekend returns depending on data
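The "edit values" option can be sketched as follows. This is an illustrative Python example, not how the FNA commands are implemented: missing prices are represented as None (an assumption), the price series is hypothetical, and the two helper functions are invented names.

```python
import math

def fill_prices_forward(prices):
    """Replace missing prices (None) with the previous available price."""
    filled, last = [], None
    for p in prices:
        if p is None:
            p = last  # stays None if there is no earlier price
        filled.append(p)
        last = p
    return filled

def log_returns_zero_missing(prices):
    """Log returns between consecutive prices; 0.0 where either price is missing."""
    returns = []
    for a, b in zip(prices, prices[1:]):
        returns.append(0.0 if a is None or b is None else math.log(b / a))
    return returns

# Hypothetical series in ascending date order with one missing price
prices = [48.45, 48.0, None, 48.1]
print(fill_prices_forward(prices))  # [48.45, 48.0, 48.0, 48.1]
print(log_returns_zero_missing(prices))
```

The two choices are not equivalent: forward-filling prices shifts the price move to the day after the gap, whereas zeroing the affected returns drops that move from the series altogether.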

Page 9: Financial Networks V - Inferring Links

DAX stocks

Ticker  Name                    Sector                        Weight (%)
ADS     Adidas                  clothing and footwear         2.0
ALV     Allianz                 insurance                     6.7
BAS     BASF                    specialty chemicals           9.6
BAYN    Bayer                   specialty chemicals           7.6
BEI     Beiersdorf              personal products             0.9
BMW     BMW                     automobile manufacturers      3.3
CBK     Commerzbank             credit banks                  1.0
CON     Continental             car parts manufacturers       0.8
DAI     Daimler                 automobile manufacturers      5.8
DB1     Deutsche Börse          securities brokers            1.5
DBK     Deutsche Bank           credit banks                  5.1
DPW     Deutsche Post           logistics                     1.9
DTE     Deutsche Telekom        fixed-line telecommunication  5.4
EOAN    E.ON                    multi-utilities               6.2
FME     Fresenius Medical Care  health care                   2.2
FRE     Fresenius               health care                   1.6
HEI     HeidelbergCement        building materials            0.9
HEN3    Henkel                  personal products             1.5
IFX     Infineon Technologies   semiconductors                1.3
LHA     Deutsche Lufthansa      airlines                      0.8
LIN     Linde                   industrial gases              3.8
LXS     Lanxess                 specialty chemicals           0.7
MRK     Merck                   pharmaceuticals               1.0
MUV2    Munich Re               re-insurance                  2.9
RWE     RWE                     multi-utilities               2.2
SAP     SAP                     software                      7.7
SDF     K+S                     commodity chemicals           1.2
SIE     Siemens                 diversified industrials       10.0
TKA     ThyssenKrupp            diversified industrials       1.3
VOW3    Volkswagen Group        automobile manufacturers      3.4

Page 10: Financial Networks V - Inferring Links

Data: Prices of 30 stocks in DAX in 2011

date        ADS    ALV    BAYN   BAS
2011-12-30  50.26  73.91  49.4   53.89
2011-12-29  49.94  73.21  48.73  53.17
2011-12-28  49.79  73.15  47.36  52.39
2011-12-27  50.3   75.45  48.27  53.17
2011-12-23  50.29  75.97  48.13  52.83
2011-12-22  49.98  75.81  47.92  52.63
2011-12-21  49.43  75.37  47.04  52.4
2011-12-20  49.56  75.43  46.99  52.9
2011-12-19  47.96  72.23  44.75  51.15
2011-12-16  48.1   72.77  45.04  51.71
2011-12-15  47.89  73.47  44.99  51.13
2011-12-14  48     71.53  44.8   50.58
2011-12-13  48.45  73.04  45.73  51.1

...

...

Page 11: Financial Networks V - Inferring Links

calculatereturns -command

• File (-file) : File with price data. The file must contain a table where each row has the observation date as its first element, and asset prices as subsequent elements separated by a field delimiter. The first line read must contain names for each asset column (except the first column which contains the date). Mandatory.
• Save As (-saveas) : File where result should be saved. Mandatory.
• Date format (-dateformat) : Format of date in input file. Default value yyyy-MM-dd. Optional.
• Method of calculation (-method) : Arithmetic (arithmetic) or logarithmic (log) returns. Mandatory. By default 'log'.
• Number of observations (-obs) : Larger than or equal to 2. Optional. By default '2'.
• Length of interval (-interval) : Larger than or equal to 1. Optional. By default '1'.
• Date order (-dateorder) : Date order in the file. Optional. Allowed values: [asc, desc]. By default 'asc'.

Page 12: Financial Networks V - Inferring Links

Examples

# Calculate daily log returns
calculatereturns -file daxprices-2011.csv -saveas daxreturns-2011.csv -method log -dateorder desc

# Calculate two-day log returns
calculatereturns -file daxprices-2011.csv -saveas daxreturnsl-2dl2.csv -method log -obs 2 -interval 2 -dateorder desc

# Calculate two-day log returns with a one-day window
calculatereturns -file daxprices-2011.csv -saveas daxreturnsl-2dl1.csv -method log -obs 2 -interval 1 -dateorder desc

# Calculate daily arithmetic returns
calculatereturns -file daxprices-2011.csv -saveas daxreturns-2da1.acsv -method arithmetic -dateorder desc

Page 13: Financial Networks V - Inferring Links

Data: Returns of 30 stocks in DAX in 2011

date        ADS.DE   ALV.DE   BAYN.DE  BAS.DE
2011-12-29   0.0064   0.0095   0.0137   0.0135
2011-12-28   0.0030   0.0008   0.0285   0.0148
2011-12-27  -0.0102  -0.0310  -0.0190  -0.0148
2011-12-23   0.0002  -0.0069   0.0029   0.0064
2011-12-22   0.0062   0.0021   0.0044   0.0038
2011-12-21   0.0111   0.0058   0.0185   0.0044
2011-12-20  -0.0026  -0.0008   0.0011  -0.0095
2011-12-19   0.0328   0.0433   0.0488   0.0336
2011-12-16  -0.0029  -0.0074  -0.0065  -0.0109
2011-12-15   0.0044  -0.0096   0.0011   0.0113
2011-12-14  -0.0023   0.0268   0.0042   0.0108
2011-12-13  -0.0093  -0.0209  -0.0205  -0.0102
2011-12-12   0.0058  -0.0163  -0.0020  -0.0099

...

...

Page 14: Financial Networks V - Inferring Links

Common Factors

• A common factor may be affecting all the returns
– E.g. a bullish or bearish market

• If we know the market return, we can subtract it from the returns to look at excess returns over the market

• If we don't know it, we can try to control for this unknown common factor via statistical methods

• Principal Component Analysis allows us to identify and remove the common factor
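The "known market return" case can be sketched in a few lines of Python. Note the assumption: the lecture does not say which market return to use, so the equal-weighted average of four stocks below is only a stand-in proxy. The returns are the first three rows of the returns table on the previous slide, and the function names are illustrative.

```python
def equal_weighted_market(returns_by_asset):
    """Proxy market return (assumption): plain average across assets per period."""
    columns = list(returns_by_asset.values())
    return [sum(period) / len(period) for period in zip(*columns)]

def excess_returns(asset_returns, market_returns):
    """Subtract the market return from an asset's return, period by period."""
    return [r - m for r, m in zip(asset_returns, market_returns)]

# Log returns for 2011-12-29, 2011-12-28 and 2011-12-27
returns = {
    "ADS":  [0.0064, 0.0030, -0.0102],
    "ALV":  [0.0095, 0.0008, -0.0310],
    "BAYN": [0.0137, 0.0285, -0.0190],
    "BAS":  [0.0135, 0.0148, -0.0148],
}

market = equal_weighted_market(returns)
excess = {name: excess_returns(r, market) for name, r in returns.items()}
for name, values in excess.items():
    print(name, [round(v, 4) for v in values])
```

With an equal-weighted proxy the excess returns sum to zero across assets in every period, which is one way to see that the common movement has been taken out.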

Page 15: Financial Networks V - Inferring Links

Eigenvectors - example

A plot of the normalised data (mean subtracted) -> First component explains maximum variance

The data has two variables (x and y), so the covariance matrix is 2x2 and there are two eigenvectors

The eigenvectors of the covariance matrix are overlaid as dotted lines

Page 16: Financial Networks V - Inferring Links

Eigenvectors of asset correlations

• The eigenvectors of a correlation matrix are orthogonal to each other. The data can be projected onto its eigenvectors

• Each eigenvector represents a Principal Component
– if its values are similar, the component affects all returns similarly

• Eigenvalues scale with the share of variance explained by each component
– Divide each by the sum of all eigenvalues to get the share

• Laloux et al (1999) find on S&P500 (and other markets)
– The largest eigenvalue is well separated from the bulk and corresponds to the whole market, as the corresponding eigenvector has roughly equal components
– The eigenvectors of the next largest eigenvalues carry information about the real correlations and can be used in identifying clusters of strongly interacting assets
– The eigenvectors of the smaller eigenvalues are noise (95%)

L. Laloux, P. Cizeau, J.-P. Bouchaud, M. Potters (1999). Noise Dressing of Financial Correlation Matrices. Phys. Rev. Lett. 83, 1467.
T. Heimo, J. Saramaki, J.-P. Onnela, K. Kaski (2007). Spectral and network methods in the analysis of correlation matrices of stock returns. Physica A 383, pp. 147–151.
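To make the "share of variance" idea concrete, here is a small sketch that finds the largest eigenvalue of a correlation matrix by power iteration and divides it by the sum of all eigenvalues. The 3x3 matrix is a hypothetical example, the function name is invented, and power iteration is chosen only to keep the example dependency-free; it is not necessarily how the pca command works.

```python
import math

def largest_eigenpair(matrix, iterations=500):
    """Power iteration: largest eigenvalue and unit eigenvector of a symmetric matrix."""
    n = len(matrix)
    v = [1.0 / math.sqrt(n)] * n
    for _ in range(iterations):
        w = [sum(matrix[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # The Rayleigh quotient gives the eigenvalue for the converged vector
    mv = [sum(matrix[i][j] * v[j] for j in range(n)) for i in range(n)]
    eigenvalue = sum(v[i] * mv[i] for i in range(n))
    return eigenvalue, v

# Hypothetical correlation matrix of three assets
corr = [
    [1.00, 0.72, 0.31],
    [0.72, 1.00, 0.60],
    [0.31, 0.60, 1.00],
]

market_eigenvalue, market_vector = largest_eigenpair(corr)

# The eigenvalues of a correlation matrix sum to its trace, i.e. the number of assets
share = market_eigenvalue / len(corr)
print("share of variance:", round(share, 3))
print("eigenvector:", [round(x, 3) for x in market_vector])
```

All components of the leading eigenvector have the same sign here, which is the pattern described above for a component that moves every asset in the same direction.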

Page 17: Financial Networks V - Inferring Links

pca -command

• File (-file) : File with input data.
• Components (-components) : Components to keep, e.g. 2- or 1 or 2-10. Optional.
• Observations (-obs) : Number of observations in each period.
• Interval (-interval) : Number of observations between each period.
• Saveas (-saveas) : Name of file to store amended data.

Examples

# calculate returns corresponding to all but the first principal component
pca -file dax-returns-2011.csv -obs 30 -interval 30 -components 2- -saveas dax-montly1.csv

# calculate returns corresponding to the second to 10th largest components
pca -file dax-returns-2011.csv -obs 30 -interval 30 -components 2-10 -saveas dax-montly2.csv

Page 18: Financial Networks V - Inferring Links

Correlation and Covariance

• Standard deviation

• Covariance and correlation are statistical measures of how two variables move in relation to each other

• Covariance is in the original units (the product of the two variables' units)

• Correlation ranges between -1 and 1
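The relationship between the three measures can be sketched in plain Python. The sample (n - 1) versions are used here as an assumption, since the slide's own formulas did not survive in this transcript; the data are the first five ADS and ALV log returns from the returns table in this lecture.

```python
import math

def mean(xs):
    return sum(xs) / len(xs)

def covariance(xs, ys):
    """Sample covariance (divides by n - 1); expressed in the units of x times y."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)

def std_dev(xs):
    """Sample standard deviation: square root of the variance."""
    return math.sqrt(covariance(xs, xs))

def correlation(xs, ys):
    """Covariance rescaled by both standard deviations; unit-free, within [-1, 1]."""
    return covariance(xs, ys) / (std_dev(xs) * std_dev(ys))

ads = [0.0064, 0.0030, -0.0102, 0.0002, 0.0062]
alv = [0.0095, 0.0008, -0.0310, -0.0069, 0.0021]

print(covariance(ads, alv))
print(correlation(ads, alv))

# Expressing ADS returns in percent changes the covariance but not the correlation
ads_pct = [100 * r for r in ads]
print(covariance(ads_pct, alv))
print(correlation(ads_pct, alv))
```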

Page 19: Financial Networks V - Inferring Links


Page 20: Financial Networks V - Inferring Links


buildbycorrelation -command

• File (-file) : File from which to read data. To skip lines or use non-default field delimiter use the 'delimiter' and 'skiplines' arguments. Mandatory.

• Save as property (-saveas) : Property name of saved result. Optional. By default 'correlation'.

• Number of observations (-obs) : Larger than or equal to 2. Optional.

• Length of interval (-interval) : Larger than or equal to 1. Optional.

• -commitinterval : Larger than or equal to 1. Optional. By default '10000'.

• Save standard deviation (-savestdev) : Option to save standard deviation as vertex property. Optional. By default 'false'.

• Save average returns (-savereturns) : Option to save average returns as vertex property. Optional. By default 'true'.

• Preserve (-preserve) : Do not overwrite existing networks in database. Optional. By default 'true'.

Page 21: Financial Networks V - Inferring Links

• Missing values (-missing) : Handling of missing or non-numeric values in data. Optional. Allowed values: [Zero, NaN, Alert]. By default 'Zero'. Alternatives:
– Zero : missing values are considered as 0.
– NaN : default value corresponds to 'NaN and missing values' setting in 'Default format settings'.
– Alert : missing values cause the calculation to fail.

• Start date (-start) : Starting date of calculation. Optional.
• End date (-end) : Ending date of calculation. Optional.
• Date format (-dateformat) : Format of date in input file. Default value corresponds to 'Date format' setting in 'Default format settings'. Optional.
• Date order (-dateorder) : Order of dates in the input file. If dates are in ascending order, the newest network has fewer observations than other networks. If dates are in descending order, the oldest network has fewer observations. Optional. Allowed values: [asc, desc]. By default 'desc'.

Page 22: Financial Networks V - Inferring Links

Correlation Matrix & Network

• Correlation matrix of n variables $X_1, \dots, X_n$ is an n×n matrix whose i,j entry is the correlation of $X_i$ and $X_j$

• Symmetric because $\mathrm{corr}(X_i, X_j) = \mathrm{corr}(X_j, X_i)$
• Diagonal is 1

• Represented as a complete undirected network

     A     B     C
A    0     0.72  0.31
B    0.72  0     0.60
C    0.31  0.60  0
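As a sketch of how such a matrix maps to a network, the snippet below turns the A, B, C example into a list of weighted undirected edges. Because the matrix is symmetric, each unordered pair is listed once, giving n(n-1)/2 edges. The zeros on the diagonal of the example are read here as "no self-links", which is an interpretation rather than something the slide states; the function name is illustrative.

```python
from itertools import combinations

labels = ["A", "B", "C"]
matrix = [
    [0.00, 0.72, 0.31],
    [0.72, 0.00, 0.60],
    [0.31, 0.60, 0.00],
]

def correlation_edges(labels, matrix):
    """One undirected, weighted edge per unordered pair of variables."""
    return [
        (labels[i], labels[j], matrix[i][j])
        for i, j in combinations(range(len(labels)), 2)
    ]

edges = correlation_edges(labels, matrix)
print(edges)  # [('A', 'B', 0.72), ('A', 'C', 0.31), ('B', 'C', 0.6)]
```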

Page 23: Financial Networks V - Inferring Links

Examples

# Calculate 100 day pairwise correlations with 1 day sliding window, consider any missing/non-numeric return as 0
buildbycorrelationd -file daxreturns-2011.csv -missing Zero -days 100 -interval 1 -preserve false

# Calculate 100 day pairwise correlations with no sliding window, consider any missing/non-numeric return as 0
buildbycorrelationd -file daxreturns-2011.csv -missing Zero -days 100 -interval 100 -preserve false

# Calculate pairwise correlations for whole data, alert about missing values
buildbycorrelation -file daxreturns-2011.csv -missing Alert -preserve false
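The difference between a sliding window and non-overlapping periods comes down to the window length and the step between window starts. The sketch below is one possible reading of those two settings, not the tool's actual logic: `window_bounds` is a hypothetical helper, and it keeps only full windows, whereas the option descriptions above indicate that the tool also produces a shorter first or last network.

```python
def window_bounds(n_rows, obs, interval):
    """(start, end) row indices of full windows of `obs` rows, `interval` rows apart."""
    if obs < 2 or interval < 1:
        raise ValueError("obs must be >= 2 and interval >= 1")
    return [(s, s + obs) for s in range(0, n_rows - obs + 1, interval)]

# Sliding window: each window shares obs - 1 rows with the previous one
print(window_bounds(10, 4, 1))

# No overlap: the step equals the window length
print(window_bounds(10, 4, 4))  # [(0, 4), (4, 8)]
```

One correlation network would then be estimated from the rows in each window, giving a time series of networks.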

Page 24: Financial Networks V - Inferring Links

Heatmap -command

• Creates time series of heatmaps
• Like viz -command
• Allows mapping of data to cell size, shape, color, label, hover
• Allows definition of color ranges and domains

# create heatmap from correlations
heatmap -sortv vertex_id -p correlation -symmetric true -cellsizedefault 13 -transition 0 -cellhover correlation -colordomain (-1)-1 -palette darkblue-lightgray-darkred -saveas daxheat-def-Y

Page 25: Financial Networks V - Inferring Links

[Figure panels: All components; Principal Component Removed]

Page 26: Financial Networks V - Inferring Links

http://upload.wikimedia.org/wikipedia/commons/9/93/Optical_illusion_greysquares.gif

About Color Perception

A and B are the same shade of gray

Page 27: Financial Networks V - Inferring Links

Anscombe's Quartet

Mean of x 9 Variance of x 11

Mean of y ~7.50 Variance of y ~4.1

Correlation ~0.816

Linear regression: y = 3.00 + 0.500x

Constructed in 1973 by Francis Anscombe to demonstrate both the importance of graphing data before analyzing it and the effect of outliers on statistical properties

Chatterjee, Sangit; Firat, Aykut (2007). "Generating Data with Identical Statistics but Dissimilar Graphics: A Follow up to the Anscombe Dataset". American Statistician 61 (3): 248–254.
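The summary statistics above can be checked with a short sketch. The values below are the first two of Anscombe's four data sets, which share the same x values; they are not reproduced on the slide and are included here only for illustration.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

x = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
y1 = [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]
y2 = [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]

for name, y in (("I", y1), ("II", y2)):
    print(name, "mean of y:", round(sum(y) / len(y), 2),
          "correlation:", round(pearson(x, y), 3))
```

Set I is roughly linear with noise while set II lies on a smooth curve, yet the mean and correlation come out almost identical, which is why plotting the data before trusting a correlation matters.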

Page 28: Financial Networks V - Inferring Links


Significance of Correlations

• We are estimating 'true' correlations by means of samples

• Our estimates have errors

• Question: What is the chance that random data exhibits an observed correlation?

Page 29: Financial Networks V - Inferring Links

Example - distribution of correlation in 30 trials with random numbers

[Figure panels: 20 pairs, 50 pairs, 100 pairs, 200 pairs]
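An experiment of this kind can be approximated with a short simulation. The sketch below draws independent standard normal numbers (the distribution, the seed and the function names are assumptions, since the slide does not say how its random numbers were generated) and reports the range of sample correlations over 30 trials for each sample size.

```python
import math
import random

def pearson(xs, ys):
    """Pearson correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def random_correlations(n_pairs, trials=30, seed=0):
    """Sample correlations between two independent random series, repeated `trials` times."""
    rng = random.Random(seed)
    results = []
    for _ in range(trials):
        xs = [rng.gauss(0, 1) for _ in range(n_pairs)]
        ys = [rng.gauss(0, 1) for _ in range(n_pairs)]
        results.append(pearson(xs, ys))
    return results

def spread(values):
    return max(values) - min(values)

for n_pairs in (20, 50, 100, 200):
    correlations = random_correlations(n_pairs)
    print(n_pairs, "pairs: min", round(min(correlations), 2),
          "max", round(max(correlations), 2))
```

Although the true correlation is zero in every trial, short samples regularly produce sizeable sample correlations; the spread narrows as the number of pairs grows.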

Page 30: Financial Networks V - Inferring Links

Test for significance

• Errors:
– Type I: False positive - we identify a correlation where there is none
– Type II: False negative - we identify no correlation when there is one

• We can control for Type I
– Fisher transformation -> Normal distr. -> Confidence Interval

• Possible to estimate Type II error rate - only more data will bring this down
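The chain "Fisher transformation -> normal distribution -> confidence interval" can be sketched as below. This is the textbook construction, with standard error 1/sqrt(n - 3) for the transformed correlation; whether the FNA command uses exactly this form is an assumption, and the function names are illustrative.

```python
import math
from statistics import NormalDist

def fisher_confidence_interval(r, n, alpha=0.05):
    """Two-sided (1 - alpha) confidence interval for a correlation r from n observations."""
    z = math.atanh(r)                              # Fisher transformation
    se = 1.0 / math.sqrt(n - 3)                    # approximate standard error of z
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)   # e.g. about 1.96 for alpha = 0.05
    return math.tanh(z - z_crit * se), math.tanh(z + z_crit * se)

def is_significant(r, n, alpha=0.05):
    """True if the confidence interval excludes zero."""
    low, high = fisher_confidence_interval(r, n, alpha)
    return low > 0 or high < 0

print(is_significant(0.5, 30))   # True
print(is_significant(0.3, 30))   # False
print(is_significant(0.3, 100))  # True
```

The same estimated correlation of 0.3 is not distinguishable from zero with 30 observations but is with 100, which illustrates why more data is the only way to bring the Type II error rate down.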

Page 31: Financial Networks V - Inferring Links

alpha -parameter

Significance level (-alpha) : Two-sided significance level for confidence interval.

Example:
buildbycorrelation -file daxreturns-2011.csv -missing Alert -obs 30 -interval 1 -alpha 0.05 -preserve false

Page 32: Financial Networks V - Inferring Links

No significance test, Principal Component Removed

Only significant correlations (a=0.05), Principal Component Removed

Page 33: Financial Networks V - Inferring Links

Problem of Multiple Comparisons

• Occurs when considering multiple simultaneous estimates
– e.g. with a correlation matrix of n=30 we have 30*29/2 = 435 estimates
-> too many false positives by random chance alone
– the system may not be accurate enough

• Familywise error rate (FWER) - Probability of making one or more false positives among all the hypotheses when performing multiple hypothesis tests.
– Controlled with e.g. the Bonferroni correction

• False discovery rate (FDR) - Expected proportion of significant correlations that are false positives
– Controlled with e.g. the Benjamini-Hochberg procedure

• Question: which is more important? Missing correlations that exist, or reporting ones that do not?
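The two corrections can be compared on a small example. The sketch below implements the standard Bonferroni rule (test each hypothesis at alpha / m) and the Benjamini-Hochberg step-up procedure; the p-values are hypothetical and the function names are illustrative.

```python
def bonferroni_reject(p_values, alpha=0.05):
    """Reject where p <= alpha / m; controls the familywise error rate."""
    m = len(p_values)
    return [p <= alpha / m for p in p_values]

def benjamini_hochberg_reject(p_values, alpha=0.05):
    """Benjamini-Hochberg step-up procedure; controls the false discovery rate."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k (1-based) with p_(k) <= k / m * alpha
    k_max = 0
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= rank / m * alpha:
            k_max = rank
    # Reject the k_max hypotheses with the smallest p-values
    reject = [False] * m
    for i in order[:k_max]:
        reject[i] = True
    return reject

# Hypothetical p-values from eight correlation tests
p_values = [0.001, 0.008, 0.039, 0.041, 0.042, 0.060, 0.074, 0.205]
print(sum(bonferroni_reject(p_values)))          # 1
print(sum(benjamini_hochberg_reject(p_values)))  # 2
```

Bonferroni is the stricter of the two: it guards against even a single false positive and so misses more real correlations, while Benjamini-Hochberg tolerates a controlled share of false positives in exchange for more discoveries. With hundreds of pairwise tests, as in a 30-asset correlation matrix, the Bonferroni threshold at alpha = 0.05 falls to roughly 0.0001 per test.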

Page 34: Financial Networks V - Inferring Links

bonferroni -parameter

Enable Bonferroni correction (-bonferroni) : Option to use Bonferroni correction in significance testing. Optional. By default 'false'.

Example:
buildbycorrelation -file daxreturns.csv -missing Alert -obs 30 -interval 1 -alpha 0.05 -bonferroni true -preserve false

Page 35: Financial Networks V - Inferring Links


Only significant correlations (a=0.05), Principal Component Removed

Bonferroni correction, Only significant correlations (a=0.05), Principal Component Removed

Page 36: Financial Networks V - Inferring Links

Blog, Library and Demos at www.fna.fi

Dr. Kimmo Soramäki [email protected]: soramaki