Applied Statistics Multivariate analysis III “Statistics is merely a quantisation of common sense” Troels C. Petersen (NBI) 1

Jan 27, 2021

  • Applied Statistics Multivariate analysis III

    “Statistics is merely a quantisation of common sense”

    Troels C. Petersen (NBI)

  • Combining ML answers

  • Different ML answers

    The many algorithms each produce an estimate, which is naturally highly correlated with the estimates of the other algorithms. But they are not identical…

    These very high correlations are of course to be expected, but they are mostly driven by values far from the actual price. If these are excluded, we get…


  • Different ML answers

    The slight differences mean that one can combine the estimates and possibly gain a little more, not unlike the averaging of trees in a forest.

    Below are two such examples: on the left (Decision Tree & AdaBoost) the goodness measure is “r2” (R squared, the coefficient of determination), while on the right (XGBoost & Random Forest) it is the “Error CI” (width of a confidence interval).
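    The gain from combining correlated but non-identical estimates can be illustrated with a small toy sketch. Everything in it is an assumption made for illustration: the "true prices", the two toy "algorithms" (a shared error plus an independent error each), and the choice of RMSE as the goodness measure. It is not the housing data or the models from the slides.

```python
import math
import random

def rmse(estimates, truth):
    """Root-mean-square error of a list of estimates against the truth."""
    return math.sqrt(sum((e - t) ** 2 for e, t in zip(estimates, truth)) / len(truth))

rng = random.Random(42)            # fixed seed so the sketch is reproducible
n = 20000
truth = [rng.uniform(1.0, 10.0) for _ in range(n)]   # hypothetical "true prices"

# Each toy "algorithm" shares a common error (making the two highly correlated)
# and has its own independent error (making them not identical).
common = [rng.gauss(0.0, 1.0) for _ in range(n)]
est_a = [t + c + rng.gauss(0.0, 1.0) for t, c in zip(truth, common)]
est_b = [t + c + rng.gauss(0.0, 1.0) for t, c in zip(truth, common)]

# Simplest possible combination: the plain average of the two estimates.
est_avg = [(a + b) / 2.0 for a, b in zip(est_a, est_b)]

rmse_a = rmse(est_a, truth)
rmse_b = rmse(est_b, truth)
rmse_avg = rmse(est_avg, truth)
print(f"RMSE A: {rmse_a:.3f}  RMSE B: {rmse_b:.3f}  RMSE average: {rmse_avg:.3f}")
```

    In this toy setup only the independent part of the error is reduced by averaging, while the shared part remains, which is why the gain is modest when the estimates are strongly correlated.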

  • Overtraining in our case

    The random forest on the housing prices shows the following pattern as a function of the number of trees. As can be seen, somewhere around 50-80 trees is probably optimal.

  • Ensemble methods

    By combining several methods that each provide “an answer”, the ensemble can become at least as performant as the best of the individual methods, and typically even more performant.

    To some extent, combining many different trees is in itself an ensemble method, but the combination can also be done across many different ML (and other) methods. The combination may be simple voting or (rather) a new ML model that takes the individual answers as input.

    This is also a way of combining different types of ML algorithms and loss functions!

  • Generative Adversarial Networks

  • Generative Adversarial Networks

    Invented (partly) by Ian Goodfellow in 2014, Generative Adversarial Networks (GANs) are a method for learning how to produce new (simulated) datasets from existing data.

    The basic idea is that two networks “compete” against each other:
    • Generative network: Produces new data, trying to make it match the original.
    • Adversarial (discriminatory) network: Tries to classify data as original or new.

    Typically, the generator is a deconvolutional NN, while the discriminator (adversarial network) is a convolutional NN.

    “The coolest idea in machine learning in the last twenty years” [Yann LeCun, French computer scientist]

  • GAN drawing

    Imagine that you want to write numbers that look like handwriting.

    Given a large training set, you can ask your GAN to produce numbers. At first it will do poorly, but as it is “punished” by the discriminator, it improves, and in the end it might be able to produce numbers of a quality equal to real data:


    The discriminator (adversarial network) can also be seen as an addition to the loss function, penalising (with λ) the ability to tell real from fake data:

    $$\mathrm{Loss} = \mathrm{Loss} + \lambda \cdot L_{\mathrm{Adversarial}}$$
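    As a point of reference, the competition between the two networks is usually written as the minimax objective from the original 2014 GAN paper. The symbols below (generator G, discriminator D, noise input z, and the distributions p_data and p_z) are notation introduced here and do not appear on the slides:

```latex
\min_G \max_D \;
  \mathbb{E}_{x \sim p_{\mathrm{data}}}\left[\log D(x)\right]
  + \mathbb{E}_{z \sim p_z}\left[\log\left(1 - D(G(z))\right)\right]
```

    Here D(x) is the probability the discriminator assigns to x being real. The second term, which the generator tries to minimise, can loosely be read as playing the role of the adversarial penalty in the loss.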

  • Latent space

    Latent variables are variables that are inferred instead of directly observed. They may correspond to some physical reality, e.g. temperature (then also called hidden variables), but can also correspond to abstract concepts, e.g. mental state.

    One advantage of using latent variables is that they can serve to reduce the dimensionality of data. Also, latent variables link observable data in the real world to symbolic data in the modelled world.

    A latent space is a space spanned by latent variables, thus containing the main features of the data.

  • Example: Latent space for PCA

    Consider a 3-dimensional space on which we apply a PCA.

    Then the principal component will fall in some direction spanned by the three dimensions.

    If we choose to use only this component, then this 1D direction forms the latent space:
    • All 3D points can be boiled down to this line, and
    • this line can give an approximation to all 3D points.

    This is a linear example in low dimensionality. Typically, ML problems are non-linear and high-dimensional. Therefore, the latent space can also have significant dimensionality, though it should of course always have a (much) lower dimensionality than the problem itself.
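    The 3D-to-1D example can be sketched in a few lines of plain Python. The data (points scattered around a line along (1, 2, 2)/3) are invented for illustration, and power iteration is used here only as a simple stand-in for a full PCA routine, since just the first principal component is needed.

```python
import math
import random

rng = random.Random(1)

# Hypothetical 3D data: points along the direction (1, 2, 2)/3 plus small noise.
true_dir = (1 / 3, 2 / 3, 2 / 3)
points = []
for _ in range(500):
    t = rng.gauss(0.0, 5.0)                       # position along the line
    points.append([t * d + rng.gauss(0.0, 0.1) for d in true_dir])

# Centre the data and build the 3x3 covariance matrix.
n = len(points)
mean = [sum(p[k] for p in points) / n for k in range(3)]
centred = [[p[k] - mean[k] for k in range(3)] for p in points]
cov = [[sum(c[i] * c[j] for c in centred) / (n - 1) for j in range(3)]
       for i in range(3)]

# Power iteration: repeatedly applying the covariance matrix converges to the
# eigenvector with the largest eigenvalue, i.e. the first principal component.
pc = [1.0, 0.0, 0.0]
for _ in range(100):
    w = [sum(cov[i][j] * pc[j] for j in range(3)) for i in range(3)]
    norm = math.sqrt(sum(x * x for x in w))
    pc = [x / norm for x in w]

# Encode: each 3D point becomes one latent coordinate (its projection on pc).
latent = [sum(c[k] * pc[k] for k in range(3)) for c in centred]
# Decode: approximate each 3D point from its single latent coordinate.
decoded = [[mean[k] + z * pc[k] for k in range(3)] for z in latent]

rms_error = math.sqrt(
    sum(sum((p[k] - d[k]) ** 2 for k in range(3)) for p, d in zip(points, decoded)) / n
)
print("principal component:", [round(x, 3) for x in pc])
print("RMS reconstruction error:", round(rms_error, 3))
```

    The list `latent` is the 1D latent representation ("all 3D points boiled down to this line"), and `decoded` is the approximation of the 3D points obtained from that line alone.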

  • Variational AutoEncoders

    An auto-encoder is a method (typically a neural network) for learning efficient data codings in an unsupervised manner (hence the “auto”). This dimensionality reduction is shown schematically below, and is closely related to generative models.

  • Latent space illustration

    The animation below shows how latent spaces are a simplified representation of the more complex objects, containing their main features.

    For this reason, one can do arithmetic (typically interpolation) between the inputs:


  • GANs producing face images

    In 2017, Nvidia published the results of their “AI” GANs for producing celebrity faces. There is of course a lot of training data…

  • Evolution in facial GANs

    There is quite a fast evolution in GANs and their ability to produce realistic results…

  • Ranking input variables

  • Input Feature Ranking

    It is of course useful to know which of your input features/variables are useful and which are not. Thus a ranking of the features is desired.

    This is in fact a generally nice property of ML and feature ranking: it works as an automation of the detective work behind finding relations.

    In principle, one could obtain a variable ranking by testing all combinations of variables. But N features give 2^N − 1 non-empty combinations, so that is not feasible in most situations (N features > 7)…

    Most algorithms have a built-in input feature ranking, which is often based on the very simple idea of “permutation importance”.

  • Permutation Importance

    One of the most used methods is “permutation importance” (quoted below from Christoph M.: "Interpretable ML", chapter 5.5). The idea is really simple:

    We measure the importance of a feature by calculating the increase in the model’s loss function after permuting the feature. A feature is “important” if shuffling its values increases the model error, because in this case the model relied on the feature for the prediction. A feature is “unimportant” if shuffling its values leaves the model error unchanged, because the model thus ignored the feature for the prediction.

    https://christophm.github.io/interpretable-ml-book/feature-importance.html


  • Permutation Importance

    Input: Trained model f, feature matrix X, target vector y, loss function L(y, f).

    [Fisher, Rudin, and Dominici (2018)]

    • Estimate the original model error e_orig = L(y, f(X)).
    • For each feature j = 1,…,p do:
      – Generate the feature matrix X_perm by permuting feature j in the data X. This breaks the association between feature j and the true outcome y.
      – Estimate the error e_perm = L(y, f(X_perm)) based on the predictions for the permuted data.
      – Calculate the permutation feature importance FI_j = e_perm / e_orig (or e_perm − e_orig).
    • Sort features by descending FI.

    Feature importance with Neural Networks (Towards Data Science)

    https://towardsdatascience.com/feature-importance-with-neural-network-346eb6205743
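    The permutation importance algorithm can be sketched in plain Python as follows. The data, the stand-in "trained model" (simply the known true relation), and the choice of mean squared error as loss are all assumptions made for illustration; a real application would use a fitted model and held-out data.

```python
import random

def mse(y_true, y_pred):
    """Mean squared error, used here as the loss function L(y, f)."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def permutation_importance(model, X, y, loss, rng):
    """Return {feature index j: FI_j = e_perm / e_orig} for a trained model."""
    e_orig = loss(y, [model(row) for row in X])
    importances = {}
    for j in range(len(X[0])):
        column = [row[j] for row in X]
        rng.shuffle(column)                       # permute feature j only
        X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
        e_perm = loss(y, [model(row) for row in X_perm])
        importances[j] = e_perm / e_orig
    return importances

rng = random.Random(0)
# Hypothetical data: y depends strongly on feature 0, weakly on 1, not on 2.
X = [[rng.gauss(0, 1) for _ in range(3)] for _ in range(2000)]
y = [3.0 * r[0] + 0.5 * r[1] + rng.gauss(0, 0.1) for r in X]

def model(row):
    """Stand-in for a trained model f (here simply the true relation)."""
    return 3.0 * row[0] + 0.5 * row[1]

fi = permutation_importance(model, X, y, mse, rng)
ranking = sorted(fi, key=fi.get, reverse=True)    # descending importance
print(ranking, {j: round(v, 1) for j, v in fi.items()})
```

    Feature 2 is ignored by the model, so permuting it leaves the error unchanged and its ratio FI is 1, while the features the model relies on get ratios well above 1.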

  • Shapley Values

    A better approximation was developed by Scott Lundberg with SHAP values:

    SHAP (SHapley Additive exPlanations): https://github.com/slundberg/shap

    This algorithm provides, for each entry, a ranking of the input variables, i.e. a sort of explanation for the result.

    One can also sum the SHAP values over all entries, and thereby get the overall ranking of the feature variables. The SHAP values are based on Shapley values.

    https://github.com/slundberg/shap
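    How per-entry SHAP values turn into an overall ranking can be sketched without the SHAP library itself. The numbers and feature names below are invented, and averaging the absolute values (rather than summing signed values) is an assumption made here, following the common convention that prevents positive and negative contributions from cancelling.

```python
# Hypothetical per-entry SHAP values: one row per entry, one column per feature.
feature_names = ["area", "age", "distance"]       # illustrative names only
shap_values = [
    [0.80, -0.10, 0.05],
    [-0.60, 0.20, -0.02],
    [0.70, -0.30, 0.01],
    [-0.90, 0.10, -0.04],
]

def rank_entry(row):
    """Per-entry ranking: order features by the size of their contribution."""
    order = sorted(range(len(row)), key=lambda k: abs(row[k]), reverse=True)
    return [feature_names[k] for k in order]

# Overall ranking: average the absolute values over all entries.
n = len(shap_values)
mean_abs = {
    name: sum(abs(row[k]) for row in shap_values) / n
    for k, name in enumerate(feature_names)
}
overall = sorted(mean_abs, key=mean_abs.get, reverse=True)

print("entry 0:", rank_entry(shap_values[0]))
print("overall:", overall)
```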

  • Shapley values

    Shapley values are a concept from cooperative game theory, where they are used to provide a possible answer to the question:

    “How important is each player to the overall cooperation, and what payoff can each player reasonably expect?”

    The Shapley values are considered “fair”, as they are the only distribution with the following properties:
    • Efficiency: The sum of the Shapley values of all agents equals the value of the grand coalition.
    • Linearity: If two coalition games described by v and w are combined, then the distributed gains should correspond to the gains derived from the sum of v and w.
    • Null player: The Shapley value of a null player is zero.
    • Stand-alone test: If v is sub/super additive, then $\varphi_i(v) \le / \ge v(\{i\})$.
    • Anonymity: The labelling of agents doesn't play a role in the assignment of their gains.
    • Marginalism: The function uses only the marginal contributions of player i as arguments.

    From such values, one can determine which variables contribute to a final result. And by summing the values, one can get an overall idea of which variables are important.

  • SHAP value calculation

    Consider a set N (of n players) and a (characteristic or worth) function v that maps any subset of players to real numbers:

    $$v : 2^N \to \mathbb{R}, \quad v(\emptyset) = 0$$

    If S is a coalition of players, then v(S) yields the total expected sum of payoffs the members of S can obtain by cooperation.

    The Shapley values are calculated as:

    $$\varphi_i(v) = \sum_{S \subseteq N \setminus \{i\}} \frac{|S|!\,(n - |S| - 1)!}{n!} \left[ v(S \cup \{i\}) - v(S) \right]$$

    The formula can be understood if we imagine the coalition being formed one actor at a time, with each actor demanding their contribution v(S ∪ {i}) − v(S) as fair compensation, and then, for each actor, taking the average of this contribution over the possible different permutations in which the coalition can be formed.
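    The sum over subsets S of N without i can be evaluated exactly for a small game. The sketch below uses an invented three-player game (players "A", "B", "C", where "C" contributes nothing); the function names and the game itself are illustrative assumptions, not part of the SHAP library.

```python
from itertools import combinations
from math import factorial

def shapley_values(players, v):
    """Exact Shapley values from the subset formula; v maps a frozenset to a number."""
    n = len(players)
    phi = {}
    for i in players:
        others = [p for p in players if p != i]
        total = 0.0
        for size in range(n):                     # |S| = 0, ..., n-1
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                S = frozenset(subset)
                total += weight * (v(S | {i}) - v(S))   # marginal contribution
        phi[i] = total
    return phi

def v(S):
    """Hypothetical worth function with v(empty set) = 0."""
    value = 0.0
    if "A" in S:
        value += 10.0
    if "B" in S:
        value += 20.0
    if "A" in S and "B" in S:
        value += 6.0                              # bonus for A and B cooperating
    return value

players = ["A", "B", "C"]
phi = shapley_values(players, v)
print(phi)
```

    In this toy game the cooperation bonus is split equally between A and B, the null player C receives zero, and the values add up to the worth of the grand coalition, in line with the efficiency and null-player properties.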

  • SHAP value calculation

    With the set N of n players and the function v as defined on the previous slide, the Shapley values can also be calculated as:

    $$\varphi_i(v) = \frac{1}{n!} \sum_R \left[ v(P_i^R \cup \{i\}) - v(P_i^R) \right]$$

    where the sum ranges over all n! orders R of the players and P_i^R is the set of players in N which precede i in the order R. This has the interpretation:

    $$\varphi_i(v) = \frac{1}{N_{\mathrm{players}}} \sum_{C \setminus i} \frac{\text{marginal contribution of } i \text{ to coalition } C}{\text{number of coalitions excluding } i \text{ of this size}}$$
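    The formulation as an average over all n! orders R can be sketched directly, again for a small invented game (three players "A", "B", "C" with an illustrative worth function v). Enumerating all orders only works for very few players, which is one reason practical tools rely on approximations.

```python
from itertools import permutations

def shapley_by_orderings(players, v):
    """Average each player's marginal contribution over all n! orderings R."""
    phi = {i: 0.0 for i in players}
    orderings = list(permutations(players))
    for R in orderings:
        preceding = frozenset()                   # P_i^R: players before i in R
        for i in R:
            phi[i] += v(preceding | {i}) - v(preceding)
            preceding = preceding | {i}
    return {i: total / len(orderings) for i, total in phi.items()}

def v(S):
    """Hypothetical worth function with v(empty set) = 0."""
    value = 0.0
    if "A" in S:
        value += 10.0
    if "B" in S:
        value += 20.0
    if "A" in S and "B" in S:
        value += 6.0                              # bonus for A and B cooperating
    return value

players = ["A", "B", "C"]
phi_orderings = shapley_by_orderings(players, v)
print(phi_orderings)
```

    Player A contributes 10 when joining before B and 16 when joining after B; each case occurs in half of the orderings, so the average is 13.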

  • Input Feature Ranking

    Here is an example from SHAP’s GitHub site.

    Clearly, LSTAT and RM are the best variables (whatever they are!).

  • Input Feature Ranking

    Here is an example from particle physics. The blue variables were “known”, but with SHAP we discovered three new, quite good variables in data.

  • Input Feature Ranking

    We could of course just add all variables, but we want to keep things simple, and when training the models we see that the three extra variables give most of the gain.

  • Using Angular Variables to disentangle H → ZZ* → eeee?

    Individual estimates

    Shapley values also open up the possibility of giving variable rankings for individual estimates, i.e. the exact reason behind each estimate!

    Above is shown which factors influenced (and to what degree) the final price estimate (here 6.86 MKr., compared to the actual sales price of 6.35 MKr.).

    This is a really useful tool to have.

  • Bonus slides

    Event as seen by the TRT detector. The occupancy is near 100%, rendering reconstruction void!