SEMI-SUPERVISED CLASSIFICATION WITH GRAPH CONVOLUTIONAL NETWORKS Thomas N. Kipf, Max Welling ICLR 2017 Presented by Devansh Shah 1

Jun 17, 2020

SEMI-SUPERVISED CLASSIFICATION WITH GRAPH CONVOLUTIONAL NETWORKS

Thomas N. Kipf, Max Welling

ICLR 2017

Presented by Devansh Shah


Semi-Supervised Learning

Goal: Learn a better prediction rule than one based on labeled data alone


Why bother?

• Unlabeled data is cheap

• Labeled data can be hard to get

• human annotation is boring

• labels may require experts


Can Unlabeled data help?

• Assuming each class is a coherent group (e.g. Gaussian)

• The decision boundary shifts depending on whether unlabeled data is used


Can Unlabeled data help?

“Similar” data points have “similar” labels


Semi-supervised vs transductive learning

• labeled data $(X_l, Y_l) = \{(x_{1:l}, y_{1:l})\}$

• unlabeled data $X_u = \{x_{l+1:n}\}$, available during training

• test data $X_{test} = \{x_{n+1:}\}$, not available during training

Inductive learning is ultimately applied to the test data.

Transductive learning is only concerned with the unlabeled data.


Graph Convolutional Networks


Applications

• Social Networks

• Protein-Protein Interaction

• 3D Meshes

• Clustering

• Scene Graphs


Graph Learning Problem

Inputs:

• A graph $G = (V, E)$

• A feature description $x_i$ for every node $i$, summarized in an $N \times D$ feature matrix $X$ ($N$: number of nodes, $D$: number of input features)

• Adjacency matrix $A$

Outputs:

• Node-level output $Z$ (an $N \times F$ feature matrix, where $F$ is the number of output features per node)
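
As a concrete illustration of these inputs, the sketch below builds a feature matrix and an adjacency matrix for a small undirected graph. The 4-node edge list, the feature values, and the helper name `adjacency_from_edges` are made up for this example; plain Python lists are used only to keep it dependency-free.

```python
def adjacency_from_edges(num_nodes, edges):
    """Build a symmetric (undirected) N x N adjacency matrix from an edge list."""
    A = [[0.0] * num_nodes for _ in range(num_nodes)]
    for i, j in edges:
        A[i][j] = 1.0
        A[j][i] = 1.0
    return A

# Hypothetical toy graph: N = 4 nodes on a path, D = 2 input features per node.
edges = [(0, 1), (1, 2), (2, 3)]
X = [[1.0, 0.0],
     [0.0, 1.0],
     [1.0, 1.0],
     [0.0, 0.0]]
A = adjacency_from_edges(4, edges)

N, D = len(X), len(X[0])  # X is N x D, A is N x N
```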


Understanding Graph Neural Networks

Every neural network layer can be written as a non-linear function

$H^{(l+1)} = f(H^{(l)}, A)$ with

• $H^{(0)} = X$

• $H^{(L)} = Z$, where $L$ is the number of layers


Understanding Graph Neural Networks

$f(H^{(l)}, A) = \sigma(A H^{(l)} W^{(l)})$ where

• $W^{(l)}$ is the weight matrix for the $l$-th layer

• $\sigma(\cdot)$ is a non-linear activation function like the ReLU
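
A minimal sketch of this simple layer, assuming ReLU as the activation and list-of-rows matrices; the helper names and the weight values are illustrative only (in practice the weights are learned).

```python
def matmul(P, Q):
    """Multiply two matrices given as lists of rows."""
    return [[sum(p * q for p, q in zip(row, col)) for col in zip(*Q)] for row in P]

def relu(M):
    """Element-wise ReLU."""
    return [[max(0.0, v) for v in row] for row in M]

def simple_layer(H, A, W):
    """One propagation step sigma(A H W) with sigma = ReLU."""
    return relu(matmul(matmul(A, H), W))

# Toy 3-node path graph 0 - 1 - 2 (no self-loops), 2 features per node.
A = [[0.0, 1.0, 0.0],
     [1.0, 0.0, 1.0],
     [0.0, 1.0, 0.0]]
H0 = [[1.0, 2.0],
      [3.0, 4.0],
      [5.0, 6.0]]
W0 = [[1.0, 0.0],
      [0.0, -1.0]]  # hypothetical weights

H1 = simple_layer(H0, A, W0)
```

In this toy run, the new row for node 0 depends only on node 1's features, because `A` has no self-loops.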


Understanding Graph Neural Networks

Limitation I:

• Multiplication with $A$ means that, for every node, we sum up all the feature vectors of all neighboring nodes but not the node itself

Fix:

• Enforce self-loops in the graph by adding the identity matrix to $A$


Understanding Graph Neural Networks

Limitation II:

• $A$ is typically not normalized and therefore the multiplication with $A$ will completely change the scale of the feature vectors

Fix:

• Normalize $A$ such that all rows sum to one, i.e. $D^{-1}A$, where $D$ is the diagonal node degree matrix. Multiplying with $D^{-1}A$ now corresponds to taking the average of neighboring node features
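
The two fixes (self-loops, then row normalization) can be sketched on a toy graph as follows. The function names and the one-dimensional features are assumptions made for illustration.

```python
def add_self_loops(A):
    """Return A + I."""
    n = len(A)
    return [[A[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]

def row_normalize(A):
    """Compute D^{-1} A, where D holds the row sums (node degrees) of A."""
    out = []
    for row in A:
        deg = sum(row)
        out.append([v / deg if deg > 0 else 0.0 for v in row])
    return out

def matmul(P, Q):
    """Multiply two matrices given as lists of rows."""
    return [[sum(p * q for p, q in zip(row, col)) for col in zip(*Q)] for row in P]

# Toy 3-node path graph 0 - 1 - 2 with one feature per node.
A = [[0.0, 1.0, 0.0],
     [1.0, 0.0, 1.0],
     [0.0, 1.0, 0.0]]
X = [[1.0], [4.0], [7.0]]

A_norm = row_normalize(add_self_loops(A))
averaged = matmul(A_norm, X)  # each row: mean over the node and its neighbours
```

Node 1 is connected to both other nodes, so its averaged feature is the mean of all three values.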


Understanding Graph Neural Networks

Propagation Rule: $f(H^{(l)}, A) = \sigma(\hat{D}^{-1/2} \hat{A} \hat{D}^{-1/2} H^{(l)} W^{(l)})$

• $\hat{A} = A + I$, where $I$ is the identity matrix

• $\hat{D}$ is the diagonal node degree matrix of $\hat{A}$
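
A dependency-free sketch of this propagation rule, assuming ReLU as the activation and a non-negative adjacency matrix. The function name `gcn_layer` and the toy inputs are illustrative, not taken from the authors' code.

```python
import math

def matmul(P, Q):
    """Multiply two matrices given as lists of rows."""
    return [[sum(p * q for p, q in zip(row, col)) for col in zip(*Q)] for row in P]

def gcn_layer(H, A, W):
    """ReLU(D^{-1/2} (A + I) D^{-1/2} H W), with D the degree matrix of A + I."""
    n = len(A)
    A_hat = [[A[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in A_hat]  # >= 1 thanks to the self-loops
    A_sym = [[A_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
             for i in range(n)]
    Z = matmul(matmul(A_sym, H), W)
    return [[max(0.0, v) for v in row] for row in Z]

# Toy 3-node path graph; identity features and weights expose the normalized matrix.
A = [[0.0, 1.0, 0.0],
     [1.0, 0.0, 1.0],
     [0.0, 1.0, 0.0]]
I3 = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
out = gcn_layer(I3, A, I3)
```

Unlike row normalization, the symmetric form keeps the normalized matrix symmetric, which the toy output makes visible.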


Semi-Supervised Node Classification

Cross-Entropy error over all labeled examples

$$Z = \mathrm{softmax}(H^{(L)})$$

$$\mathrm{Loss} = -\sum_{l \in \mathcal{Y}_L} \sum_{f=1}^{F} Y_{lf} \ln Z_{lf}$$

• $H^{(L)}$ is the output of the last layer

• $\mathcal{Y}_L$ is the set of node indices that have labels

• $F$ is the number of distinct output classes
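
The loss can be sketched as below: the softmax is applied row-wise and only the labeled node indices contribute. The one-hot label layout, the placeholder row for the unlabeled node, and the function names are assumptions for this example.

```python
import math

def softmax(row):
    """Numerically stable softmax of one row of logits."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def masked_cross_entropy(H_L, Y, labeled_idx):
    """Loss = -sum over labeled nodes l and classes f of Y[l][f] * ln Z[l][f]."""
    loss = 0.0
    for l in labeled_idx:
        z = softmax(H_L[l])
        loss -= sum(y * math.log(p) for y, p in zip(Y[l], z) if y > 0)
    return loss

# Three nodes, F = 2 classes; node 1 is unlabeled and never enters the loss.
H_L = [[2.0, 0.0], [0.0, 0.0], [0.0, 3.0]]
Y = [[1.0, 0.0], [0.0, 0.0], [0.0, 1.0]]  # row 1 is a placeholder
labeled_idx = [0, 2]

loss = masked_cross_entropy(H_L, Y, labeled_idx)
```

Unlabeled nodes still influence the loss indirectly, because their features are propagated into the labeled nodes' outputs by the graph convolutions.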


Experiments

Datasets


Experiments

Baselines

• Label Propagation (LP)

• Semi-Supervised embedding (SemiEmb)

• Manifold regularization (ManiReg)

• skip-gram based graph embeddings (DeepWalk)

• Iterative classification algorithm (ICA)


Experiments

Results


Robust Graph Convolutional Networks Against Adversarial Attacks

Dingyuan Zhu, Ziwei Zhang, Peng Cui, Wenwu Zhu

ACM SIGKDD 2019

Presented by Devansh Shah


Adversarial Attacks on Graphs

RELATED WORK

• Adversarial Attack on Graph Structured Data

• Adversarial Attacks on Neural Networks for Graph Data


Graph adversarial attack

Transductive Node Classification Setting

• A single graph $G_0 = (V_0, E_0)$ is considered in the entire dataset

• A target node $c_i \in V_i$ of graph $G_i$ is associated with a corresponding node label $y_i \in \mathcal{Y}$

• Test nodes (but not their labels) are also observed during training

• $\mathcal{D}^{(tra)} = \{(G_0, c_i, y_i)\}_{i=1}^{N}$


Graph adversarial attack

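Problem Definition

Given:

• A learned classifier $f$

• An instance from the dataset $(G, c, y) \in \mathcal{D}$

The graph adversarial attacker $g(\cdot, \cdot): \mathcal{G} \times \mathcal{D} \to \mathcal{G}$ modifies the graph $G = (V, E)$ into $\tilde{G} = (\tilde{V}, \tilde{E})$ such that

$$\max_{\tilde{G}} \; \mathbb{1}(f(\tilde{G}, c) \neq y)$$

$$\text{s.t.} \quad \tilde{G} = g(f, (G, c, y)), \qquad \mathrm{Eq}(G, \tilde{G}, c) = 1$$

Here $\mathrm{Eq}(\cdot, \cdot, \cdot): \mathcal{G} \times \mathcal{G} \times V \to \{0, 1\}$ is an equivalency indicator that tells whether two graphs $G$ and $\tilde{G}$ are semantically equivalent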
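
To make the optimization problem concrete, here is a toy brute-force attacker. Everything in it is an assumption for illustration: the classifier is a simple neighbour-majority vote, and the equivalency indicator is modelled as "at most `budget` edges differ", which is only one possible choice and is not specified on the slide.

```python
from itertools import combinations

def classify(edges, labels, c):
    """Toy stand-in for f(G, c): most common label among c's neighbours."""
    votes = {}
    for u, v in edges:
        if c in (u, v):
            other = v if u == c else u
            votes[labels[other]] = votes.get(labels[other], 0) + 1
    # Ties go to the smaller label; an isolated node gets no prediction (None).
    return min(votes, key=lambda k: (-votes[k], k)) if votes else None

def equivalent(edges, new_edges, budget):
    """Assumed Eq(G, G~, c): the two edge sets differ in at most `budget` edges."""
    return len(edges ^ new_edges) <= budget

def attack(edges, labels, c, y, num_nodes, budget=1):
    """Brute-force g: first perturbed edge set with f(G~, c) != y, else None."""
    candidates = [frozenset(pair) for pair in combinations(range(num_nodes), 2)]
    for k in range(1, budget + 1):
        for flips in combinations(candidates, k):
            new_edges = edges ^ set(flips)  # flip: add if absent, remove if present
            if equivalent(edges, new_edges, budget) and classify(new_edges, labels, c) != y:
                return new_edges
    return None

# Hypothetical 4-node graph; target node 0 has true label 0.
labels = {0: 0, 1: 0, 2: 0, 3: 1}
edges = {frozenset((0, 1)), frozenset((0, 3))}
perturbed = attack(edges, labels, c=0, y=0, num_nodes=4, budget=1)
```

A single edge flip is enough to change the toy prediction here, which is the kind of small, hard-to-notice change the constraint is meant to capture.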


Graph adversarial attack


Robust Graph Convolutional Network (RGCN)

Crux of the paper

• Instead of representing nodes as vectors, they are represented as Gaussian distributions in each convolutional layer

• When the graph is attacked, the model can automatically absorb the effects of adversarial changes in the variances of the Gaussian distributions

• To remedy the propagation of adversarial attacks in GCNs, a variance-based attention mechanism is used when performing convolutions


Gaussian-based Graph Convolution Layer

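Latent representation of node $v_i$ in layer $l$:

$$h_i^{(l)} = \mathcal{N}(\mu_i^{(l)}, \mathrm{diag}(\sigma_i^{(l)}))$$

$\mu_i^{(l)} \in \mathbb{R}^{f_l}$ is the mean vector

$\mathrm{diag}(\sigma_i^{(l)}) \in \mathbb{R}^{f_l \times f_l}$ is the diagonal variance matrix

Notation:

$M^{(l)} = [\mu_1^{(l)}, \dots, \mu_N^{(l)}] \in \mathbb{R}^{N \times f_l}$ is the mean matrix

$\mathrm{Cov}^{(l)} = [\sigma_1^{(l)}, \dots, \sigma_N^{(l)}] \in \mathbb{R}^{N \times f_l}$ is the variance matrix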


RGCN


RGCN

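Theorem

If $x_i \sim \mathcal{N}(\mu_i, \mathrm{diag}(\sigma_i))$, $i = 1, \dots, n$, and they are independent, then for any fixed weights $w_i$, we have:

$$\sum_{i=1}^{n} w_i x_i \sim \mathcal{N}\left(\sum_{i=1}^{n} w_i \mu_i, \; \mathrm{diag}\left(\sum_{i=1}^{n} w_i^2 \sigma_i\right)\right)$$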
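
A small worked instance of the theorem: means combine with weights $w_i$, variances with weights $w_i^2$. The function name and the numbers are illustrative assumptions.

```python
def weighted_gaussian_sum(weights, means, variances):
    """Mean and diagonal variance of sum_i w_i x_i for independent diagonal Gaussians."""
    dim = len(means[0])
    mean = [sum(w * m[d] for w, m in zip(weights, means)) for d in range(dim)]
    var = [sum(w * w * v[d] for w, v in zip(weights, variances)) for d in range(dim)]
    return mean, var

# Two independent 2-dimensional Gaussians with diagonal variances.
weights = [0.5, 2.0]
means = [[1.0, 0.0], [0.0, 1.0]]
variances = [[4.0, 1.0], [1.0, 0.25]]

mean, var = weighted_gaussian_sum(weights, means, variances)
# mean = [0.5, 2.0], var = [0.25*4 + 4*1, 0.25*1 + 4*0.25] = [5.0, 1.25]
```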


RGCN Node Aggregation

To prevent the propagation of adversarial attacks in GCNs, we propose an attention mechanism to assign different weights to neighbors based on their variances, since larger variances indicate more uncertainty in the latent representations and a larger probability of having been attacked

$$\alpha_j^{(l)} = \exp(-\gamma \sigma_j^{(l)})$$

Here $\alpha_j^{(l)}$ are the attention weights of node $v_j$ in layer $l$ and $\gamma$ is a hyper-parameter
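
The attention formula is applied element-wise to a node's variance vector. In the sketch below, the default `gamma=1.0` and the example variances are arbitrary values chosen for illustration.

```python
import math

def attention_weights(sigma, gamma=1.0):
    """alpha = exp(-gamma * sigma), element-wise over a variance vector."""
    return [math.exp(-gamma * s) for s in sigma]

# A low-variance neighbour versus a high-variance (possibly attacked) one.
alpha_confident = attention_weights([0.1, 0.2])
alpha_uncertain = attention_weights([2.0, 3.0])
```

Since variances are non-negative, every weight lies in (0, 1], and a larger variance always yields a smaller weight.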


RGCN Node Aggregation

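$$\mu_i^{(l+1)} = \mathrm{ReLU}\left(\sum_{j \in ne(i)} \frac{1}{\sqrt{D_{i,i} D_{j,j}}} \left(\mu_j^{(l)} \odot \alpha_j^{(l)}\right) W_\mu^{(l)}\right)$$

$$\sigma_i^{(l+1)} = \mathrm{ReLU}\left(\sum_{j \in ne(i)} \frac{1}{D_{i,i} D_{j,j}} \left(\sigma_j^{(l)} \odot \alpha_j^{(l)} \odot \alpha_j^{(l)}\right) W_\sigma^{(l)}\right)$$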
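
The two update rules can be sketched in plain Python as follows. This is an illustrative reading, not the authors' implementation: it assumes `neighbors[i]` already contains `i` itself (self-loops), that `deg` holds the matching degrees, and that `gamma=1.0` is an arbitrary choice.

```python
import math

def matvec(v, W):
    """Row vector v times matrix W (list of rows)."""
    return [sum(v[k] * W[k][c] for k in range(len(v))) for c in range(len(W[0]))]

def rgcn_layer(mu, sigma, neighbors, deg, W_mu, W_sigma, gamma=1.0):
    """One Gaussian-based graph convolution with variance-based attention."""
    new_mu, new_sigma = [], []
    for i in range(len(mu)):
        agg_mu = [0.0] * len(mu[0])
        agg_sigma = [0.0] * len(sigma[0])
        for j in neighbors[i]:
            alpha = [math.exp(-gamma * s) for s in sigma[j]]
            c_mu = 1.0 / math.sqrt(deg[i] * deg[j])  # coefficient for means
            c_sigma = 1.0 / (deg[i] * deg[j])        # squared coefficient for variances
            for d in range(len(agg_mu)):
                agg_mu[d] += c_mu * mu[j][d] * alpha[d]
                agg_sigma[d] += c_sigma * sigma[j][d] * alpha[d] * alpha[d]
        new_mu.append([max(0.0, v) for v in matvec(agg_mu, W_mu)])
        new_sigma.append([max(0.0, v) for v in matvec(agg_sigma, W_sigma)])
    return new_mu, new_sigma

# Two connected nodes with self-loops; one feature, identity weights.
neighbors = [[0, 1], [0, 1]]
deg = [2.0, 2.0]
W = [[1.0]]
mu_clean, _ = rgcn_layer([[1.0], [1.0]], [[0.0], [0.0]], neighbors, deg, W, W)
mu_noisy, _ = rgcn_layer([[1.0], [1.0]], [[0.0], [5.0]], neighbors, deg, W, W)
```

The variance update uses the squared coefficients (the normalization and the attention weight each appear twice), matching the $w_i^2$ factor in the theorem on sums of independent Gaussians. In the toy run, raising node 1's variance shrinks its contribution to node 0's mean.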


Loss Functions

Considering that the hidden representations of our method are Gaussian distributions, we first adopt a sampling process in the last hidden layer

$$z_i \sim \mathcal{N}(\mu_i^{(L)}, \mathrm{diag}(\sigma_i^{(L)}))$$

Next, $z_i$ is passed to a softmax function to get the predicted labels:

$$\hat{Y} = \mathrm{softmax}(Z), \quad Z = [z_1, \dots, z_n]$$

$\mathcal{L}_{cls}$ is the cross-entropy loss between the actual labels and the predicted probabilities for the labelled nodes
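
A minimal sketch of the sampling step for one node, assuming $\sigma$ holds variances (so the standard deviation is its square root) and drawing the sample as mean plus scaled standard-normal noise. The values and the fixed seed are arbitrary.

```python
import math
import random

def sample_logits(mu, sigma, rng):
    """Draw z ~ N(mu, diag(sigma)) as mu + sqrt(sigma) * eps with eps ~ N(0, 1)."""
    return [m + math.sqrt(s) * rng.gauss(0.0, 1.0) for m, s in zip(mu, sigma)]

def softmax(z):
    """Numerically stable softmax."""
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

rng = random.Random(0)      # fixed seed only to make the sketch repeatable
mu_L = [2.0, 0.0, -1.0]     # hypothetical last-layer means for one node
sigma_L = [0.5, 0.1, 0.0]   # hypothetical last-layer variances

z = sample_logits(mu_L, sigma_L, rng)
y_hat = softmax(z)          # predicted class probabilities for this node
```

Writing the sample as a deterministic function of the mean, the variance, and independent noise is the usual way to keep such a sampling step differentiable with respect to the mean and variance.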


Loss Functions

To ensure that the learned representations are indeed Gaussian distributions, we use an explicit regularization to constrain the latent representations in the first layer as follows

$$\mathcal{L}_{reg1} = \sum_{i=1}^{n} KL\left(\mathcal{N}(\mu_i, \mathrm{diag}(\sigma_i)) \,\|\, \mathcal{N}(0, I)\right)$$

where $KL(\cdot \| \cdot)$ is the KL-divergence between two distributions

We also impose L2 regularization on parameters of the first layer as follows:

$$\mathcal{L}_{reg2} = \left\| W_\mu^{(0)} \right\|_2^2 + \left\| W_\sigma^{(0)} \right\|_2^2$$
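
Both regularizers have simple closed forms. The sketch below uses the standard expression for the KL divergence from a diagonal Gaussian to $\mathcal{N}(0, I)$, namely $\frac{1}{2}\sum_d (\sigma_d + \mu_d^2 - 1 - \ln \sigma_d)$ with $\sigma$ as variances, and it reads the squared norm of a weight matrix as the sum of its squared entries, which is an assumption about the notation. All names and numbers are illustrative.

```python
import math

def kl_to_standard_normal(mu, sigma):
    """KL( N(mu, diag(sigma)) || N(0, I) ); sigma holds strictly positive variances."""
    return 0.5 * sum(s + m * m - 1.0 - math.log(s) for m, s in zip(mu, sigma))

def squared_l2(W):
    """Sum of squared entries of a weight matrix."""
    return sum(v * v for row in W for v in row)

def regularizers(mus, sigmas, W_mu, W_sigma):
    """Return (L_reg1, L_reg2) for first-layer representations and weights."""
    l_reg1 = sum(kl_to_standard_normal(m, s) for m, s in zip(mus, sigmas))
    l_reg2 = squared_l2(W_mu) + squared_l2(W_sigma)
    return l_reg1, l_reg2

# Two nodes with 2-dimensional representations; tiny hypothetical weights.
mus = [[0.0, 0.0], [1.0, 0.0]]
sigmas = [[1.0, 1.0], [1.0, 1.0]]
l_reg1, l_reg2 = regularizers(mus, sigmas, W_mu=[[1.0, 2.0]], W_sigma=[[3.0]])
```

A node whose representation is already $\mathcal{N}(0, I)$ contributes zero to the first term; moving its mean or variance away from that increases the penalty.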


Loss Functions

$$\mathcal{L} = \mathcal{L}_{cls} + \beta_1 \mathcal{L}_{reg1} + \beta_2 \mathcal{L}_{reg2}$$

where $\beta_1$ and $\beta_2$ are hyper-parameters that control the impact of the different regularizations


Results

Node Classification on Clean Datasets

RGCN slightly outperforms the baseline methods on Pubmed, while having comparable performance on Cora and Citeseer


Results

Against Non-targeted Adversarial Attacks


Results

Against Targeted Adversarial Attacks


Thank You!
