Wind Farm Power prediction with Graph Neural Network
Junyoung Park
SYSTEMS INTELLIGENCE Lab
Industrial and Systems Engineering (ISysE)
<https://wall.alphacoders.com/big.php?i=526859>
Wind Farm Power Estimation Task
2
[Figure: the same wind farm under wind direction 1 and wind direction 2]
Wind Farm Power Estimation Task
3
[Figure: wind farm under wind direction 1]
• Farm-level power estimation: wind-farm power = ??
• Turbine-level power estimation: wind turbine powers = ??
Wind Farm and Its Graph Representation
5
[Figure: wind farm under wind direction 1]
𝒢 = (N, E, g)
Node features N = {free-flow wind speed of turbine i, ∀i ∈ turbine indices}
Edge features E = {the downstream wake distance d, the radial wake distance r, ∀(i, j)*}
Global features g = {free-flow wind speed}
* ∀(i, j) ∈ interacting turbines, where i, j are turbine indices
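The graph construction above can be sketched in code. This is a minimal illustration, not the authors' implementation: the function name `build_farm_graph`, the wake cut-off `max_radial`, and the coordinate conventions are all assumptions.

```python
import numpy as np

def build_farm_graph(coords, wind_speed, wind_dir_deg, max_radial=200.0):
    """Build (node, edge, global) features for a wind-farm graph.

    coords: (n, 2) turbine x/y positions. A directed edge (i -> j) is created
    when turbine j lies downstream of turbine i along the wind direction and
    within an assumed radial cut-off max_radial.
    """
    wind = np.array([np.cos(np.deg2rad(wind_dir_deg)),
                     np.sin(np.deg2rad(wind_dir_deg))])
    nodes = np.full((len(coords), 1), wind_speed)  # free-flow wind speed per turbine
    senders, receivers, edges = [], [], []
    for i, pi in enumerate(coords):
        for j, pj in enumerate(coords):
            if i == j:
                continue
            delta = pj - pi
            d = float(delta @ wind)                      # downstream wake distance
            r = float(np.linalg.norm(delta - d * wind))  # radial wake distance
            if d > 0 and r < max_radial:                 # j sits in i's wake
                senders.append(i); receivers.append(j); edges.append([d, r])
    g = np.array([wind_speed])  # global feature: free-flow wind speed
    return nodes, np.array(edges), (senders, receivers), g
```

For two turbines aligned with the wind, only the downstream one receives an edge, with d equal to their spacing and r equal to zero.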
Details on Edge Features
6
[Figure: wake geometry under wind direction 1]
𝒢 = (N, E, g)
Edge features E = {the downstream wake distance d, the radial wake distance r, ∀(i, j)*}
Neural Network in EXTREMELY High Level View
7
[Figure: input data, pairs of (x, y)]
ŷ = NeuralNetwork(x; θ)
A neural network is a function approximator with trainable parameters θ,
trained so that ŷ ≈ y as accurately as possible.
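As a toy illustration of ŷ = NeuralNetwork(x; θ): the one-parameter "network" below, the synthetic data, and the learning rate are all assumptions; gradient descent on the mean-squared error drives ŷ toward y.

```python
import numpy as np

# Minimal sketch: a one-parameter "network" yhat = theta * x, trained by
# gradient descent on mean-squared error so that yhat ≈ y.
rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, size=100)
y = 3.0 * x  # the target function the network should approximate

theta = 0.0
for _ in range(200):
    yhat = theta * x
    grad = np.mean(2 * (yhat - y) * x)  # d(MSE)/d(theta)
    theta -= 0.1 * grad
# theta converges toward 3.0
```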
Why Graph Representation?
8
[Figure: wind farm under wind direction 1]
𝒢 = (N, E, g)
vs.
Matrix (Tensor) Representations

      X coord.  Y coord.
T0    850       713
T1    303       587
T2    569       775
T3    642       290
T4    217       97
(rows: # turbines)
Why Graph Representation?
9
[Table: turbine coordinates T0–T4, as on the previous slide]
1. An MLP/CNN's input size tends to be fixed.
   e.g., MNIST = [28 × 28]
   If we deploy one more turbine to the farm,
   then the input dimension would change.
2. The input data has no natural order.
   e.g., a time series has a time index!
   Which turbine should be the first input?
Spatial/Temporal Adjacency does not imply ‘related’
10
The convolution operation presumes that
'nearby pixels are somewhat related',
since we share the convolution filters.
Figure source <Left: https://github.com/vdumoulin/conv_arithmetic>, <Right: https://towardsdatascience.com/illustrated-guide-to-recurrent-neural-networks-79e5eb8049c9>
RNNs presume that
'nearby inputs are somewhat related',
since we share the RNN blocks.
Graph Neural Network
11
Image source <https://becominghuman.ai/lets-build-a-simple-neural-net-f4474256647f?gi=743618029571>
[Figure: an input graph with node features x0–x4 is mapped to an output graph with node features (or tensors) y0–y4]
- Graph Convolution Networks (GCN)
- Attention-based approaches
- Relational inductive bias (GN block)
- …
𝒢_y = GraphNeuralNetwork(𝒢_x; θ)
Imposing Relational Inductive Bias
12
Share the edge update function f and the node update function g for updating graph-represented data.
Edge update function f(·), node update function g(·)
[Figure: edge e0,1 between nodes n0 and n1 is updated to e′0,1 by f(·)]
Input Graph → Updated Graph
Imposing Relational Inductive Bias
13
Share the edge update function f and the node update function g for updating graph-represented data.
Edge update function f(·), node update function g(·)
[Figure: edge e0,4 between nodes n0 and n4 is updated to e′0,4 by f(·); e′0,1 is already updated]
Input Graph → Updated Graph
Imposing Relational Inductive Bias
14
Share the edge update function f and the node update function g for updating graph-represented data.
Edge update function f(·), node update function g(·)
Input Graph → Updated Graph
[Figure: node n1, together with its edges e1,4 and e1,2, is updated to n′1 by g(·)]
Imposing Relational Inductive Bias
15
Share the edge update function f and the node update function g for updating graph-represented data.
Edge update function f(·), node update function g(·)
Input Graph → Updated Graph
Imposing Relational Inductive Bias
16
Share the edge update function f and the node update function g for updating graph-represented data.
Edge update function f(·), node update function g(·)
Input Graph → Updated Graph
[Figure: node n0, together with edges e0,1, e1,2, e0,2, and e0,4, is updated to n′0 by g(·)]
Physics-induced Graph Neural Network On Wind Power Estimations
17
GN (Graph Neural) Block
18
[Figure: the input graph 𝒢 (global features g; node features N0–N4; Node0 features; Edge0,1 features) is mapped by the Graph Neural (GN) Block to an updated graph 𝒢′ (global features g′; node features N′0–N′4; Node′0 features; Edge′0,1 features)]
Edge update network f(·; θ0)
Node update network f(·; θ1)
Global update network f(·; θ2)
GN Block – Edge update steps
19
[Figure: input graph 𝒢 with global features g, node features N0–N4, Node0 features, and Edge0,1 features]
Edge′0,1 = f(Edge0,1, Node1, Node0, g; θ0)
Update edge features with f(Edge features, Receiver features, Sender features, g; θ0)
GN Block – Edge update steps
20
[Figure: input graph 𝒢 with global features g, node features N0–N4, Node0 features, and Edge0,1 features]
Edge′4,1 = f(Edge4,1, Node4, Node1, g; θ0)
Update edge features with f(Edge features, Receiver features, Sender features, g; θ0)
GN Block – Edge update steps
21
[Figure: input graph 𝒢 with all edge features now updated]
Update edge features with f(Edge features, Receiver features, Sender features, g; θ0)
GN Block – Node update steps
22
[Figure: input graph 𝒢 with updated edge features; global features g; node features N0–N4]
Node′0 = f(Ē0; θ1)
Ē0 = mean( concat(Edge′0,i, Node0, Nodei) ) over all incoming edges i
Aggregation function: any function that obeys the 'input-order invariant' and 'input-number invariant' properties, e.g., mean, max, min, etc.
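The two invariance properties above can be checked directly. A small sketch (the feature values are arbitrary): mean aggregation gives the same result under any input ordering, and produces a fixed-size output for any number of incoming edges.

```python
import numpy as np

# Three incoming edge-feature vectors for one node (arbitrary numbers).
incoming = np.array([[1.0, 2.0], [3.0, 0.0], [2.0, 5.0]])
shuffled = incoming[[2, 0, 1]]  # same edges, different order

# Input-order invariance: mean (likewise max, min) ignores the ordering.
agg_a = incoming.mean(axis=0)
agg_b = shuffled.mean(axis=0)

# Input-number invariance: a node with only 2 incoming edges still yields a
# fixed-size aggregate, so the same node-update network f(.; theta1) applies.
agg_two = incoming[:2].mean(axis=0)
```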
GN Block – Node update steps
23
[Figure: input graph 𝒢 with updated edge features and updated node features]
GN Block – Global feature update
24
[Figure: input graph 𝒢 with updated edge and node features; global features g are updated to g′]
g′ = f(Ē′, N̄′, g; θ2)
Ē′ = mean(Edge′i,j) over all edges (i, j)
N̄′ = mean(Node′i) over all nodes i
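Putting the three update steps together, one GN-block pass might look like the following sketch. The linear stand-in "update networks", the feature sizes, and the example edge list are assumptions; only the edge → node → global ordering and the mean aggregation come from the slides.

```python
import numpy as np

rng = np.random.default_rng(0)
n_nodes, node_dim, edge_dim, glob_dim = 5, 3, 2, 1
nodes = rng.normal(size=(n_nodes, node_dim))
senders = np.array([0, 1, 4])      # example edges 0->1, 1->2, 4->1
receivers = np.array([1, 2, 1])
edges = rng.normal(size=(len(senders), edge_dim))
g = rng.normal(size=(glob_dim,))

def linear(in_dim, out_dim):  # stand-in for a trained update network
    W = rng.normal(size=(in_dim, out_dim)) * 0.1
    return lambda x: x @ W

f_edge = linear(edge_dim + 2 * node_dim + glob_dim, edge_dim)  # f(.; theta0)
f_node = linear(edge_dim + 2 * node_dim, node_dim)             # f(.; theta1)
f_glob = linear(edge_dim + node_dim + glob_dim, glob_dim)      # f(.; theta2)

# 1) Edge update: f(edge, receiver, sender, g; theta0) for every edge.
edge_in = np.hstack([edges, nodes[receivers], nodes[senders],
                     np.tile(g, (len(edges), 1))])
edges_new = f_edge(edge_in)

# 2) Node update: mean-aggregate incoming edges, then apply f(.; theta1).
nodes_new = nodes.copy()
for i in range(n_nodes):
    mask = receivers == i
    if mask.any():
        agg = np.hstack([edges_new[mask],
                         np.tile(nodes[i], (mask.sum(), 1)),
                         nodes[senders[mask]]]).mean(axis=0)
        nodes_new[i] = f_node(agg)

# 3) Global update: f(mean of edges', mean of nodes', g; theta2).
g_new = f_glob(np.hstack([edges_new.mean(axis=0), nodes_new.mean(axis=0), g]))
```

Nodes with no incoming edges keep their features in this sketch; a real block could still update them from the global features.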
Revisit Aggregation Method
25
[Figure: input graph 𝒢 with updated edge features]
Node′0 = f(Ē0; θ1)
Ē0 = mean( concat(Edge′0,i, Node0, Nodei) ) over all incoming edges i
Aggregation function: any function that obeys the 'input-order invariant' and 'input-number invariant' properties, e.g., mean, max, min, etc.
Weighted “__” ≈ Attention (in Deep Learning)
26
Figure source <Agile Amulet: Real-Time Salient Object Detection with Contextual Attention>
Consider weighted Aggregations
27
Figure source <Left: https://www.youtube.com/watch?v=HHlN0TDgllE> , <Right: VAIN: Attentional Multi-agent Predictive Modeling>
[Figures: robot soccer (left); visualized attention weights (right)]
How can we get the weights?
28
Learn to weight!
GN Block – Edge update steps Revisit
29
[Figure: input graph 𝒢 with global features g, node features N0–N4, Node0 features, and Edge0,1 features]
Edge′4,1 = W4,1 × f(Edge4,1, Node4, Node1, g; θ0)
Update edge features with f(Edge features, Receiver features, Sender features, g; θ0)
W4,1 = f(some possible inputs; θ3)
Physics-induced Attention
30
Figure source <Cooperative wind turbine control for maximizing wind farm power using sequential convex programming by Jinkyoo Park, Kincho H.Law >
J. Park and K. H. Law suggest the continuous deficit factor δu(d, r) as

δu(d, r) = 2α · (R0 / (R0 + κd))² · exp( −(r / (R0 + κd))² )

R0: rotor diameter
d: downstream wake distance
r: radial wake distance
α, κ: tunable parameters
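The deficit factor is easy to evaluate numerically. A sketch with illustrative parameter values (α = 0.08, κ = 0.05, R0 = 60 are assumptions, not fitted values):

```python
import numpy as np

def delta_u(d, r, alpha=0.08, kappa=0.05, R0=60.0):
    """delta_u(d, r) = 2*alpha*(R0/(R0+kappa*d))**2 * exp(-(r/(R0+kappa*d))**2).

    Default parameter values are illustrative assumptions, not fitted values.
    """
    s = R0 + kappa * d  # effective wake radius grows with downstream distance
    return 2.0 * alpha * (R0 / s) ** 2 * np.exp(-(r / s) ** 2)

# The deficit decays with both downstream distance d and radial distance r.
near = delta_u(d=200.0, r=0.0)
far = delta_u(d=1000.0, r=0.0)
off_axis = delta_u(d=200.0, r=100.0)
```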
Physics-induced Attention
31
Figure source <Cooperative wind turbine control for maximizing wind farm power using sequential convex programming by Jinkyoo Park, Kincho H.Law >
δu(d, r) = 2α · (R0 / (R0 + κd))² · exp( −(r / (R0 + κd))² )

δu(d, r) indicates how much the downstream turbine is affected
by the upstream turbine → the weighting factor W!
However, they tuned the parameters α, κ to the observed data.
Physics-induced Attention
32
[Figure: input graph 𝒢 with global features g, node features N0–N4, Node0 features, and Edge0,1 features]
Edge′4,1 = W4,1 × f(Edge4,1, Node4, Node1, g; θ0)
Let the neural network learn α, κ, R0!
W4,1 = f(some possible inputs; θ3)
f(some possible inputs; θ3) becomes f(r, d; α, κ, R0) = 2α · (R0 / (R0 + κd))² · exp( −(r / (R0 + κd))² )
Physics-induced Graph Neural Network On Wind Power Estimations
33
Graph Dense Layer
35
[Figure: updated node features N′0 and global features g′ feed the prediction network f(·; θ5), producing per-turbine powers P0–P4]
Prediction network f(·; θ5)
P0 = f(N′0; θ5)
Graph Dense Layer
36
[Figure: each updated node feature N′i is mapped by the shared prediction network to a power estimate Pi]
P0 = f(N′0; θ5)
Graph Dense Layer
37
[Figure: per-turbine power predictions P0–P4 from the updated node features]
P0 = f(N′0; θ5)
Graph Dense Layer
38
[Figure: updated graph with node features N′0 and global features g′]
P0 = f(N′0; θ5)
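A sketch of the graph dense layer: one shared network f(·; θ5) applied to every updated node feature. The two-layer MLP, its random weights, and the feature sizes are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
node_feats = rng.normal(size=(5, 8))  # N'_0 ... N'_4, 8-dimensional each

# One shared prediction network f(.; theta5), here a tiny untrained MLP.
W1, b1 = rng.normal(size=(8, 16)), np.zeros(16)
W2, b2 = rng.normal(size=(16, 1)), np.zeros(1)

def f(n):
    h = np.maximum(0.0, n @ W1 + b1)  # ReLU hidden layer
    return h @ W2 + b2                # scalar power estimate

# P_i = f(N'_i; theta5): the same parameters are applied to every node,
# so the layer works for any number of turbines.
P = np.vstack([f(n) for n in node_feats])
```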
How to train your PGNN
39
[Figure: predicted turbine powers P̂0–P̂4 compared against the simulated powers P0–P4]
We use mean-squared error as the loss function of PGNN.
Lovely but Dreadful Exponential functions
40
f(x) = exp(x)
[Figure: the exp(x) curve; numerical under-flow for large negative x, numerical over-flow for large positive x]
Simple approximation for exponential functions
41
exp(x) := Σ_{k=0}^{∞} x^k / k! ≈ Σ_{k=0}^{D} x^k / k!

We set D = 5.
Downside of the power-series approximation
42
Question: "Why don't you use Taylor's expansion?"
Answer: "You may encounter the exponential again!"
The suggested approximation works
(relatively) properly when x is small.
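The truncated series (with D = 5, as in the deck) is straightforward to check; the error behaviour below illustrates the "small x" caveat:

```python
import math

def approx_exp(x, D=5):
    """Degree-D power-series approximation: sum_{k=0}^{D} x^k / k!."""
    return sum(x**k / math.factorial(k) for k in range(D + 1))

# Accurate for small |x|, but the error grows quickly as |x| grows.
small_err = abs(approx_exp(0.5) - math.exp(0.5))
large_err = abs(approx_exp(4.0) - math.exp(4.0))
```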
Scale-only normalization
43
Instead of using the raw downstream distance d and the radial wake distance r as inputs,

d′ = d / σ(d) × max(0, s_d),   r′ = r / σ(r) × max(0, s_r)

s_d, s_r are learnable parameters.
Dissect Scale-only normalization
44
Instead of using the raw downstream distance d and the radial wake distance r as inputs,

d′ = d / σ(d) × max(0, s_d)

(1) Why not subtract the mean?
→ We want the scaled values to be positive.
(2) What is max(0, s) for?
→ Since the s's are learnable parameters, without max(0, ·) they could become negative.
(3) How do you get σ(·)?
→ We employed an EWMA to get the μ(·), σ(·) estimates.
(4) Why multiply by max(0, s) again?
→ If no scaling was best, the network can then recover the original values.
The same intuition as Batch Normalization.
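Scale-only normalization can be sketched as below. The EWMA decay value, the initial s, and estimating σ from the raw second moment (no mean subtraction, so positive inputs stay positive) are illustrative assumptions.

```python
import numpy as np

class ScaleOnlyNorm:
    """x' = x / sigma(x) * max(0, s): scale, but never shift."""

    def __init__(self, s=1.0, decay=0.99):
        self.s = s                # learnable parameter (trained elsewhere)
        self.second_moment = 1.0  # EWMA estimate used for sigma
        self.decay = decay

    def __call__(self, x):
        self.second_moment = (self.decay * self.second_moment
                              + (1 - self.decay) * float(np.mean(x ** 2)))
        sigma = np.sqrt(self.second_moment)
        return x / sigma * max(0.0, self.s)  # max(0, s) clips a negative scale

norm = ScaleOnlyNorm()
d = np.array([120.0, 340.0, 560.0])
d_prime = norm(d)  # positive inputs remain positive
```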
Approximated weighting function
45
[Figure: the downstream-wake distance d and the radial-wake distance r pass through scale-only normalization to give normalized d and r, then through the approximated f_w(·; θ3) to produce the weight w]
Training Procedure
46
[Figure: sample a wind-farm layout; run power simulations with FLORIS to get P0–P4; encode the farm as a graph; the PGNN predicts P̂; compare P and P̂ with the MSE]
Sample s ~ U(5.0 m/s, 15.0 m/s), θ ~ U(0°, 360°); # turbines n ∈ {5, 10, 15, 20}
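The sampling step of this procedure can be sketched as follows. The layout bounds and the use of uniform turbine placement are assumptions, and the FLORIS simulation call itself is omitted (a real run would invoke the FLORIS package).

```python
import numpy as np

rng = np.random.default_rng(42)

def sample_scenario():
    """Draw one training scenario as described in the slides."""
    n = rng.choice([5, 10, 15, 20])             # number of turbines
    coords = rng.uniform(0, 2000, size=(n, 2))  # layout in meters (assumed bounds)
    speed = rng.uniform(5.0, 15.0)              # wind speed S ~ U(5, 15) m/s
    direction = rng.uniform(0.0, 360.0)         # wind direction theta ~ U(0°, 360°)
    return coords, speed, direction

coords, speed, direction = sample_scenario()
```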
Generalization Tests
47
Generalization over environmental factors: wind speed S, wind direction θ
Generalization over wind farm layouts
Generalization Over Environmental Factors
48
[Figures: wind speed = 8.0 m/s; errors 0.0172 and 0.022]
Generalization Over Layouts
49
- Sample 20 wind farm layouts and estimate the average estimation error.
- Each layout has 20 wind turbines in it.
Qualitative Analysis on Physics-induced Bias
50
Edge′4,1 = W4,1 × f(Edge4,1, Node4, Node1, g; θ0), with W4,1 = f(inputs; θ3)
PGNN: f(inputs; θ3) = f(r, d; α, κ, R0) = 2α · (R0 / (R0 + κd))² · exp( −(r / (R0 + κd))² )
DGNN: f is another neural network
Qualitative Analysis on Physics-induced Bias
51
PGNN achieved an 11% smaller validation error than DGNN.
[Figures: inferred weights on the training data vs. out-of-distribution inputs]
Case Study on Inferred Weights
52
[Figures: inferred weight values; ignored edges]
Case Study on a Regularized Grid Layout
53
[Figures: Error = 0.0642 and Error = 0.0702]
Anyway the wind blows
Junyoung Park
SYSTEMS INTELLIGENCE Lab
Industrial and Systems Engineering (ISysE)
Normalizing powers
55