Investigation Of Performance Problems With Event Detection Systems Ed Roehl, John Cook, Ruby Daamen, and Uwe Mundry Advanced Data Mining International, LLC Greenville, South Carolina

Wqtc2013 invest ofperformanceprobswitheds-20130910

Jul 19, 2015

Transcript
Page 1: Wqtc2013 invest ofperformanceprobswitheds-20130910

Investigation Of Performance Problems With Event Detection Systems

Ed Roehl, John Cook, Ruby Daamen, and Uwe Mundry

Advanced Data Mining International, LLC

Greenville, South Carolina

Page 2: Wqtc2013 invest ofperformanceprobswitheds-20130910

Background

Page 3: Wqtc2013 invest ofperformanceprobswitheds-20130910

Colorado State pilot loop from Project 3086

Figure labels: flow loop; TOC analyzer; Hach “panel” (SC, pH, Cl2, turbidity); flow pump; data acquisition; toxin ventilation; injection point; injection pump; flow direction

Page 4: Wqtc2013 invest ofperformanceprobswitheds-20130910

pilot loop results

chlorine residual response to Aldicarb

chlorine residual response to Na Cyanide

Page 5: Wqtc2013 invest ofperformanceprobswitheds-20130910

pilot loop results, cont.

conductivity response to Na Arsenate

pH response to Na Cyanide

Page 6: Wqtc2013 invest ofperformanceprobswitheds-20130910

concatenated results

• effects vary by contaminant & concentration!

Plot: Normalized Sensor Responses vs. 30-second time steps for NaArsenate, Aldicarb, NaCN, 1080

Runs concatenated: gray = injection period. Concentration increases left to right per toxin.

Page 7: Wqtc2013 invest ofperformanceprobswitheds-20130910

Lab Data

Real Data

This slide shows the difference between test results and reality

Danger!!!

Page 8: Wqtc2013 invest ofperformanceprobswitheds-20130910

Event Detection System (EDS) concept

• Monitors the distribution system for a contamination “event”

• Unlike a liquid chromatograph or mass spectrometer, it does not measure specific compounds

• “Infers” a possible contamination event

1. Uses traditional water quality (WQ) parameters: Cl2, pH, specific conductance, turbidity, TOC

2. “single-site approach” – uses WQ data from only one site

3. Pattern-matches current WQ to historical database of “normal” patterns

4. Pattern = “feature vector”

5. Poor match = anomaly = event ALARM!

• Commercial systems have been available for years

Page 9: Wqtc2013 invest ofperformanceprobswitheds-20130910

single-site “nearest neighbors” approach

Figure labels: SCADA; new n-dimensional “feature vector” with features CL2, PH, COND, DCL2, DPH, DCOND; vector track (points 1 to 5) in n-space; historical database of vectors; nearest neighbor distance to historical; event

• features represent signal variability

– scalars = magnitudes

– D’s (changes over a time interval) = velocities, 2nd D’s = accelerations

• n-space = n-dimensional feature space

– Math calls it a “hyperspace”

• nearest neighbor distance is the “tunable” alarm trigger
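
The nearest-neighbor idea on this slide can be sketched in a few lines of Python. This is an illustration only, not the commercial or ARMADA implementation: the function names, the Euclidean distance, and the single `lag` used for every D feature are assumptions, and the features would presumably need scaling to comparable ranges before the distance is meaningful.

```python
import math

def feature_vector(cl2, ph, cond, lag=1):
    """Build (CL2, PH, COND, DCL2, DPH, DCOND) from the latest readings.

    Each argument is a list of readings, newest last.
    D = change over `lag` time steps (assumed identical for all parameters).
    """
    scalars = [cl2[-1], ph[-1], cond[-1]]
    deltas = [s[-1] - s[-1 - lag] for s in (cl2, ph, cond)]
    return scalars + deltas

def nearest_neighbor_distance(vector, history):
    """Euclidean distance from `vector` to its closest historical vector."""
    return min(math.dist(vector, h) for h in history)

def is_alarm(vector, history, threshold):
    """Alarm when the new vector is farther than `threshold` from all history."""
    return nearest_neighbor_distance(vector, history) > threshold
```

Here `threshold` plays the role of the tunable alarm trigger: raising it trades false positives for false negatives.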

Page 10: Wqtc2013 invest ofperformanceprobswitheds-20130910

single-site approach - BIG assumption

• WQ variability caused by real contamination events is different from variability caused by normal operations.

– “normal vectors” relegated to limited regions of n-space!

– “event vectors” appear where normal vectors do not!

Page 11: Wqtc2013 invest ofperformanceprobswitheds-20130910

Water Research Foundation Project 4182

“Interpreting Real-Time Online Monitoring Data for Water Quality Event Detection”

Page 12: Wqtc2013 invest ofperformanceprobswitheds-20130910

Project 4182

• Goal – improve EDS reliability

– Too many false positives/alarms

– Too many false negatives when testing with “simulated events”

• How? - incorporate operations and hydraulic data into EDS

• Technical Approach

1. Determine causes of false positives and negatives

2. Find new approach incorporating operations and hydraulic data

Page 13: Wqtc2013 invest ofperformanceprobswitheds-20130910

determine causes of false positives & negatives

1. Compile multi-year distribution system data from 5 utilities

– Columbus OH, Greenville SC, Newport News VA, Oklahoma City, Wellford SC

2. Remove obvious errors

– Mostly automated using “univariate filters”

– Always risk that real event could look like something that gets removed

– Sensor reliability problems – well known issue

3. Analyze data

– Use several methods

– Focus on detecting events within 20 minutes

Page 14: Wqtc2013 invest ofperformanceprobswitheds-20130910

Automated error removal

• Successive filters identify flat-lines, dropouts, improbable values

• Filter limits based on statistics or inspection

• Manual clean-up sometimes also necessary
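
A rough sketch of what successive univariate filters of this kind might look like follows. The function name, the use of `None` to mark removed samples, and the `flat_run` length are assumptions for illustration; the slides say only that limits come from statistics or inspection.

```python
def filter_signal(values, low, high, flat_run=5):
    """Return a copy of `values` with suspect samples replaced by None.

    - dropouts: samples that are already None (missing)
    - improbable values: samples outside [low, high]
    - flat-lines: runs of `flat_run` or more identical consecutive readings
    """
    cleaned = [v if v is not None and low <= v <= high else None for v in values]
    run_start = 0
    for i in range(1, len(cleaned) + 1):
        # A run ends at the end of the list or when the value changes.
        if i == len(cleaned) or cleaned[i] != cleaned[run_start]:
            if cleaned[run_start] is not None and i - run_start >= flat_run:
                for j in range(run_start, i):
                    cleaned[j] = None
            run_start = i
    return cleaned
```

As the deck warns elsewhere, any such filter risks removing a real event that happens to resemble a sensor fault, so the limits deserve conservative settings.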

Page 15: Wqtc2013 invest ofperformanceprobswitheds-20130910

measured & filtered CL2

• CL2 - frequent dropouts, often full scale

Plot: measured and filtered Pump Station CL2 (mg/l); 1-minute time steps, 1/1/05 – 11/16/09

Page 16: Wqtc2013 invest ofperformanceprobswitheds-20130910

Question #1

• Are normal vectors really relegated to limited regions of n-space?

– Analysis methods

• 3-D scatter plots - visualize where “new” vectors appear

• n-space accounting – count how often “new” vectors appear near historical vectors

Page 17: Wqtc2013 invest ofperformanceprobswitheds-20130910

Utility B - 3 years WQ data

Page 18: Wqtc2013 invest ofperformanceprobswitheds-20130910

Utility C - 2.7 years WQ data

• Smaller COND range than Utility B

Page 19: Wqtc2013 invest ofperformanceprobswitheds-20130910

3-D plots of scalars - Utility B

• shows vectors with 3 scalar features

• scalars = parameter magnitudes

• lots of alarms as n-space fills over time

Plots: (x,y,z) = COND, PH, CL2 and (x,y,z) = CL2, COND, TURB

Page 20: Wqtc2013 invest ofperformanceprobswitheds-20130910

3-D plots of scalars - Utility C

• lots of alarms as n-space fills over time

Plots: (x,y,z) = CL2, COND, TOC and (x,y,z) = COND, PH, CL2

Page 21: Wqtc2013 invest ofperformanceprobswitheds-20130910

3 D features: D = change over time interval

• Util. C

– 6 & 16 min D’s at 32 months

– Large CL2, PH D’s relative to range

• Util. B

– 5-min D’s at 4 months & 3 years

– Large D’s relative to range

Page 22: Wqtc2013 invest ofperformanceprobswitheds-20130910

D symmetry - Utility B 5-minute D’s

Page 23: Wqtc2013 invest ofperformanceprobswitheds-20130910

3-D scatter plot analyses - summary

• Normal vectors wander all over

• D’s large relative to scalar ranges = high variability

• After 3 years – many places left for events & false alarms

• Real “event” that appears amid normal vectors would be undetectable

– likely because some contaminants affect only some parameters

Plots: Utility B – scalar and D vectors after 3 years; 6 different D’s; 15-min D ranges > 5 min

Page 24: Wqtc2013 invest ofperformanceprobswitheds-20130910

n-space accounting procedure

1. 2 Utility A sites - 4 years of 10-minute data

a. first 70% is historical, the rest is new

b. coarsely “segment” n-space

• scalars - 5 sub-ranges, each 20% of range

• D’s - 6 sub-ranges about mean: 60%, 90%, 100% of populations

c. intersecting sub-ranges form “hypercuboids” (HC)

2. Count historical & new vectors in hypercuboids

– count how often “new” vectors appear where historical vectors are
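
The accounting procedure above can be sketched as follows. This is a simplified illustration, not the authors' code: every feature is cut into equal-width sub-ranges of the historical range, whereas the slide segments D’s by population percentages about the mean, and the function names are hypothetical.

```python
def bin_index(value, low, high, n_bins):
    """Index of the equal-width sub-range of [low, high] that holds `value`."""
    if high == low:
        return 0
    i = int((value - low) / (high - low) * n_bins)
    return min(max(i, 0), n_bins - 1)  # clamp values on or beyond the edges

def match_fraction(vectors, n_bins, historical_percent=70):
    """Fraction of "new" vectors landing in hypercuboids that history populated."""
    split = len(vectors) * historical_percent // 100
    historical, new = vectors[:split], vectors[split:]
    dims = range(len(n_bins))
    lows = [min(v[d] for v in historical) for d in dims]
    highs = [max(v[d] for v in historical) for d in dims]

    def cell(v):
        # A hypercuboid is identified by one sub-range index per feature.
        return tuple(bin_index(v[d], lows[d], highs[d], n_bins[d]) for d in dims)

    populated = {cell(v) for v in historical}
    return sum(cell(v) in populated for v in new) / len(new)
```

A low match fraction means many new, presumably normal, vectors fall in regions the history never visited, which a nearest-neighbor EDS would tend to flag as events.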

Page 25: Wqtc2013 invest ofperformanceprobswitheds-20130910

segment n-space - results

• “Combinatorial Explosion” – even with coarse segmentation

- 3 scalars = 5 x 5 x 5 = 125 cuboids

- + D = 125 x 6 x 6 x 6 = 27k hypercuboids

- + 2nd D = 27k x 6 x 6 x 6 = 5.8 million

Plots: tank site and pump station, each with D and D2 features; matches decrease when more features are used
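
The hypercuboid counts on this slide follow directly from multiplying the sub-range counts. The short check below reproduces them and, as an added point of comparison, computes an upper bound on the number of vectors that 4 years of 10-minute data can supply (variable names are illustrative; gaps and leap days are ignored).

```python
import math

def hypercuboid_count(bins_per_feature):
    """Number of cells formed by intersecting the sub-ranges of every feature."""
    return math.prod(bins_per_feature)

scalars = [5, 5, 5]    # 3 scalars, 5 sub-ranges each
first_d = [6, 6, 6]    # one D per scalar, 6 sub-ranges each
second_d = [6, 6, 6]   # one 2nd D per scalar, 6 sub-ranges each

print(hypercuboid_count(scalars))                       # 125
print(hypercuboid_count(scalars + first_d))             # 27000
print(hypercuboid_count(scalars + first_d + second_d))  # 5832000

# Upper bound on vectors from 4 years of 10-minute data (ignores gaps, leap days).
vectors_available = 4 * 365 * 24 * 6
print(vectors_available)                                # 210240
```

With roughly 210 thousand vectors at most and 5.8 million cells, most hypercuboids must stay empty, which is consistent with matches decreasing as features are added.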

Page 26: Wqtc2013 invest ofperformanceprobswitheds-20130910

n-space accounting - summary

• Both sites – many “new” vectors appeared in unpopulated regions.

– Causes false positives/alarms

– Agrees with 3-D scatter plot analyses for other utilities

• Using more parameters / features to improve event detection causes “combinatorial explosion”

– n-space volume increases exponentially with #features

– Much larger space for new normal vectors to appear, so more false positives/alarms

vector features: CL2, PH, COND, DCL2, DPH, DCOND

Page 27: Wqtc2013 invest ofperformanceprobswitheds-20130910

Q1 answer + another question

• Question #1 - Are normal vectors really relegated to limited regions of n-space?

– Answer – it appears that normal operations can place vectors anywhere (within practical limits)

• Question #2 - Why? – Need to understand how signals behave!

– Analysis methods

• autocorrelation – quantify randomness

• cross-correlation - quantify independence

• others – spectral analysis; nearest neighbor distance accounting; multivariate empirical modeling w/ operational & hydraulic parameters

Page 28: Wqtc2013 invest ofperformanceprobswitheds-20130910

autocorrelation of D’s

• Autocorrelation determines how randomly a signal varies

– compares a signal to a copy of itself

– calculates R statistic at successive time delays

– Results: negligible R’s predominant = random variability ubiquitous

Plot: Utility C; annotation: 1st valid correlation
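
A minimal sketch of the autocorrelation calculation described on this slide, assuming the R statistic is the Pearson correlation between the signal and a copy of itself delayed by `lag` steps (`lag` must be at least 1). The function name is illustrative and no handling of constant signals is included.

```python
def autocorrelation(signal, lag):
    """Pearson R between a signal and a copy of itself delayed by `lag` steps."""
    x, y = signal[:-lag], signal[lag:]
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5
```

Applied to a D signal at successive lags, values of R near zero indicate that one change says little about the next, which is the "random variability" reading on the slide.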

Page 29: Wqtc2013 invest ofperformanceprobswitheds-20130910

cross-correlation of D’s

• Cross-correlation matrix – determines relative independence of changes (D’s) in WQ and operational parameters

– calculates R2 statistic for D signal pairs

– Results: negligible R2’s predominant = independent signal variability ubiquitous

Matrices: Utility C, for 1 time-step (86 sec), 3 time-step (4.3 min), and 7 time-step (10 min) changes
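
The cross-correlation matrix can be sketched as below, assuming R2 is the squared Pearson correlation between pairs of D signals and that all signals share the same time base and length. Function names and the dictionary layout are illustrative.

```python
def deltas(signal, steps):
    """Change in a signal over `steps` time steps (the D feature)."""
    return [later - earlier for earlier, later in zip(signal, signal[steps:])]

def r_squared(x, y):
    """Squared Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant signal carries no correlation information
    return cov * cov / (var_x * var_y)

def r2_matrix(signals, steps):
    """R2 for every pair of D signals; `signals` maps name -> list of readings."""
    d = {name: deltas(values, steps) for name, values in signals.items()}
    return {(a, b): r_squared(d[a], d[b]) for a in d for b in d}
```

Calling `r2_matrix(signals, steps)` with `steps` of 1, 3, and 7 would correspond to the three change intervals shown for Utility C.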

Page 30: Wqtc2013 invest ofperformanceprobswitheds-20130910

Q2 answer

• Question #1 - Are normal vectors really relegated to limited regions of n-space? Answer – it appears that normal operations can place vectors anywhere

• Question #2 - Why? Answer – on a time scale of 20 min, WQ signals vary with “apparent” randomness

– random – because WQ trends are frequently interrupted

• random upstream mixing of waters having very different WQ’s

• randomly fluctuating flows, some propagated from afar

– “apparent” – because variability is due to “Laws of Physics”, but causes are unknown / unaccounted for by single-site approach

– conventional “lab chemistry” suppressed by ongoing mixing

Page 31: Wqtc2013 invest ofperformanceprobswitheds-20130910

• Blind to what’s going on upstream

• Doesn’t use available explanatory information

Figure labels: flushing & fires; single-site; What’s happening?

Page 32: Wqtc2013 invest ofperformanceprobswitheds-20130910

Conclusions – single-site approach

• False positives – because normal operations can generate a wide range of patterns/vectors (within practical limits)

• False negatives – because simulated patterns/vectors are too similar to normal vectors

• Using site’s local operational parameters - ineffective because most variability is due to upstream causes

• Single-site approach ineffective where WQ variability is substantial (probably most places)

– other algorithms would also be ineffective – same data & physics

– low normal variability (beaker like) applications where event would exceed parameter ranges can be handled by SCADA

Page 33: Wqtc2013 invest ofperformanceprobswitheds-20130910

Multi-Site Approach

Page 34: Wqtc2013 invest ofperformanceprobswitheds-20130910

“multi-site” approach

• Use upstream data to “account” for variability at downstream “target” site

– significant unaccounted target variability = event

• Upstream sites provide

– WQ boundary conditions

– more relevant operational parameters

• System-wide coverage by cascading from WTP

Figure: map of sites 1 to 17 (legend: Tank, Pump St., WTP) cascading along Circuits 1 to 4

Page 35: Wqtc2013 invest ofperformanceprobswitheds-20130910

upstream WQ boundary conditions

• Trends similar but not identical – because of target site operations, measurement errors, unknown causes

Plots: CL2 (mg/l), PH, COND (µS/cm), and TEMP (deg. F) at the upstream and target sites, with flow from upstream to target; 1-hour time steps (220 days, August to March)

Page 36: Wqtc2013 invest ofperformanceprobswitheds-20130910

multi-site accounting

• Accounting performed by empirical “process models”

– modeling = an accounting of causes of variability

– prediction error = variability that cannot be accounted for

– statistically large prediction error = event

• Modeling approach

– artificial neural networks (ANN)

• very accurate / definitive accounting

– raw signals enhanced to accentuate variability

• (multi-spectral signal decomposition)

Flowchart: Inputs (upstream WQ, upstream operations, target operations) feed the empirical process model; Outputs are predicted values (labels shown: CL2, PH, COND, DCL2), which are compared with measured values. Prediction error too BIG? If no, keep monitoring; if yes, notification.
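
The accounting loop on this slide can be illustrated with a deliberately simple stand-in. The sketch below swaps the artificial neural network for a one-input ordinary least-squares line, which is not the authors' model, purely to show the predict, compare, and notify logic. All names and the single upstream input are assumptions.

```python
def fit_line(x, y):
    """Ordinary least squares for y ~ slope * x + intercept (stand-in for the ANN)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = (sum((a - mx) * (b - my) for a, b in zip(x, y))
             / sum((a - mx) ** 2 for a in x))
    return slope, my - slope * mx

def check_sample(model, upstream_value, measured_target, error_limit):
    """Return (predicted, error, is_event) for one new target measurement."""
    slope, intercept = model
    predicted = slope * upstream_value + intercept
    error = measured_target - predicted
    # Variability the model cannot account for, beyond the limit, is an event.
    return predicted, error, abs(error) > error_limit
```

In use, the model would be fitted on historical upstream and target data, then `check_sample` would be called on each new reading; a real process model would take many inputs, including operational parameters.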

Page 37: Wqtc2013 invest ofperformanceprobswitheds-20130910

ANN multivariate, nonlinear curve fitting – WTP THMs

Figure labels: no data; fitted nonlinear “response surface” represents normal behavior; large prediction error = deviation from normal; better conditions?

Page 38: Wqtc2013 invest ofperformanceprobswitheds-20130910

4-site example

• BPS B is “target” site

• Utility has multiple WTPs with different sources

• 1 year of 4-min data

– first 10 months = training

– last 2 months = test

Diagram labels: BPS A and BPS B (each: Q, PSUC, PDIS, COND, CL2, TEMP); TANK A and TANK B (each: LVL, COND, CL2); unmonitored flows

Page 39: Wqtc2013 invest ofperformanceprobswitheds-20130910

BPS B COND model results

Plot: measured and predicted COND (µS/cm) vs. 4-minute observations

Training Data – N: 76,148; R2: 0.847; RMSE: 72 µS/cm

Test Data – N: 17,296; R2: 0.893; RMSE: 69 µS/cm
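
For readers who want to reproduce statistics of this kind, a small sketch follows. It assumes R2 is the coefficient of determination (1 minus the ratio of squared error to total variance); the slides do not say whether that or the squared correlation coefficient was used, and the function names are illustrative.

```python
def rmse(measured, predicted):
    """Root-mean-square error, in the units of the measured signal."""
    n = len(measured)
    return (sum((m - p) ** 2 for m, p in zip(measured, predicted)) / n) ** 0.5

def r2(measured, predicted):
    """Coefficient of determination: share of measured variance accounted for."""
    mean = sum(measured) / len(measured)
    sse = sum((m - p) ** 2 for m, p in zip(measured, predicted))
    sst = sum((m - mean) ** 2 for m in measured)
    return 1.0 - sse / sst
```

Computing both on the held-out test period, as the slide does, gives a fairer picture of accounting accuracy than the training statistics alone.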

Page 40: Wqtc2013 invest ofperformanceprobswitheds-20130910

BPS B CL2 Process Model – training data

Plot: measured and predicted CL2 (mg/l) vs. 4-minute observations; annotations: nitrification? drop outs?

Training Data – N: 41,894; R2: 0.837; RMSE: 0.085 mg/l

Test Data – N: 11,715; R2: 0.912; RMSE: 0.085 mg/l

Page 41: Wqtc2013 invest ofperformanceprobswitheds-20130910

D’s

• periods shown are 2 days

• measured and predicted D’s (left axes)

• prediction errors and alarm limits (right axes)

– alarm limits = error that occurs 0.1% of time (about once every 2.8 days)

Plot panels: CL2, COND, PH; series: meas. & pred. deltas, error & limits; x-axis: 4-minute observations
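
The alarm-limit rule and the "1 / 2.8 days" figure can be checked with a short sketch. It assumes a single symmetric limit on the absolute prediction error taken from historical errors; the slides show limits on the plots without specifying how they were derived, and the names here are hypothetical.

```python
def alarm_limit(errors, one_in=1000):
    """Limit on |error| that at most 1 in `one_in` historical samples exceeds."""
    ranked = sorted(abs(e) for e in errors)
    allowed = len(ranked) // one_in  # number of samples allowed above the limit
    return ranked[len(ranked) - allowed - 1]

def days_between_exceedances(step_minutes, one_in=1000):
    """Average spacing of limit exceedances, in days, for a given time step."""
    return step_minutes * one_in / (60 * 24)

# 0.1% of 4-minute observations is one observation in 1000,
# i.e. one every 4000 minutes.
print(round(days_between_exceedances(4), 1))  # 2.8
```

A rarer exceedance rate widens the limits and lowers the false-alarm rate, at the cost of missing smaller deviations.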

Page 42: Wqtc2013 invest ofperformanceprobswitheds-20130910

ARMADA

Experimental Multi-Site EDS

Page 43: Wqtc2013 invest ofperformanceprobswitheds-20130910

ARMADA testbed

• Experimental

• Does both single-site “nearest neighbor” and multi-site event detection

• Advanced data visualization for monitoring processes

Page 44: Wqtc2013 invest ofperformanceprobswitheds-20130910

Screenshot labels: controls; streaming data; star plot; streaming graphs; nearest neighbor stats; COND tracking; PH tracking; CL2 tracking; scalar tracking; nearest neighbor distributions

Page 45: Wqtc2013 invest ofperformanceprobswitheds-20130910

streaming graphs - measurements, predictions, errors, limits

Graph labels: CL2 Area, PH Area, COND Area; measured & predicted CL2, PH, COND; CL2 error, PH error, COND error; Error Limits; time axis runs between newest and oldest

Page 46: Wqtc2013 invest ofperformanceprobswitheds-20130910

4-D tracking of CL2 measured, predicted, error

• vectors = (measured, predicted, prediction error)

• planes = indicate features’ historical range limits

• “flash” – indicates sudden, large changes in track’s magnitude and direction

a. current time: vectors track below historical CL2 range = big flash

b. earlier time: error exceeds upper limit = event

c. view [a] and [b] as streaming graphs

Figure labels (panels [a], [b], [c]): planes; large decrease causes flash; large prediction error; values below historical minimums; flash; streaming graphs (measured, predicted, error); rotate for better view

Page 47: Wqtc2013 invest ofperformanceprobswitheds-20130910

Conclusions – multi-site approach

• Potential big improvement over single-site

– understands each site’s process physics

– uses known causes of WQ variability to reduce false positives & negatives

– cases indicate 80-90%+ of target WQ variability can be accounted for

• In research phase - ARMADA “demo” available

• Multi-site’s process models

– predict cause-effect

– can also use to control WQ in distribution system

• Other reasons to monitor distribution system

– control processes to improve WQ at points of delivery

– detect common problems - low CL2, nitrification, line integrity, DBPs

Page 48: Wqtc2013 invest ofperformanceprobswitheds-20130910

Series of Tanks and Pump Stations – Util. A

• CL2 decreases downstream and in tanks

Plot: CL2 (mg/l) at Pump-A, Tank-A, Pump-B, Tank-B; 1-minute time steps, 1/1/05 – 11/16/09; annotation: 9 months

Page 49: Wqtc2013 invest ofperformanceprobswitheds-20130910

Thanks for your attention!

Ed Roehl or John Cook

Advanced Data Mining Intl

[email protected]

864.201.8679
