
Feature Squeezing: Detecting Adversarial Examples in Deep Neural Networks

Weilin Xu, David Evans, Yanjun Qi
http://www.cs.virginia.edu/yanjun/

Deep Learning is Solving Many of Our Problems!

Auto-Driving Car

Voice Assistant

Spam Detector

Medical Genomics


However, Deep Learning Classifiers are Easily Fooled

[Figure: an original MNIST "1" (classified with 100% confidence) plus small perturbations crafted by BIM, JSMA, and CW2 yields adversarial examples classified as "4" (100%), "2" (99.9%), and "2" (83.8%).]

C. Szegedy et al., Intriguing Properties of Deep Neural Networks. ICLR 2014.

Classifiers Under Attack: Adversary Adapts

[Figure (ACM CCS 2016): actual images presented to a face recognition system and the faces it recognizes instead.]

Solution Strategy

Solution Strategy 1: Train a perfect vision model. Infeasible (yet).

Solution Strategy 2: Make it harder to find adversarial examples. Arms race!

Feature Squeezing: A general framework that reduces the search space available to an adversary and detects adversarial examples.

Simple, Cheap, Effective!

Roadmap

• Feature Squeezing Detection Framework

• Feature Squeezers
  • Bit Depth Reduction
  • Spatial Smoothing

• Detection Evaluation
  • Oblivious adversary
  • Adaptive adversary

Background: Machine Learning

• Machine Learning: learn to find models that can generalize from observed data to unseen data.

[Diagram: a model f(.) is trained on observed inputs X with outputs Y and generalizes to unseen inputs X′. For instance, a trained deep learning model maps an image to the output "panda".]

Background: Adversarial Examples

[Diagram: the trained deep learning model classifies the original sample x as y: "panda". Adding the perturbation r = 0.007 × [noise] gives x′ = x + r, which the same model classifies as t: "gibbon".]

x: original sample
x′ = x + r: adversarial sample

C. Szegedy et al., Intriguing Properties of Deep Neural Networks. ICLR 2014.

Background: Adversarial Examples

[Diagram: the same panda/gibbon example, x′ = x + 0.007 × [noise].]

Many different variations of formulations to search for x′ from x, e.g.,

minimize ‖f(x′) − t‖ + λ · Δ(x, x′)

where the first term is the misclassification term and the second is the distance term.

Intriguing Property of Adversarial Examples

[Diagram: the trained deep learning model classifies the original X as y: "1", but classifies the adversarial example X + r as y: "2".]

Irrelevant features used in classification tasks are the major cause of adversarial examples.

Intriguing Property of Adversarial Examples

[Diagram: the model classifies the original X as "1" and the adversarial example X + r as "2"; after squeezing, it classifies both Squeeze(X) and Squeeze(X + r) as "1".]

Motivation

• Irrelevant features used in classification tasks are the root cause of adversarial examples.

• The feature spaces are unnecessarily large in deep learning tasks: e.g., raw image pixels.

• We may reduce the search space of possible perturbations available to an adversary using Feature Squeezing.

[Figures (including Bit Depth Reduction) from: Weilin Xu, David Evans, Yanjun Qi. Feature Squeezing: Detecting Adversarial Examples in Deep Neural Networks. Network and Distributed System Security Symposium (NDSS) 2018.]

Detection Framework

[Diagram: the input goes to the model directly, producing Prediction0, and through Squeezer1 then the model, producing Prediction1. If the L1 distance d1 between the two predictions exceeds the threshold T, the input is flagged as adversarial; otherwise it is legitimate.]

A Feature Squeezer coalesces similar samples into a single one:
• Barely changes legitimate inputs.
• Destroys adversarial perturbations.

Detection Framework: Multiple Squeezers

[Diagram: the input produces Prediction0 from the model directly, Prediction1 via Squeezer1, and Prediction2 via Squeezer2; d1 and d2 are the L1 distances from Prediction0. If max(d1, d2) > T, the input is flagged as adversarial; otherwise it is legitimate.]

Squeezers used:
• Bit Depth Reduction
• Spatial Smoothing
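The detection rule above can be sketched in a few lines. This is a minimal illustration, assuming `model` returns a softmax probability vector and each squeezer is a function mapping an input to its squeezed copy; the names are mine, not the authors' reference implementation (that lives in EvadeML-Zoo, linked at the end):

```python
import numpy as np

def detection_score(model, squeezers, x):
    """Joint Feature Squeezing score: max L1 distance between the model's
    prediction on the raw input and its prediction on each squeezed input."""
    p0 = model(x)                                        # Prediction0
    distances = [np.abs(p0 - model(squeeze(x))).sum()    # d_i = ||p0 - p_i||_1
                 for squeeze in squeezers]
    return max(distances)

def is_adversarial(model, squeezers, x, threshold):
    """Flag the input as adversarial when the joint score exceeds T."""
    return detection_score(model, squeezers, x) > threshold
```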

Bit Depth Reduction

[Plot: quantization curves mapping the original pixel value (0–1) to the target value (0–1) for 8-bit, 3-bit, and 1-bit depth.]

Signal quantization. Reduce to 1-bit: xᵢ′ = round(xᵢ × (2¹ − 1)) / (2¹ − 1) = round(xᵢ).

Example:
X     = [0.012 0.571 … 0.159 0.951] → [0. 1. … 0. 1.]
X_adv = [0.312 0.471 … 0.157 0.851] → [0. 0. … 0. 1.]
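A sketch of the general b-bit reduction, following the round(x · (2ᵇ − 1)) / (2ᵇ − 1) form from the paper (the function name and the toy example are mine):

```python
import numpy as np

def reduce_bit_depth(x, bits):
    """Quantize pixel values in [0, 1] to `bits` bits per channel."""
    levels = 2 ** bits - 1                 # number of steps above zero
    return np.round(x * levels) / levels   # bits=1 maps everything to {0.0, 1.0}

# Toy example in the spirit of the slide: a clean vector and a perturbed copy
# collapse onto (nearly) the same binary vector after 1-bit squeezing.
x     = np.array([0.012, 0.571, 0.159, 0.951])
x_adv = np.array([0.312, 0.471, 0.157, 0.851])
print(reduce_bit_depth(x, 1))      # [0. 1. 0. 1.]
print(reduce_bit_depth(x_adv, 1))  # [0. 0. 0. 1.]
```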

Bit Depth Reduction

Eliminating adversarial perturbations while preserving semantics.

[Figure: an MNIST "1" and its FGSM, BIM, CW∞, and CW2 adversarial versions. Before squeezing the model predicts 1, 1, 4, 2, 2; after 1-bit depth reduction it predicts 1, 1, 1, 1, 1.]

Accuracy with Bit Depth Reduction

Dataset  | Squeezer        | Adversarial Examples* | Legitimate Images
MNIST    | None (baseline) | 13.0%                 | 99.43%
MNIST    | 1-bit Depth     | 62.7%                 | 99.33%
ImageNet | None (baseline) | 2.78%                 | 69.70%
ImageNet | 4-bit Depth     | 52.11%                | 68.00%

*Adversarial examples: FGSM, BIM, CW∞, DeepFool, CW2, CW0, JSMA.

Distribution of Distance(Prediction, Squeezed Prediction) (MNIST)

[Histogram: number of examples vs. maximum L1 distance (0.0–2.0) for legitimate and adversarial examples; legitimate inputs concentrate at small distances, adversarial inputs at large distances.]

Spatial Smoothing: Median Filter

• Replace a pixel with the median of its neighbors.
• Effective in eliminating "salt-and-pepper" noise.

[Figure: 3×3 median filter illustration. Image from https://sultanofswing90.wordpress.com/tag/image-processing/]
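A minimal median-filter squeezer, sketched with SciPy; the 2×2 window matches the setting used later in the talk, and filtering each channel separately is an assumption about how color images are handled:

```python
import numpy as np
from scipy.ndimage import median_filter

def median_smoothing(x, window=2):
    """Replace each pixel with the median of its window x window neighborhood,
    applied per channel so colors are not mixed across channels."""
    return median_filter(x, size=(window, window, 1))

x = np.random.rand(28, 28, 1)             # stand-in for an MNIST image in [0, 1]
x_squeezed = median_smoothing(x, window=2)
```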

Spatial Smoothing: Non-local Means

• Replace a patch with a weighted mean of similar patches.
• Preserves more edges.

[Figure: a patch p and similar patches q₁, q₂ found elsewhere in the image.]

p′ = Σᵢ w(p, qᵢ) × qᵢ
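A non-local means squeezer can be sketched with scikit-image's denoiser. Mapping the talk's "search window – patch size – strength" parameters (e.g. 11-3-4) onto scikit-image's arguments is my assumption, not the authors' exact configuration:

```python
import numpy as np
from skimage.restoration import denoise_nl_means

def non_local_means(x, search_window=11, patch_size=3, strength=4):
    """Replace each patch with a weighted mean of similar patches found
    inside a larger search window (scikit-image's denoiser does the work)."""
    return denoise_nl_means(
        x,
        patch_size=patch_size,
        patch_distance=search_window // 2,  # scikit-image expects a radius
        h=strength / 255.0,                 # filter strength; x assumed in [0, 1]
        channel_axis=-1,
    )

x = np.random.rand(224, 224, 3)             # stand-in for an ImageNet image
x_squeezed = non_local_means(x)             # an "11-3-4"-style squeezer
```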

[Figure (CIFAR-10): predictions for an original image and its BIM (L∞) and JSMA (L0) adversarial versions, without squeezing and after each squeezer.]

                         | Original       | BIM (L∞)       | JSMA (L0)
No squeezing             | Airplane 94.4% | Truck 99.9%    | Automobile 56.5%
Median Filter (2×2)      | Airplane 98.4% | Airplane 99.9% | Ship 46.0%
Non-local Means (13-3-4) | Airplane 98.3% | Airplane 80.8% | Airplane 70.0%

Accuracy with Spatial Smoothing

Dataset  | Squeezer               | Adversarial Examples* | Legitimate Images
ImageNet | None (baseline)        | 2.78%                 | 69.70%
ImageNet | Median Filter 2×2      | 68.11%                | 65.40%
ImageNet | Non-local Means 11-3-4 | 57.11%                | 65.40%

*Adversarial examples: FGSM, BIM, CW∞, DeepFool, CW2, CW0.

Distribution of Distance(Prediction, Squeezed Prediction) (ImageNet)

[Histogram: number of examples vs. maximum L1 distance (0.0–2.0) for legitimate inputs and adversarial inputs (CW2 attack); legitimate inputs concentrate at small distances, adversarial inputs at large distances.]

Other Potential Squeezers

• Thermometer Encoding (learnable bit depth reduction)

• Image denoising using bilateral filter, autoencoder, wavelet, etc.

• Image resizing

C. Xie, et al. Mitigating Adversarial Effects Through Randomization. ICLR 2018.
J. Buckman, et al. Thermometer Encoding: One Hot Way To Resist Adversarial Examples. ICLR 2018.
D. Meng and H. Chen. MagNet: A Two-Pronged Defense against Adversarial Examples. CCS 2017.
F. Liao, et al. Defense against Adversarial Attacks Using High-Level Representation Guided Denoiser. arXiv:1712.02976.
A. Prakash, et al. Deflecting Adversarial Attacks with Pixel Deflection. arXiv:1801.08926.

Experimental Setup

• Datasets and Models
  • MNIST, 7-layer CNN
  • CIFAR-10, DenseNet
  • ImageNet, MobileNet

• Attacks (100 examples for each attack)
  • Untargeted: FGSM, BIM, DeepFool
  • Targeted (Next/Least-Likely): JSMA, Carlini-Wagner L2/L∞/L0

• Detection Datasets
  • A balanced dataset with legitimate examples.
  • 50% for training the detector, the remaining for validation.

Threat Models

• Oblivious adversary: The adversary has full knowledge of the target model, but is not aware of the detector.

• Adaptive adversary: The adversary has full knowledge of the target model and the detector.

Detection Framework: Multiple Squeezers (recap)

[Same diagram as before: Prediction0 from the raw input, Prediction1 and Prediction2 via the two squeezers (Bit Depth Reduction, Spatial Smoothing); the input is flagged as adversarial when max(d1, d2) > T.]

How to Find T for the Detector (MNIST)

Select a threshold value with FPR 5%.

[Histogram: number of examples vs. maximum L1 distance (0.0–2.0) for legitimate and adversarial examples; T is placed so that at most 5% of legitimate examples exceed it.]
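Choosing T at a 5% false positive rate amounts to taking the 95th percentile of the detection scores of legitimate training examples. A sketch, assuming `legit_scores` holds the max-L1 scores computed by the detector sketched earlier:

```python
import numpy as np

def choose_threshold(legit_scores, target_fpr=0.05):
    """Smallest T such that roughly `target_fpr` of legitimate scores exceed it."""
    return float(np.percentile(legit_scores, 100 * (1 - target_fpr)))

# legit_scores would be detection_score() evaluated on held-out legitimate inputs.
# T = choose_threshold(legit_scores, target_fpr=0.05)
```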

Detect Successful Adversarial Examples (MNIST)

Squeezer      | FGSM (L∞) | BIM (L∞) | CW∞ (L∞) | CW2 (L2) | CW0 (L0) | JSMA (L0)
1-bit Depth   | 100%      | 97.9%    | 100%     | 100%     | 55.6%    | 100%
Median 2×2    | 73.1%     | 27.7%    | 100%     | 94.4%    | 82.2%    | 100%
[Best Single] | 100%      | 97.9%    | 100%     | 100%     | 82.2%    | 100%
Joint         | 100%      | 97.9%    | 100%     | 100%     | 91.1%    | 100%

Bit Depth Reduction is more effective on L∞ and L2 attacks.

Median Smoothing is more effective on L0 attacks.

Joint detection improves performance.

Aggregated Detection Results

Dataset  | Squeezers                                                  | Threshold | False Positive Rate | Detection Rate (SAEs) | ROC-AUC (excluding FAEs)
MNIST    | Bit Depth (1-bit), Median (2×2)                            | 0.0029    | 3.98%               | 98.2%                 | 99.44%
CIFAR-10 | Bit Depth (5-bit), Median (2×2), Non-local Means (13-3-2)  | 1.1402    | 4.93%               | 84.5%                 | 95.74%
ImageNet | Bit Depth (5-bit), Median (2×2), Non-local Means (11-3-4)  | 1.2128    | 8.33%               | 85.9%                 | 94.24%

(SAEs: successful adversarial examples; FAEs: failed adversarial examples.)

Threat Models

• Oblivious attack: The adversary has full knowledge of the target model, but is not aware of the detector.

• Adaptive attack: The adversary has full knowledge of the target model and the detector.

Adaptive Adversary

Adaptive CW2 attack, unbounded adversary:

minimize ‖f(x′) − t‖ + λ · Δ(x, x′) + k · detectScore(x′)

(misclassification term + distance term + detection term)

Warren He, James Wei, Xinyun Chen, Nicholas Carlini, Dawn Song. Adversarial Example Defense: Ensembles of Weak Defenses are not Strong. USENIX WOOT '17.
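The adaptive attacker's objective can be written as a single loss over x′. The sketch below is illustrative only: it assumes `model` returns class probabilities and `detect_score` is a differentiable stand-in for the detector (in practice the hard rounding inside bit depth reduction has to be smoothed before gradients can flow through it):

```python
import numpy as np

def adaptive_attack_loss(model, detect_score, x, x_adv, target, lam=1.0, k=1.0):
    """Objective minimized (over x_adv) by the adaptive CW2-style attacker.

    misclassification term: push f(x_adv) toward the target class t
    distance term:          keep x_adv close to x (L2)
    detection term:         push the detector's score on x_adv below T
    """
    probs = model(x_adv)
    misclassification = 1.0 - probs[target]   # proxy for ||f(x') - t||
    distance = np.linalg.norm(x_adv - x)      # Delta(x, x')
    detection = detect_score(x_adv)           # e.g. the max-L1 score from earlier
    return misclassification + lam * distance + k * detection
```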

Adaptive Adversarial Examples

No successful adversarial examples were found for images originally labeled as 3 or 8.

[Figure: grids of adaptive adversarial examples with mean L2 distortions of 2.80 (untargeted), 4.14 (targeted-next), and 4.67 (targeted-least-likely).]

Adaptive Adversary Success Rates

[Plot: adversary's success rate vs. clipped ε for untargeted, targeted (next), and targeted (least-likely) adaptive attacks. The unbounded success rates (0.68, 0.44, and 0.24 across the three attack types) drop to roughly 0.01–0.06 once the perturbation is clipped to the commonly used ε.]

Countermeasure: Randomization

• Binary filter threshold: instead of a fixed cutoff of 0.5, draw the threshold from 𝒩(0.5, 0.0625).

• Strengthen the adaptive adversary: attack an ensemble of 3 detectors with thresholds [0.4, 0.5, 0.6].

[Plots: the binary (1-bit) filter's input–output curve with the fixed 0.5 threshold vs. the randomized threshold.]
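A sketch of the randomized binary filter; reading 0.0625 as the variance of the threshold distribution (so standard deviation 0.25) is my interpretation of the slide's 𝒩(0.5, 0.0625):

```python
import numpy as np

def randomized_binary_filter(x, rng=None):
    """1-bit squeezer whose cutoff is drawn from N(0.5, 0.0625) per call,
    instead of being fixed at 0.5."""
    rng = np.random.default_rng() if rng is None else rng
    threshold = rng.normal(loc=0.5, scale=np.sqrt(0.0625))  # 0.0625 read as variance
    return (x > threshold).astype(x.dtype)
```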

Mean L2 distortion of adaptive adversarial examples:

                              | Untargeted | Targeted-Next | Targeted-LL
Attack Deterministic Detector | 2.80       | 4.14          | 4.67
Attack Randomized Detector    | 3.63       | 5.48          | 5.76

Conclusion

• Feature Squeezing hardens deep learning models.
• Feature Squeezing gives advantages to the defense side in the arms race with adaptive adversaries.

Thank you! Reproduce our results using EvadeML-Zoo: https://evadeML.org/zoo
