
Feature Squeezing: Detecting Adversarial Examples in Deep Neural Networks

Weilin Xu, David Evans, Yanjun Qi
http://www.cs.virginia.edu/yanjun/

Deep Learning is Solving Many of Our Problems!

Auto-Driving Car

Voice Assistant

Spam Detector

Medical Genomics


However, Deep Learning Classifiers are Easily Fooled

[Figure: an original MNIST "1" (classified with 100% confidence) plus small perturbations crafted by BIM, JSMA, and CW2 yields adversarial examples classified as "4" (100%), "2" (99.9%), and "2" (83.8%).]

C. Szegedy et al., Intriguing Properties of Deep Neural Networks. ICLR 2014.

Classifiers Under Attack: Adversary Adapts

[Figure (ACM CCS 2016): actual images presented to a face recognition system and the faces it recognizes instead.]

Solution Strategy

Solution Strategy 1: Train a perfect vision model. Infeasible (yet).

Solution Strategy 2: Make it harder to find adversarial examples. Arms race!

Feature Squeezing: A general framework that reduces the search space available to an adversary and detects adversarial examples.

Simple, Cheap, Effective!

Roadmap

• Feature Squeezing Detection Framework

• Feature Squeezers
  • Bit Depth Reduction
  • Spatial Smoothing

• Detection Evaluation
  • Oblivious adversary
  • Adaptive adversary

Background: Machine Learning

• Machine Learning: learn to find models that can generalize from observed data to unseen data.

[Diagram: a model f(.) is trained on observed inputs X with outputs Y and generalizes to unseen inputs X′. For instance, a trained deep learning model maps an image to the output "panda".]

Background: Adversarial Examples

[Diagram: the trained deep learning model classifies the original sample x as y: "panda". Adding the perturbation r = 0.007 × [noise] gives x′ = x + r, which the same model classifies as t: "gibbon".]

x: original sample
x′ = x + r: adversarial sample

C. Szegedy et al., Intriguing Properties of Deep Neural Networks. ICLR 2014.

Background: Adversarial Examples

[Diagram: the same panda/gibbon example, x′ = x + 0.007 × [noise].]

Many different variations of formulations to search for x′ from x, e.g.,

minimize ‖f(x′) − t‖ + λ · Δ(x, x′)

where the first term is the misclassification term and the second is the distance term.

Intriguing Property of Adversarial Examples

[Diagram: the trained deep learning model classifies the original X as y: "1", but classifies the adversarial example X + r as y: "2".]

Irrelevant features used in classification tasks are the major cause of adversarial examples.

Intriguing Property of Adversarial Examples

[Diagram: the model classifies the original X as "1" and the adversarial example X + r as "2"; after squeezing, it classifies both Squeeze(X) and Squeeze(X + r) as "1".]

Motivation

• Irrelevant features used in classification tasks are the root cause of adversarial examples.

• The feature spaces are unnecessarily large in deep learning tasks: e.g., raw image pixels.

• We may reduce the search space of possible perturbations available to an adversary using Feature Squeezing.

[Figures (including Bit Depth Reduction) from: Weilin Xu, David Evans, Yanjun Qi. Feature Squeezing: Detecting Adversarial Examples in Deep Neural Networks. Network and Distributed System Security Symposium (NDSS) 2018.]

Detection Framework

[Diagram: the input goes to the model directly, producing Prediction0, and through Squeezer1 then the model, producing Prediction1. If the L1 distance d1 between the two predictions exceeds the threshold T, the input is flagged as adversarial; otherwise it is legitimate.]

A Feature Squeezer coalesces similar samples into a single one:
• Barely changes legitimate inputs.
• Destroys adversarial perturbations.

Detection Framework: Multiple Squeezers

[Diagram: the input produces Prediction0 from the model directly, Prediction1 via Squeezer1, and Prediction2 via Squeezer2; d1 and d2 are the L1 distances from Prediction0. If max(d1, d2) > T, the input is flagged as adversarial; otherwise it is legitimate.]

Squeezers used:
• Bit Depth Reduction
• Spatial Smoothing
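The detection rule above can be sketched in a few lines. This is a minimal illustration, assuming `model` returns a softmax probability vector and each squeezer is a function mapping an input to its squeezed copy; the names are mine, not the authors' reference implementation (that lives in EvadeML-Zoo, linked at the end):

```python
import numpy as np

def detection_score(model, squeezers, x):
    """Joint Feature Squeezing score: max L1 distance between the model's
    prediction on the raw input and its prediction on each squeezed input."""
    p0 = model(x)                                        # Prediction0
    distances = [np.abs(p0 - model(squeeze(x))).sum()    # d_i = ||p0 - p_i||_1
                 for squeeze in squeezers]
    return max(distances)

def is_adversarial(model, squeezers, x, threshold):
    """Flag the input as adversarial when the joint score exceeds T."""
    return detection_score(model, squeezers, x) > threshold
```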

Bit Depth Reduction

[Plot: quantization curves mapping the original pixel value (0–1) to the target value (0–1) for 8-bit, 3-bit, and 1-bit depth.]

Signal quantization. Reduce to 1-bit: xᵢ′ = round(xᵢ × (2¹ − 1)) / (2¹ − 1) = round(xᵢ).

Example:
X     = [0.012 0.571 … 0.159 0.951] → [0. 1. … 0. 1.]
X_adv = [0.312 0.471 … 0.157 0.851] → [0. 0. … 0. 1.]
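A sketch of the general b-bit reduction, following the round(x · (2ᵇ − 1)) / (2ᵇ − 1) form from the paper (the function name and the toy example are mine):

```python
import numpy as np

def reduce_bit_depth(x, bits):
    """Quantize pixel values in [0, 1] to `bits` bits per channel."""
    levels = 2 ** bits - 1                 # number of steps above zero
    return np.round(x * levels) / levels   # bits=1 maps everything to {0.0, 1.0}

# Toy example in the spirit of the slide: a clean vector and a perturbed copy
# collapse onto (nearly) the same binary vector after 1-bit squeezing.
x     = np.array([0.012, 0.571, 0.159, 0.951])
x_adv = np.array([0.312, 0.471, 0.157, 0.851])
print(reduce_bit_depth(x, 1))      # [0. 1. 0. 1.]
print(reduce_bit_depth(x_adv, 1))  # [0. 0. 0. 1.]
```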

Bit Depth Reduction

Eliminating adversarial perturbations while preserving semantics.

[Figure: an MNIST "1" and its FGSM, BIM, CW∞, and CW2 adversarial versions. Before squeezing the model predicts 1, 1, 4, 2, 2; after 1-bit depth reduction it predicts 1, 1, 1, 1, 1.]

Accuracy with Bit Depth Reduction

Dataset  | Squeezer        | Adversarial Examples* | Legitimate Images
MNIST    | None (baseline) | 13.0%                 | 99.43%
MNIST    | 1-bit Depth     | 62.7%                 | 99.33%
ImageNet | None (baseline) | 2.78%                 | 69.70%
ImageNet | 4-bit Depth     | 52.11%                | 68.00%

*Adversarial examples: FGSM, BIM, CW∞, DeepFool, CW2, CW0, JSMA.

Distribution of Distance(Prediction, Squeezed Prediction) (MNIST)

[Histogram: number of examples vs. maximum L1 distance (0.0–2.0) for legitimate and adversarial examples; legitimate inputs concentrate at small distances, adversarial inputs at large distances.]

Spatial Smoothing: Median Filter

• Replace a pixel with the median of its neighbors.
• Effective in eliminating "salt-and-pepper" noise.

[Figure: 3×3 median filter illustration. Image from https://sultanofswing90.wordpress.com/tag/image-processing/]
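A minimal median-filter squeezer, sketched with SciPy; the 2×2 window matches the setting used later in the talk, and filtering each channel separately is an assumption about how color images are handled:

```python
import numpy as np
from scipy.ndimage import median_filter

def median_smoothing(x, window=2):
    """Replace each pixel with the median of its window x window neighborhood,
    applied per channel so colors are not mixed across channels."""
    return median_filter(x, size=(window, window, 1))

x = np.random.rand(28, 28, 1)             # stand-in for an MNIST image in [0, 1]
x_squeezed = median_smoothing(x, window=2)
```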

Spatial Smoothing: Non-local Means

• Replace a patch with a weighted mean of similar patches.
• Preserves more edges.

[Figure: a patch p and similar patches q₁, q₂ found elsewhere in the image.]

p′ = Σᵢ w(p, qᵢ) × qᵢ
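A non-local means squeezer can be sketched with scikit-image's denoiser. Mapping the talk's "search window – patch size – strength" parameters (e.g. 11-3-4) onto scikit-image's arguments is my assumption, not the authors' exact configuration:

```python
import numpy as np
from skimage.restoration import denoise_nl_means

def non_local_means(x, search_window=11, patch_size=3, strength=4):
    """Replace each patch with a weighted mean of similar patches found
    inside a larger search window (scikit-image's denoiser does the work)."""
    return denoise_nl_means(
        x,
        patch_size=patch_size,
        patch_distance=search_window // 2,  # scikit-image expects a radius
        h=strength / 255.0,                 # filter strength; x assumed in [0, 1]
        channel_axis=-1,
    )

x = np.random.rand(224, 224, 3)             # stand-in for an ImageNet image
x_squeezed = non_local_means(x)             # an "11-3-4"-style squeezer
```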

[Figure (CIFAR-10): predictions for an original image and its BIM (L∞) and JSMA (L0) adversarial versions, without squeezing and after each squeezer.]

                         | Original       | BIM (L∞)       | JSMA (L0)
No squeezing             | Airplane 94.4% | Truck 99.9%    | Automobile 56.5%
Median Filter (2×2)      | Airplane 98.4% | Airplane 99.9% | Ship 46.0%
Non-local Means (13-3-4) | Airplane 98.3% | Airplane 80.8% | Airplane 70.0%

Accuracy with Spatial Smoothing

Dataset  | Squeezer               | Adversarial Examples* | Legitimate Images
ImageNet | None (baseline)        | 2.78%                 | 69.70%
ImageNet | Median Filter 2×2      | 68.11%                | 65.40%
ImageNet | Non-local Means 11-3-4 | 57.11%                | 65.40%

*Adversarial examples: FGSM, BIM, CW∞, DeepFool, CW2, CW0.

Distribution of Distance(Prediction, Squeezed Prediction) (ImageNet)

[Histogram: number of examples vs. maximum L1 distance (0.0–2.0) for legitimate inputs and adversarial inputs (CW2 attack); legitimate inputs concentrate at small distances, adversarial inputs at large distances.]

Other Potential Squeezers

• Thermometer Encoding (learnable bit depth reduction)

• Image denoising using bilateral filter, autoencoder, wavelet, etc.

• Image resizing

C. Xie, et al. Mitigating Adversarial Effects Through Randomization. ICLR 2018.
J. Buckman, et al. Thermometer Encoding: One Hot Way To Resist Adversarial Examples. ICLR 2018.
D. Meng and H. Chen. MagNet: A Two-Pronged Defense against Adversarial Examples. CCS 2017.
F. Liao, et al. Defense against Adversarial Attacks Using High-Level Representation Guided Denoiser. arXiv:1712.02976.
A. Prakash, et al. Deflecting Adversarial Attacks with Pixel Deflection. arXiv:1801.08926.

Experimental Setup

• Datasets and Models
  • MNIST, 7-layer CNN
  • CIFAR-10, DenseNet
  • ImageNet, MobileNet

• Attacks (100 examples for each attack)
  • Untargeted: FGSM, BIM, DeepFool
  • Targeted (Next/Least-Likely): JSMA, Carlini-Wagner L2/L∞/L0

• Detection Datasets
  • A balanced dataset with legitimate examples.
  • 50% for training the detector, the remaining for validation.

Threat Models

• Oblivious adversary: The adversary has full knowledge of the target model, but is not aware of the detector.

• Adaptive adversary: The adversary has full knowledge of the target model and the detector.

Detection Framework: Multiple Squeezers (recap)

[Same diagram as before: Prediction0 from the raw input, Prediction1 and Prediction2 via the two squeezers (Bit Depth Reduction, Spatial Smoothing); the input is flagged as adversarial when max(d1, d2) > T.]

How to Find T for the Detector (MNIST)

Select a threshold value with FPR 5%.

[Histogram: number of examples vs. maximum L1 distance (0.0–2.0) for legitimate and adversarial examples; T is placed so that at most 5% of legitimate examples exceed it.]
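Choosing T at a 5% false positive rate amounts to taking the 95th percentile of the detection scores of legitimate training examples. A sketch, assuming `legit_scores` holds the max-L1 scores computed by the detector sketched earlier:

```python
import numpy as np

def choose_threshold(legit_scores, target_fpr=0.05):
    """Smallest T such that roughly `target_fpr` of legitimate scores exceed it."""
    return float(np.percentile(legit_scores, 100 * (1 - target_fpr)))

# legit_scores would be detection_score() evaluated on held-out legitimate inputs.
# T = choose_threshold(legit_scores, target_fpr=0.05)
```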

Detect Successful Adversarial Examples (MNIST)

Squeezer      | FGSM (L∞) | BIM (L∞) | CW∞ (L∞) | CW2 (L2) | CW0 (L0) | JSMA (L0)
1-bit Depth   | 100%      | 97.9%    | 100%     | 100%     | 55.6%    | 100%
Median 2×2    | 73.1%     | 27.7%    | 100%     | 94.4%    | 82.2%    | 100%
[Best Single] | 100%      | 97.9%    | 100%     | 100%     | 82.2%    | 100%
Joint         | 100%      | 97.9%    | 100%     | 100%     | 91.1%    | 100%

Bit Depth Reduction is more effective on L∞ and L2 attacks.

Median Smoothing is more effective on L0 attacks.

Joint detection improves performance.

Aggregated Detection Results

Dataset  | Squeezers                                                  | Threshold | False Positive Rate | Detection Rate (SAEs) | ROC-AUC (excluding FAEs)
MNIST    | Bit Depth (1-bit), Median (2×2)                            | 0.0029    | 3.98%               | 98.2%                 | 99.44%
CIFAR-10 | Bit Depth (5-bit), Median (2×2), Non-local Means (13-3-2)  | 1.1402    | 4.93%               | 84.5%                 | 95.74%
ImageNet | Bit Depth (5-bit), Median (2×2), Non-local Means (11-3-4)  | 1.2128    | 8.33%               | 85.9%                 | 94.24%

(SAEs: successful adversarial examples; FAEs: failed adversarial examples.)

Threat Models

• Oblivious attack: The adversary has full knowledge of the target model, but is not aware of the detector.

• Adaptive attack: The adversary has full knowledge of the target model and the detector.

Adaptive Adversary

Adaptive CW2 attack, unbounded adversary:

minimize ‖f(x′) − t‖ + λ · Δ(x, x′) + k · detectScore(x′)

(misclassification term + distance term + detection term)

Warren He, James Wei, Xinyun Chen, Nicholas Carlini, Dawn Song. Adversarial Example Defense: Ensembles of Weak Defenses are not Strong. USENIX WOOT '17.
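The adaptive attacker's objective can be written as a single loss over x′. The sketch below is illustrative only: it assumes `model` returns class probabilities and `detect_score` is a differentiable stand-in for the detector (in practice the hard rounding inside bit depth reduction has to be smoothed before gradients can flow through it):

```python
import numpy as np

def adaptive_attack_loss(model, detect_score, x, x_adv, target, lam=1.0, k=1.0):
    """Objective minimized (over x_adv) by the adaptive CW2-style attacker.

    misclassification term: push f(x_adv) toward the target class t
    distance term:          keep x_adv close to x (L2)
    detection term:         push the detector's score on x_adv below T
    """
    probs = model(x_adv)
    misclassification = 1.0 - probs[target]   # proxy for ||f(x') - t||
    distance = np.linalg.norm(x_adv - x)      # Delta(x, x')
    detection = detect_score(x_adv)           # e.g. the max-L1 score from earlier
    return misclassification + lam * distance + k * detection
```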

Adaptive Adversarial Examples

No successful adversarial examples were found for images originally labeled as 3 or 8.

[Figure: grids of adaptive adversarial examples with mean L2 distortions of 2.80 (untargeted), 4.14 (targeted-next), and 4.67 (targeted-least-likely).]

Adaptive Adversary Success Rates

[Plot: adversary's success rate vs. clipped ε for untargeted, targeted (next), and targeted (least-likely) adaptive attacks. The unbounded success rates (0.68, 0.44, and 0.24 across the three attack types) drop to roughly 0.01–0.06 once the perturbation is clipped to the commonly used ε.]

Countermeasure: Randomization

• Binary filter threshold: instead of a fixed cutoff of 0.5, draw the threshold from 𝒩(0.5, 0.0625).

• Strengthen the adaptive adversary: attack an ensemble of 3 detectors with thresholds [0.4, 0.5, 0.6].

[Plots: the binary (1-bit) filter's input–output curve with the fixed 0.5 threshold vs. the randomized threshold.]
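A sketch of the randomized binary filter; reading 0.0625 as the variance of the threshold distribution (so standard deviation 0.25) is my interpretation of the slide's 𝒩(0.5, 0.0625):

```python
import numpy as np

def randomized_binary_filter(x, rng=None):
    """1-bit squeezer whose cutoff is drawn from N(0.5, 0.0625) per call,
    instead of being fixed at 0.5."""
    rng = np.random.default_rng() if rng is None else rng
    threshold = rng.normal(loc=0.5, scale=np.sqrt(0.0625))  # 0.0625 read as variance
    return (x > threshold).astype(x.dtype)
```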

Mean L2 distortion of adaptive adversarial examples:

                              | Untargeted | Targeted-Next | Targeted-LL
Attack Deterministic Detector | 2.80       | 4.14          | 4.67
Attack Randomized Detector    | 3.63       | 5.48          | 5.76

Conclusion

• Feature Squeezing hardens deep learning models.
• Feature Squeezing gives advantages to the defense side in the arms race with adaptive adversaries.

Thank you! Reproduce our results using EvadeML-Zoo: https://evadeML.org/zoo
