BNAIC 2015, November 5-6, 2015. AMIDST Toolbox: a Java library for Analysis of MassIve Data Streams using Probabilistic Graphical Models. FP7 European research project. Anders L. Madsen, Andres R. Masegosa, Ana M. Martinez, Hanen Borchani, Thomas D. Nielsen, Helge Langseth, Antonio Salmeron, Dario Ramos-Lopez.

Amidst demo (BNAIC 2015)

Jan 15, 2017



Page 1: Amidst demo (BNAIC 2015)

BNAIC 2015, November 5-6, 2015

AMIDST Toolbox
A Java library for Analysis of MassIve Data Streams using Probabilistic Graphical Models

FP7 European research project

Anders L. Madsen, Andres R. Masegosa, Ana M. Martinez, Hanen Borchani, Thomas D. Nielsen, Helge Langseth, Antonio Salmeron, Dario Ramos-Lopez.

Page 2: Amidst demo (BNAIC 2015)

Outline

1. Overview of AMIDST Toolbox
o Why are data streams important?
o Why PGMs for analyzing data streams?
o Scalable inference (and learning)
o Roadmap for coming releases

2. Live Demo: Modeling concept drift in financial data.
o Handling data streams.
o Defining Bayesian networks with hidden variables.
o Inference and learning in Bayesian networks.

Page 3: Amidst demo (BNAIC 2015)


Page 4: Amidst demo (BNAIC 2015)

Data Streams everywhere

• Unbounded flows of data are generated daily:
– Social networks
– Network monitoring
– Financial/banking industry
– ….

Page 5: Amidst demo (BNAIC 2015)

Data Stream Processing

• Processing data streams is challenging:
– They do not fit in main memory
– Continuous model updating
– Continuous model inference
– Concept drift
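Since a stream does not fit in main memory, any summary of it has to be updated one item at a time and the item then discarded. The following is a minimal sketch of that idea using Welford's running mean and variance. It is plain Java, not AMIDST code, and the class name `RunningStats` and the sample readings are made up for illustration.

```java
/** Running mean and variance over a stream, one value at a time (Welford's method). */
class RunningStats {
    private long count = 0;
    private double mean = 0.0;
    private double m2 = 0.0; // sum of squared deviations from the current mean

    /** Folds one observation into the summary; the observation itself is not stored. */
    void add(double x) {
        count++;
        double delta = x - mean;
        mean += delta / count;
        m2 += delta * (x - mean);
    }

    long count() { return count; }

    double mean() { return mean; }

    /** Unbiased sample variance; NaN until at least two values have been seen. */
    double variance() { return count > 1 ? m2 / (count - 1) : Double.NaN; }

    public static void main(String[] args) {
        RunningStats stats = new RunningStats();
        double[] readings = {2, 4, 4, 4, 5, 5, 7, 9}; // stand-in for an unbounded stream
        for (double r : readings) {
            stats.add(r);
        }
        System.out.println("mean=" + stats.mean() + " variance=" + stats.variance());
    }
}
```

Memory use is constant regardless of how many items have been processed, which is the property any stream learner needs.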

Page 6: Amidst demo (BNAIC 2015)

Processing Massive Data Streams

• Everything has to scale:
– Scalable computing infrastructure
– Scalable models/inference/learning

Page 7: Amidst demo (BNAIC 2015)

AMIDST Toolbox

• Scalable framework for data stream processing.
• Based on Probabilistic Graphical Models.
• Unique project for data stream mining using PGMs.
• Open source project (Apache Software License 2.0).

Page 8: Amidst demo (BNAIC 2015)



§ This toolbox aims to deal with real, complex and massive data streams.
§ Applied to real use-cases of AMIDST’s industrial partners.

Page 9: Amidst demo (BNAIC 2015)

Toolbox Web Page


Page 10: Amidst demo (BNAIC 2015)



Page 11: Amidst demo (BNAIC 2015)

Why Graphical Models?

§ Let’s look at the following simple example:
§ Stream of sensor measurements about temperature and smoke presence in a given geographical area.
§ Monitor the stream to detect the presence of a fire (event detection problem).

Page 12: Amidst demo (BNAIC 2015)

Why Graphical Models?

§ Cast the problem as an anomaly detection problem (outliers).
§ Streaming K-Means (widely used in industry).
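To make the black-box baseline concrete, here is a rough sketch of sequential k-means used as an outlier detector on (temperature, smoke) readings. It is one simple variant, not a specific industrial implementation; the class name, the single starting centroid, the threshold and the readings are all illustrative assumptions.

```java
/** Sequential (online) k-means with a distance-based outlier score. */
class StreamingKMeans {
    private final double[][] centroids;
    private final long[] counts;

    /** The initial centroids are only starting guesses; each is overwritten by its first assigned point. */
    StreamingKMeans(double[][] initialCentroids) {
        centroids = new double[initialCentroids.length][];
        for (int k = 0; k < initialCentroids.length; k++) {
            centroids[k] = initialCentroids[k].clone();
        }
        counts = new long[initialCentroids.length];
    }

    private static double squaredDistance(double[] a, double[] b) {
        double sum = 0.0;
        for (int d = 0; d < a.length; d++) {
            sum += (a[d] - b[d]) * (a[d] - b[d]);
        }
        return sum;
    }

    /** Index of the centroid closest to x. */
    int nearest(double[] x) {
        int best = 0;
        for (int k = 1; k < centroids.length; k++) {
            if (squaredDistance(x, centroids[k]) < squaredDistance(x, centroids[best])) {
                best = k;
            }
        }
        return best;
    }

    /** Outlier score: distance from x to its nearest centroid. */
    double distanceToNearest(double[] x) {
        return Math.sqrt(squaredDistance(x, centroids[nearest(x)]));
    }

    /** Moves the nearest centroid so that it stays the running mean of its assigned points. */
    void update(double[] x) {
        int k = nearest(x);
        counts[k]++;
        for (int d = 0; d < x.length; d++) {
            centroids[k][d] += (x[d] - centroids[k][d]) / counts[k];
        }
    }

    double[] centroid(int k) { return centroids[k].clone(); }

    public static void main(String[] args) {
        StreamingKMeans model = new StreamingKMeans(new double[][] {{20.0, 0.0}});
        double threshold = 15.0; // hypothetical alarm threshold
        double[][] stream = {{21.0, 0.1}, {19.5, 0.0}, {20.5, 0.2}, {65.0, 0.9}};
        for (double[] reading : stream) {
            boolean anomaly = model.distanceToNearest(reading) > threshold;
            System.out.println(java.util.Arrays.toString(reading) + " anomaly=" + anomaly);
            if (!anomaly) {
                model.update(reading); // only normal readings shape the clusters
            }
        }
    }
}
```

The number of clusters and the threshold are exactly the kind of hyper-parameters that have to be tuned by hand, and the centroids say nothing about why a reading was flagged.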

Page 13: Amidst demo (BNAIC 2015)

Why Graphical Models for analyzing Data Streams?

§ Many data stream models are black-box models:
§ Pros:
– No need to understand the problem.
§ Cons:
– Many hyper-parameters to tune.
– Black-box models can rarely explain what they learned.

[Figure: Blackbox Model]

Page 14: Amidst demo (BNAIC 2015)

Why Graphical Models?

§ Bayesian Networks:
– Open-box models
– Encode prior knowledge.
– Continuous and discrete variables (CLG networks).
– Example:

[Figure: Bayesian network over Temp, Smoke, T1, T2, T3, S1]

Page 15: Amidst demo (BNAIC 2015)

Why Graphical Models?

Stream Predictions

Openbox Models


Page 16: Amidst demo (BNAIC 2015)

Why Graphical Models?

Stream Predictions

Openbox Models

Blackbox Inference Engine (multi-core parallelization)

Page 17: Amidst demo (BNAIC 2015)


Page 18: Amidst demo (BNAIC 2015)

Inference Engine

§ Querying the model
– p(Fire=true | t1, t2, t3, s1, season)
– E(Temperature | smoke=true).
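As a worked example of such a query, the sketch below computes p(Fire=true | evidence) by enumeration in a deliberately tiny discrete network, Fire → Smoke and Fire → HighTemp. The structure and every probability in it are hypothetical values chosen for illustration; it does not use the AMIDST inference engine.

```java
/** Posterior probability of fire given two binary sensor readings, by direct enumeration. */
class FireQuery {
    // Hypothetical parameters. Index 0 = no fire, index 1 = fire.
    static final double P_FIRE = 0.01;
    static final double[] P_SMOKE = {0.05, 0.90};     // p(smoke=true | fire state)
    static final double[] P_HIGH_TEMP = {0.10, 0.80}; // p(highTemp=true | fire state)

    /** Joint probability from the factorization p(fire) p(smoke | fire) p(highTemp | fire). */
    static double joint(boolean fire, boolean smoke, boolean highTemp) {
        int f = fire ? 1 : 0;
        double pFire = fire ? P_FIRE : 1.0 - P_FIRE;
        double pSmoke = smoke ? P_SMOKE[f] : 1.0 - P_SMOKE[f];
        double pTemp = highTemp ? P_HIGH_TEMP[f] : 1.0 - P_HIGH_TEMP[f];
        return pFire * pSmoke * pTemp;
    }

    /** p(Fire=true | smoke, highTemp): normalize the joint over the two fire states. */
    static double posteriorFire(boolean smoke, boolean highTemp) {
        double withFire = joint(true, smoke, highTemp);
        double withoutFire = joint(false, smoke, highTemp);
        return withFire / (withFire + withoutFire);
    }

    public static void main(String[] args) {
        System.out.println("p(fire | smoke, high temp) = " + posteriorFire(true, true));
        System.out.println("p(fire | no smoke, normal temp) = " + posteriorFire(false, false));
    }
}
```

With these numbers, observing both smoke and a high temperature lifts the fire probability from the 1% prior to 16/27, roughly 59%. Every factor in that calculation can be inspected, which is the sense in which the model is an open box.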

Page 19: Amidst demo (BNAIC 2015)

Inference Engine

§ Querying the model
– p(Fire=true | t1, t2, t3, s1, season)
– E(Temperature | smoke=true)

§ Learning from data (using a Bayesian approach):
– The Bayesian framework naturally deals with data streams.
– The prior is updated in the light of new data.

$$p(\theta \mid d_1, \ldots, d_n, d_{n+1}) \propto p(d_{n+1} \mid \theta)\, p(\theta \mid d_1, \ldots, d_n)$$
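The recursive update above can be illustrated with the simplest conjugate case: a Beta distribution over the probability of a binary event. This is a minimal sketch with an assumed class name `BetaPosterior`, not the toolbox's learning code. Each call to `update` multiplies the current posterior by the likelihood of one new observation, and the result serves as the prior for the next one.

```java
/** Beta(alpha, beta) belief about a Bernoulli parameter, updated one observation at a time. */
class BetaPosterior {
    final double alpha;
    final double beta;

    BetaPosterior(double alpha, double beta) {
        this.alpha = alpha;
        this.beta = beta;
    }

    /** One Bayes step: the current posterior acts as the prior for the new observation. */
    BetaPosterior update(boolean eventObserved) {
        return eventObserved
                ? new BetaPosterior(alpha + 1.0, beta)
                : new BetaPosterior(alpha, beta + 1.0);
    }

    double mean() { return alpha / (alpha + beta); }

    public static void main(String[] args) {
        BetaPosterior belief = new BetaPosterior(1.0, 1.0); // uniform prior
        boolean[] stream = {true, true, false};
        for (boolean d : stream) {
            belief = belief.update(d);
            System.out.println("posterior mean after update: " + belief.mean());
        }
    }
}
```

Only the two counts are stored, so the data items themselves never need to be kept.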

Page 20: Amidst demo (BNAIC 2015)

Querying the model

§ Parallel Monte Carlo Inference [Salmeron et al. CAEPIA 2015]

§ Exploit multi-core (powered by Java 8)
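A rough idea of how Java 8 parallel streams spread Monte Carlo inference over the available cores is sketched below. It estimates E(Temperature | smoke=true) by plain likelihood weighting in a toy model. It is not the algorithm of Salmeron et al. and does not call AMIDST; the model, its parameters and the class name are assumptions made for this example.

```java
import java.util.concurrent.ThreadLocalRandom;
import java.util.stream.IntStream;

/** Likelihood weighting on a toy Fire -> {Smoke, Temperature} model, run on a parallel stream. */
class ParallelLikelihoodWeighting {
    // Hypothetical model parameters.
    static final double P_FIRE = 0.1;
    static final double P_SMOKE_IF_FIRE = 0.9;
    static final double P_SMOKE_IF_NO_FIRE = 0.05;
    static final double MEAN_TEMP_IF_FIRE = 60.0;
    static final double MEAN_TEMP_IF_NO_FIRE = 20.0;
    static final double TEMP_SD = 5.0;

    /** Estimates E(Temperature | smoke=true) from the given number of weighted samples. */
    static double expectedTemperatureGivenSmoke(int samples) {
        double[] totals = IntStream.range(0, samples)
                .parallel() // samples are independent, so they can be drawn on all cores
                .mapToObj(i -> {
                    ThreadLocalRandom rng = ThreadLocalRandom.current();
                    boolean fire = rng.nextDouble() < P_FIRE;
                    // Weight each sample by the likelihood of the evidence smoke=true.
                    double weight = fire ? P_SMOKE_IF_FIRE : P_SMOKE_IF_NO_FIRE;
                    double meanTemp = fire ? MEAN_TEMP_IF_FIRE : MEAN_TEMP_IF_NO_FIRE;
                    double temperature = meanTemp + TEMP_SD * rng.nextGaussian();
                    return new double[] {weight * temperature, weight};
                })
                .reduce(new double[] {0.0, 0.0},
                        (a, b) -> new double[] {a[0] + b[0], a[1] + b[1]});
        return totals[0] / totals[1]; // weighted average of the sampled temperatures
    }

    public static void main(String[] args) {
        System.out.println("E(Temperature | smoke=true) ~ " + expectedTemperatureGivenSmoke(200_000));
    }
}
```

For these parameters the exact answer is 140/3, about 46.7, since p(Fire=true | smoke=true) = 2/3. The estimate varies slightly from run to run because the random draws are not seeded.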

Page 21: Amidst demo (BNAIC 2015)

Querying the model

§ Parallel Monte Carlo Inference [Salmeron et al. CAEPIA 2015]

§ Exploit multi-core (powered by Java 8)

§ Variational Message Passing [Winn et al. JMLR 2005]
§ Deterministic approximation

Page 22: Amidst demo (BNAIC 2015)

Learning from data streams

§ Bayesian approach:
– Learning as an inference problem.
– Powered by VMP.

[Figure: graphical model with plate index i = 1 … N]

Page 23: Amidst demo (BNAIC 2015)

Learning from data streams

§ Bayesian approach:
– Learning as an inference problem.
– Powered by VMP.
– Plate notation!!

Page 24: Amidst demo (BNAIC 2015)

Learning from data streams

§ Parallel Streaming Variational Bayes [Broderick et al. NIPS 2013]

§ Powered by Variational Message Passing.
§ Multi-core processing (using Java 8).
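The core idea of processing batches in parallel and merging their contributions can be sketched for a model where the update is exact: a Gaussian prior over an unknown mean with known noise variance. In this conjugate case each batch simply adds to the natural parameters of the posterior, so the batches can be handled on different cores and summed in any order. This is a simplified illustration, not the AMIDST implementation. A real variational learner would run VMP inside each batch, and all names here are assumptions.

```java
import java.util.Arrays;
import java.util.List;

/** Parallel batch updates of a Gaussian posterior over an unknown mean (known noise variance). */
class StreamingGaussianMean {

    /** Contribution of one batch to the natural parameters {precision * mean, precision}. */
    static double[] batchUpdate(double[] batch, double noiseVariance) {
        double sum = 0.0;
        for (double x : batch) {
            sum += x;
        }
        return new double[] {sum / noiseVariance, batch.length / noiseVariance};
    }

    /** Returns {posterior mean, posterior variance} after merging all batch contributions. */
    static double[] posterior(double priorMean, double priorVariance,
                              List<double[]> batches, double noiseVariance) {
        double[] prior = {priorMean / priorVariance, 1.0 / priorVariance};
        double[] delta = batches.parallelStream() // batches are processed independently
                .map(batch -> batchUpdate(batch, noiseVariance))
                .reduce(new double[] {0.0, 0.0},
                        (a, b) -> new double[] {a[0] + b[0], a[1] + b[1]});
        double precision = prior[1] + delta[1];
        return new double[] {(prior[0] + delta[0]) / precision, 1.0 / precision};
    }

    public static void main(String[] args) {
        List<double[]> batches = Arrays.asList(new double[] {1.0, 2.0}, new double[] {3.0});
        double[] result = posterior(0.0, 1.0, batches, 1.0);
        System.out.println("mean=" + result[0] + " variance=" + result[1]);
    }
}
```

Because the merge is a sum, the result does not depend on the order in which batches finish, which is what makes the scheme easy to parallelize.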

Page 25: Amidst demo (BNAIC 2015)

Links to other open software

§ MoaLink
– MOA is a state-of-the-art tool for data stream mining.
– Using AMIDST models within the MOA GUI!
– Great for evaluation & comparison.

Page 26: Amidst demo (BNAIC 2015)

Links to other open software

§ HuginLink
– Hugin is a commercial software package for PGMs and influence diagrams.
– Model conversion.
– The Hugin inference engine can be used within AMIDST.

Page 27: Amidst demo (BNAIC 2015)


Page 28: Amidst demo (BNAIC 2015)

Dynamic Bayesian Networks (release 1.1)

§ Encode temporal knowledge
§ Naturally fit with data streams

[Figure: dynamic Bayesian network over Temp(t), Smoke(t), T1(t), T2(t), T3(t), S1(t)]
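To show why a temporal model fits a stream so naturally, here is a small sketch of one filtering step in a two-state dynamic model: the belief about a hidden fire state is carried forward through a transition model and then conditioned on the newest smoke reading. The two-state structure, the probabilities and the class name are hypothetical, and the code does not use the toolbox's dynamic Bayesian network classes.

```java
/** Forward filtering for a hidden fire state observed through a binary smoke sensor. */
class FireFilter {
    // Hypothetical parameters. Index 0 = no fire, index 1 = fire.
    static final double[][] TRANSITION = {{0.99, 0.01}, {0.10, 0.90}}; // p(state_t | state_{t-1})
    static final double[] P_SMOKE = {0.05, 0.90};                      // p(smoke=true | state_t)

    /** One step: predict with the transition model, weight by the evidence, then normalize. */
    static double[] step(double[] belief, boolean smoke) {
        double[] next = new double[2];
        for (int s = 0; s < 2; s++) {
            double predicted = belief[0] * TRANSITION[0][s] + belief[1] * TRANSITION[1][s];
            double likelihood = smoke ? P_SMOKE[s] : 1.0 - P_SMOKE[s];
            next[s] = predicted * likelihood;
        }
        double norm = next[0] + next[1];
        next[0] /= norm;
        next[1] /= norm;
        return next;
    }

    public static void main(String[] args) {
        double[] belief = {1.0, 0.0}; // start out certain that there is no fire
        boolean[] smokeReadings = {false, true, true, true};
        for (boolean smoke : smokeReadings) {
            belief = step(belief, smoke);
            System.out.println("smoke=" + smoke + " p(fire)=" + belief[1]);
        }
    }
}
```

Each reading is used once and then dropped; only the current belief vector is kept between time steps.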

Page 29: Amidst demo (BNAIC 2015)

Distributed Stream Processing (release 1.1)

§ RLink
– Invoke the AMIDST inference engine within R.
– Preliminary functionality recently presented.

Page 30: Amidst demo (BNAIC 2015)

Distributed Stream Processing (release 2.0)

§ FlinkLink
– Apache Flink: open source platform for distributed stream processing.
– Handling massive data streams.

Page 31: Amidst demo (BNAIC 2015)

Open Source project

§ We’re open to your contributions!! ;)

Page 32: Amidst demo (BNAIC 2015)

Hosted on Github

§ Download::>git clone

§ Compile::>./

§ Run::>./ <class-name>


Page 33: Amidst demo (BNAIC 2015)

Please “star” our project! (if you like it)

Page 34: Amidst demo (BNAIC 2015)

Any questions before the live demo?


Page 35: Amidst demo (BNAIC 2015)



Borchani et al. Modeling Concept Drift: A Probabilistic Graphical Model Based Approach. IDA 2015.

Page 36: Amidst demo (BNAIC 2015)

Demo Code Available in Github




Page 37: Amidst demo (BNAIC 2015)

Financial Data

§ Provided by BCC (Spanish regional bank).

§ Consists of monthly aggregated information:
– Active clients between 18 and 65 years old.
– Data between April 2007 and March 2014.
– 11 variables: income, total credit, expenses, etc.

§ Each client is classified as:
– defaulter/non-defaulter in the following 12 months.

Page 38: Amidst demo (BNAIC 2015)

Financial Data

§ Hypothesis:
– Does the Spanish financial crisis have an impact on bank customers?
– Look at the evolution of the regional unemployment rate.

Page 39: Amidst demo (BNAIC 2015)

Data Preprocessing/Visualization

§ Visualize the evolution of the monthly aggregated data:
– Data does not fit in main memory!

Page 40: Amidst demo (BNAIC 2015)

Model Building

§ We use a simple Naïve Bayes model:
– With a global hidden variable to track concept drift.

[Figure: Naïve Bayes model over attributes A1, A2, …, A11]

Page 41: Amidst demo (BNAIC 2015)

Model Building

§ We also use plate notation
§ “H” is designed to capture concept drift

[Figure: model in plate notation over attributes A1, A2, …, A11]
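The intuition behind letting a hidden variable absorb concept drift can be sketched with a one-dimensional stand-in: a hidden Gaussian value that is allowed to move a little between time steps and is re-estimated from each new monthly observation. This is a simple scalar Kalman-style filter written for illustration. It is not the model of Borchani et al. or the demo code, and the class name, the parameters and the data are assumptions.

```java
/** Tracks a slowly moving hidden value H from noisy observations (scalar Kalman-style filter). */
class DriftTracker {
    private double mean;                // current estimate of H
    private double variance;            // current uncertainty about H
    private final double driftVariance; // how far H may move between two steps
    private final double noiseVariance; // noise of each observation

    DriftTracker(double initialMean, double initialVariance,
                 double driftVariance, double noiseVariance) {
        this.mean = initialMean;
        this.variance = initialVariance;
        this.driftVariance = driftVariance;
        this.noiseVariance = noiseVariance;
    }

    /** Folds in one observation and returns the updated estimate of H. */
    double update(double observation) {
        variance += driftVariance;                  // predict: H may have drifted
        double gain = variance / (variance + noiseVariance);
        mean += gain * (observation - mean);        // correct toward the observation
        variance *= (1.0 - gain);
        return mean;
    }

    double mean() { return mean; }

    public static void main(String[] args) {
        DriftTracker h = new DriftTracker(0.0, 1.0, 0.5, 1.0);
        double[] monthly = {0.1, -0.2, 0.0, 0.1, 4.8, 5.1, 5.0, 4.9}; // made-up series with a shift
        for (double y : monthly) {
            System.out.println("observation=" + y + " H=" + h.update(y));
        }
    }
}
```

A sustained change in the estimate of H signals that the data distribution has shifted, which is the kind of behavior the hidden variable in the slide is meant to capture.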

Page 42: Amidst demo (BNAIC 2015)

Tracking concept drift


Page 43: Amidst demo (BNAIC 2015)

Tracking concept drift


Page 44: Amidst demo (BNAIC 2015)


§ Masegosa et al. AMIDST: Analysis of Massive Data Streams using Probabilistic Graphical Models. Submitted to JMLR. 2015.

§ Borchani et al. Modeling Concept Drift: A Probabilistic Graphical Model Based Approach. IDA 2015.

§ Masegosa et al. Probabilistic graphical models on multi-core CPUs using Java 8. Submitted to IEEE Computational Intelligence Magazine, Special Issue on Computational Intelligence Software. 2015.

§ Salmeron et al. Parallel importance sampling in conditional linear Gaussian networks. In Proceedings of the Conferencia de la Asociacion Española para la Inteligencia Artificial, volume in press, 2015.

§ Winn et al. Variational message passing. Journal of Machine Learning Research, 6:661–694, 2005.

§ Broderick et al. Streaming variational Bayes. In Advances in Neural Information Processing Systems, pages 1727–1735, 2013.

Page 45: Amidst demo (BNAIC 2015)

Any questions?

Page 46: Amidst demo (BNAIC 2015)

Open Source project

§ We’re open to your contributions!! ;)