IS DATA PREPARATION THE NEXT BIG DATA DISRUPTION ? The 22nd International Conference on Distributed Multimedia Systems DMS 2016 Grand Hotel Salerno, Salerno, Italy November 25 - 26, 2016

Implementing Data Preparation in Distributed Multimedia System

Feb 15, 2017

Transcript
Page 1: Implementing Data Preparation in Distributed Multimedia System

IS DATA PREPARATION THE NEXT BIG DATA DISRUPTION?

The 22nd International Conference on Distributed Multimedia Systems, DMS 2016

Grand Hotel Salerno, Salerno, Italy, November 25-26, 2016

Page 2: Implementing Data Preparation in Distributed Multimedia System

Agenda

• SCENARIO

• BIG DATA IN THE DATA DRIVEN ENTERPRISE

• WHAT DATA PREPARATION SHOULD COVER

• CREATING READY DATA USING FRACTALS

• CASE STUDY

Source: Forrester 2016

Page 3: Implementing Data Preparation in Distributed Multimedia System

1. DOES THE BUSINESS ANALYST UNDERSTAND THE DATA SCIENTIST?

2. WHY ARE DATA DRIVEN COMPANIES HIRING DATA JOURNALISTS?

3. WHY DOES DARK DATA EXTERNAL TO DATA LAKES CONTINUE TO GROW?

4. WHY DOES MAKING DATA TAKE SO LONG?

5. DATA PLAY AND NARRATIVES?

HOW MUCH TIME IS AVAILABLE TO EXPLOIT DATA PROCESSING OUTPUT?

77% Data Processing

23% Data Analysis

Source: Bloor 2016

Page 4: Implementing Data Preparation in Distributed Multimedia System

90% IS DARK

12% AVAILABLE FOR BUSINESS INSIGHTS

88% IS JUST STORED

80% RECORDINGS, PDFs AND TEXTS

+4300% ANNUAL DATA GENERATION

Source: IDC 2016

Page 5: Implementing Data Preparation in Distributed Multimedia System

Data preparation is an iterative process for exploring and transforming raw data into forms suitable for data science, data discovery, and analytics. Self-service data preparation (SSDP) tools are user-oriented tools that enable data preparation capabilities such as data cataloging and inventorying, data discovery, data exploration, data transformation, data structuring, surfacing of sensitive attributes, and anomaly detection. These tools are aimed at reducing the time and complexity of preparing data and improving analyst productivity.

[Figure: data preparation pipeline. Stages: Preprocess, Prepare, Discover, Exploit. Data states: Raw, Technically correct, Ready Data, Patterns, Formatted. Other labels: Multimedia domain, Missing Multimedia.]

Page 6: Implementing Data Preparation in Distributed Multimedia System

Depending on how you count them, there are anywhere from 20 to 50 providers of self-service data preparation tools. However, they're not all equal, and users should carefully examine the offering to make sure they're getting what they expect. Many BI and Advanced Analytics vendors (Tableau, Qlik, SAS etc.) have jumped onto SSDP, even if their capabilities aren't separate from their core offerings and show limitations in terms of performance, neutrality, and custom processing.

The key reason why self-service data prep will survive as its own category is the growing realization that data preparation needs to be kept separate from analysis and discovery. The volumes and the number of data sources will not be decreasing, and neither will the number of BI tools. To that end, it's likely that self-service data prep will remain a product category unto itself for the foreseeable future.

Source: Bloor 2016

Page 7: Implementing Data Preparation in Distributed Multimedia System

Where we are

BIG DATA IN THE DATA DRIVEN ENTERPRISE

Page 8: Implementing Data Preparation in Distributed Multimedia System

WE ARE ALL AWARE: THE I.T. DIVISION IS GOING TO BUILD PLANETS OF DATA

Page 9: Implementing Data Preparation in Distributed Multimedia System

WHICH ARE WORLDS MADE OF DATABASES, DATA LAKES, DATA WAREHOUSES, STRUCTURES, AND SCHEMAS

Page 10: Implementing Data Preparation in Distributed Multimedia System

IT SEEMS THAT THESE WORLDS ARE CALLED “BIG DATA”

Page 11: Implementing Data Preparation in Distributed Multimedia System

BUT, WE'RE AFRAID, TO CREATE THEM, LORDS ARE TAKING LONGER THAN 7 DAYS

AND, UNFORTUNATELY, WORSE… IT SEEMS THAT HUMANS DON'T HAVE ACCESS TO THOSE WORLDS

Page 12: Implementing Data Preparation in Distributed Multimedia System

Bottom line:

Is data preparation the bridge between planets of data and the user?

Big Data is not just technology. Responsibility should be allocated on the basis of the following critical factors:

1. Will raw data be transferred to the preparation unit (push), or

2. does the preparation unit have to read data from the data lake (pull)?

3. Has the data lake been designed to stage or to store raw data?

4. What about the variability of the context and data?
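
The difference between the push and pull modes in factors 1 and 2 can be sketched in a few lines of Python. This is only an illustration: the `DataLake` and `PreparationUnit` classes and their methods are hypothetical names, not part of any product described in this deck. The point is which side initiates the transfer.

```python
from typing import Iterable, Iterator, List


class DataLake:
    """Hypothetical stand-in for a data lake holding raw records."""

    def __init__(self, records: Iterable[str]) -> None:
        self._records: List[str] = list(records)

    def read(self) -> Iterator[str]:
        # Pull mode: the consumer asks the lake for data.
        return iter(self._records)

    def push_to(self, unit: "PreparationUnit") -> None:
        # Push mode: the lake side sends data to the consumer.
        unit.receive(self._records)


class PreparationUnit:
    """Hypothetical preparation unit that collects raw records."""

    def __init__(self) -> None:
        self.staged: List[str] = []

    def receive(self, records: Iterable[str]) -> None:
        # Entry point used in push mode.
        self.staged.extend(records)

    def pull_from(self, lake: DataLake) -> None:
        # Entry point used in pull mode.
        self.staged.extend(lake.read())


lake = DataLake(["row-1", "row-2"])

pushed = PreparationUnit()
lake.push_to(pushed)      # transfer initiated on the data lake side

pulled = PreparationUnit()
pulled.pull_from(lake)    # transfer initiated by the preparation unit
```

Both units end up with the same records. What changes is who owns the scheduling and the access path, which is the responsibility question raised above.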

[Figure: responsibility matrix. Axes: Data communication mode (PUSH, PULL) and Data lake purpose (STORE, STAGE). Cells assign responsibility to IT or END USER. Other labels: Low variability, High variability.]

Page 13: Implementing Data Preparation in Distributed Multimedia System

Background

WHAT DATA PREPARATION SHOULD COVER

Page 14: Implementing Data Preparation in Distributed Multimedia System

raw data are cold, analytics hot

Page 15: Implementing Data Preparation in Distributed Multimedia System

reality

1993, Understanding Comics

How to connect analytics and details?

Page 16: Implementing Data Preparation in Distributed Multimedia System

A database is required to contextualize languages and realities

Page 17: Implementing Data Preparation in Distributed Multimedia System

Bottom line: Usage of data should be faster and cost less, with minimum data movement requirements

• materialize reality and language in a consistent database

• couple language and reality using keyback features

• bind external algorithms using Open (Standard?) User Exits

• foster holistic views of data through Grid Data Unification

Page 18: Implementing Data Preparation in Distributed Multimedia System

blending context, languages and facts

CREATING READY DATA USING FRACTAL ADC

Page 19: Implementing Data Preparation in Distributed Multimedia System

Data source

name   city    DateBirth
Aldo   Miami   11/1/90
Sara   NYC     12/2/89
Anna   Rome    1.1.68
Sara   NYC     31-1-61

Fractal conversion

Map

rowId   Nname   Ncity
1       1       1
2       2       2
3       3       3
4       2       2

Dictionary

Key    Value   NValue
Name   Aldo    1
Name   Sara    2
Name   Anna    3
City   Miami   1
…      …       …

Transform DateBirth (Luggage)

DateBirth   UDateB    Age
11/1/90     1/11/90   26
12/2/89     2/12/89   26
1.1.68      1/1/68    48
31-1-61     1/31/61   56

Add Geo classification (hierarchy)

Ncity   city    state
1       Miami   Fl
2       NYC     NY
3       Rome    Italy

Other diagram labels: Data complex, Storage group
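
The three steps on this slide (conversion to a map plus dictionary, date transformation, geo classification) can be approximated with a short Python sketch. It uses plain dictionary encoding and is not the ADC algorithm itself, whose internals the slide does not describe. The function names `encode` and `normalize_date` are hypothetical, the day-month-year reading of the source dates is inferred from the UDateB column, and the Age column is left out because the slide does not state a reference date.

```python
import re
from datetime import date
from typing import Dict, List, Tuple

# Rows from the "Data source" table on the slide.
SOURCE = [
    ("Aldo", "Miami", "11/1/90"),
    ("Sara", "NYC", "12/2/89"),
    ("Anna", "Rome", "1.1.68"),
    ("Sara", "NYC", "31-1-61"),
]

# City hierarchy from the slide (city -> state).
GEO = {"Miami": "Fl", "NYC": "NY", "Rome": "Italy"}


def encode(values: List[str]) -> Tuple[Dict[str, int], List[int]]:
    """Dictionary-encode values: each distinct value gets the next integer id."""
    ids: Dict[str, int] = {}
    codes = []
    for v in values:
        if v not in ids:
            ids[v] = len(ids) + 1
        codes.append(ids[v])
    return ids, codes


def normalize_date(raw: str) -> str:
    """Turn day-month-year with '/', '.' or '-' separators into M/D/YY."""
    day, month, year = (int(p) for p in re.split(r"[/.\-]", raw))
    # Assumption: two-digit years belong to the 1900s; used only to validate.
    date(1900 + year, month, day)
    return f"{month}/{day}/{year:02d}"


name_ids, nname = encode([r[0] for r in SOURCE])
city_ids, ncity = encode([r[1] for r in SOURCE])

# Map table: (rowId, Nname, Ncity)
row_map = [(i + 1, n, c) for i, (n, c) in enumerate(zip(nname, ncity))]
# Unified dates, matching the UDateB column.
udateb = [normalize_date(r[2]) for r in SOURCE]
# Hierarchy table: (Ncity, city, state)
hierarchy = [(cid, city, GEO[city]) for city, cid in city_ids.items()]
```

The repeated "Sara / NYC" row reuses ids 2 and 2, so the map stores only small integers while each distinct value is kept once in the dictionary.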

ADC is a fractal-like algorithm that converts raw input data and the related data processing into a set of chained binary blocks, formulas and long pointers.

We show that ADC represents an important set of computations… The advantages of ADC are that:

• it is described by a small number of parameters and has a priori known sizes of the views

• the views can be generated independently

• the overhead of combining the generated views is predictable

• the data set can be partitioned into a number of independently generated subsets

• the elements of the data set are pseudorandom

These properties make ADC a strong candidate for a data intensive grid benchmark <M. Frumkin, NASA NAS Division>

Page 20: Implementing Data Preparation in Distributed Multimedia System

Using the fractal engine, performance is extreme

Page 21: Implementing Data Preparation in Distributed Multimedia System

Use case

Page 22: Implementing Data Preparation in Distributed Multimedia System

MATERIAL TESTING

• Complex JSON, Oracle, CSV, WMV data

• Manual data processing executed using MATLAB

• Hours of scientist work to detect outliers

• Impossible to replicate tests with the same results

• Scarce know-how capitalization

• Blend of data happens at narrative writing time

Page 23: Implementing Data Preparation in Distributed Multimedia System

Terabyte level staging

Rigid batch processing

No history

Digital reality

Language

Fractal Database

Page 24: Implementing Data Preparation in Distributed Multimedia System

Bottom line: Every day we hear from entrepreneurs doing their best to turn their big ideas into a consistent and successful online business. Here IT is the enabler but, unfortunately, sometimes the T part has a negative influence on the development of the core idea.

The ideal toolkit is made for those who wish to exploit the I part of IT, so that entrepreneurs with great ideas can craft their business themselves. And they should!

Page 25: Implementing Data Preparation in Distributed Multimedia System

© 2016 datonix Spa

Thank you