ALTIC Big Data Stack, Charly Clairmont, ALTIC, @egwada, http://www.altic.org
Altic's big analytics stack, Charly Clairmont, Altic.

Nov 29, 2014

OW2 Consortium

Altic has long been an active member of the OW2 BI Initiative. For several years, Altic, like many other actors in the OW2 consortium, has taken a deep interest in Big Data technologies. Some of them have even added Big Data features to their offerings. Altic and its partners, Talend and Engineering Ingegneria Informatica (SpagoBI), have decided to create a Big Data stack using their own solutions. The magic thing: Altic hasn't changed the way its projects are done; it has only learnt how to store and process Big Data. In this presentation we introduce our Big Data stack:
* Hadoop and Spark to store and compute data
* Talend DI to create Big Data tasks that are published and scheduled in SpagoBI
* SpagoBI to manage security and give end users access to data visualizations
For a more user-friendly Big Data stack we provide new components:
* tMahout, a Talend component that helps us create data mining jobs inside Hadoop without writing code in the MapReduce paradigm
* SpagoBID3Engine, which provides easy manipulation of data to develop beautiful data visualizations
Transcript
Page 1: Altic's big analytics stack, Charly Clairmont, Altic.

ALTIC Big Data Stack

Charly Clairmont, ALTIC

@egwada

http://www.altic.org

Page 2: Altic's big analytics stack, Charly Clairmont, Altic.

Twitter #ow2con @egwada www.ow2.org

Smart #OpenSource Software #BusinessIntelligence assembler

Page 3: Altic's big analytics stack, Charly Clairmont, Altic.

Our historical tools

• ETL : Talend

• Reporting : JasperReports, Birt

• OLAP : Mondrian, Palo

• BI platform : SpagoBI

Page 4: Altic's big analytics stack, Charly Clairmont, Altic.

Smart assembling: innovation & customers' needs

● Identify when applied research is an opportunity for us, our solutions and our customers.

● Understand our customers' business processes & assess the impact of Open IT on their activities

● Offer a project approach that is both technical and operational

➔ Altic projects

➔ Allow our customers to optimize their business processes

➔ Take the customer's line of business into account

➔ Offer long-lasting solutions

➔ Follow the customer's present needs, not the software vendors' agenda

Page 5: Altic's big analytics stack, Charly Clairmont, Altic.

Identify Big Data potential / Hadoop

Page 6: Altic's big analytics stack, Charly Clairmont, Altic.

Our first Big Data project at Altic

● eFraudBox project (2010 – 2013)

● Goal : predict fraud on the Internet

● Context :
– Customer : GIE Cartes Bancaires
– European Research and Development project
– Many industrial and academic partners

● Data :
– Type : Banking transactions
– Volume : One GB per day

Page 7: Altic's big analytics stack, Charly Clairmont, Altic.

How did we start our first BigData project ?

Page 8: Altic's big analytics stack, Charly Clairmont, Altic.

« In data mining, processing is done line by line » … [ so it's not about a data volume issue ]

Page 9: Altic's big analytics stack, Charly Clairmont, Altic.

But we have too much data !

Page 10: Altic's big analytics stack, Charly Clairmont, Altic.

● Open Source

● MPP compute platform

● Distributed file system

● MapReduce processing

● Cost efficient

● Fault tolerant

● Infinite scale

● Enterprise Information System ready

● Continuous Improvement

● Growing community

Let's have a look at Hadoop ?

« Even transactions are possible on Hadoop - it's inevitable that ALL kinds of workloads will move there in the future »

Doug CUTTING, Hadoop creator

October 2013
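
To make the "MapReduce processing" bullet concrete, here is a minimal sketch of the map, shuffle and reduce phases in plain Python. It is not Hadoop code and does not use any Hadoop API. The sample records, the field layout and the function names are all assumptions made for illustration, loosely inspired by the banking-transaction data mentioned earlier.

```python
from collections import defaultdict
from itertools import chain

# Hypothetical sample records: (card_id, amount). Illustrative values only.
TRANSACTIONS = [
    ("card-A", 20.0),
    ("card-B", 5.0),
    ("card-A", 30.0),
    ("card-C", 12.5),
    ("card-B", 7.5),
]


def map_phase(record):
    """Map: emit one (key, value) pair per input record."""
    card_id, amount = record
    yield card_id, amount


def shuffle(pairs):
    """Shuffle: group all emitted values by key between the two phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: collapse the values of one key into a single result."""
    return key, sum(values)


def run_job(records):
    """Run the three phases in sequence on a single machine."""
    mapped = chain.from_iterable(map_phase(r) for r in records)
    grouped = shuffle(mapped)
    return dict(reduce_phase(k, v) for k, v in grouped.items())


totals = run_job(TRANSACTIONS)
```

On a real cluster the map and reduce calls would run in parallel on different nodes, with the framework handling the shuffle and the fault tolerance. Here everything runs in one process to keep the data flow visible.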

Page 11: Altic's big analytics stack, Charly Clairmont, Altic.

How do we query Hadoop ?

● SQL like
● Easy development

● Pig Latin
● Easy syntax
● Support unstructured data

● Java
● Very optimised
● Very customisable

Page 12: Altic's big analytics stack, Charly Clairmont, Altic.

How do we query Hadoop ?

● We already know SQL !

● Why not ?
● Need to code everything

Page 13: Altic's big analytics stack, Charly Clairmont, Altic.

Ok, we have our storage and computation engine, but how can we manage data ?

By using our Swiss Army Knife !

Page 14: Altic's big analytics stack, Charly Clairmont, Altic.

Now our Hadoop / Hive platform is filled with Big Data, but it's a little bit too slow for end users to query...

http://ih2.redbubble.net/image.13088996.5766/sticker,375x360.png

Page 15: Altic's big analytics stack, Charly Clairmont, Altic.

Process data with Hive and store the results in fast databases

Aggregate data
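
The pattern on this slide (aggregate once, then serve small result tables) can be sketched as follows. This is only an illustration: Python's built-in sqlite3 module stands in for both Hive and the fast database, and the table names, column names and sample rows are hypothetical.

```python
import sqlite3

# Assumption: sqlite3 plays both roles here. In the stack described above,
# the aggregation would run in Hive and the result would be loaded into a
# separate fast database. All table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE raw_transactions (day TEXT, merchant TEXT, amount REAL)"
)
conn.executemany(
    "INSERT INTO raw_transactions VALUES (?, ?, ?)",
    [
        ("2013-10-01", "shop-1", 10.0),
        ("2013-10-01", "shop-1", 15.0),
        ("2013-10-01", "shop-2", 4.0),
        ("2013-10-02", "shop-1", 6.0),
    ],
)

# Heavy step: scan the detailed rows once and keep only the aggregates.
conn.execute(
    """
    CREATE TABLE daily_summary AS
    SELECT day,
           merchant,
           COUNT(*)    AS nb_transactions,
           SUM(amount) AS total_amount
    FROM raw_transactions
    GROUP BY day, merchant
    """
)

# Light step: end-user queries read the small summary table only.
summary = conn.execute(
    "SELECT day, merchant, nb_transactions, total_amount "
    "FROM daily_summary ORDER BY day, merchant"
).fetchall()
```

The design choice is to pay the full scan once, in batch, so that interactive queries never touch the detailed data.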

Page 16: Altic's big analytics stack, Charly Clairmont, Altic.

Ok, now we have our fast queryable datasets, but how can we visualize these ?

To manage users and visualizations

To quickly have a vision of your data

To go deeper in your visualizations

Page 17: Altic's big analytics stack, Charly Clairmont, Altic.

BigData and Datamining : tMahout

[Figure: component logos joined by "+" signs, resulting in tMahout]

Page 18: Altic's big analytics stack, Charly Clairmont, Altic.

BigData and Datamining v2

● Spark : new in-memory data processing framework

● Very appropriate for machine learning
● MLBase : Machine learning library
● Spark-clustering : Implementation of the SOM algorithm
● Proof of concept : Analysis of mobile telecommunications
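
For readers unfamiliar with the SOM (self-organizing map) algorithm mentioned above, here is a minimal sketch of a single training step in plain Python. It is not the Spark-clustering implementation and does not use Spark. The 1-D map layout, the fixed learning rate, the hard neighbourhood radius and the sample values are simplifying assumptions; typical SOM implementations decay the learning rate and use a smoother neighbourhood function.

```python
import math


def best_matching_unit(weights, sample):
    """Return the index of the node whose weight vector is closest to the sample."""
    return min(range(len(weights)), key=lambda i: math.dist(weights[i], sample))


def som_step(weights, sample, learning_rate=0.5, radius=1):
    """Apply one SOM update on a 1-D line of nodes.

    The best matching unit and its neighbours within `radius` positions
    are pulled towards the sample; all other nodes are left unchanged.
    """
    bmu = best_matching_unit(weights, sample)
    updated = []
    for i, w in enumerate(weights):
        if abs(i - bmu) <= radius:
            w = [wi + learning_rate * (xi - wi) for wi, xi in zip(w, sample)]
        updated.append(list(w))
    return bmu, updated


# Hypothetical 2-D inputs and a 4-node map; values are illustrative only.
weights = [[0.0, 0.0], [0.3, 0.3], [0.6, 0.6], [1.0, 1.0]]
sample = [0.9, 0.8]
bmu, new_weights = som_step(weights, sample)
```

Training repeats this step over many samples. A framework such as Spark becomes useful when the samples no longer fit on one machine, because the search for the best matching unit can be done in parallel across the data.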

Page 19: Altic's big analytics stack, Charly Clairmont, Altic.

We now have a Big Data stack !

Page 20: Altic's big analytics stack, Charly Clairmont, Altic.

BI & Big Data for Altic

● In the end, we still do BI as usual

● Tools evolve :
– New storage and processing
– We do not change our tools; fortunately THEY progress for us, and we contribute

● Fundamentals do not really change, only technologies do
– Hadoop
– Spark

Page 21: Altic's big analytics stack, Charly Clairmont, Altic.

We improve our Big Data stack and its approach...

And support our customers' Big Analytics projects

Our Big Data Stack Our Big Data Approach

Page 22: Altic's big analytics stack, Charly Clairmont, Altic.

Questions ?

Thanks !

Charly CLAIRMONT, CTO at ALTIC

@egwada

http://altic.org