Talend Data Integration and Management
Talend Open Studio Data Integration

Nov 30, 2014

Talend Open Studio ETL tool, Talend Profiler and Data Management. Tot
Page 1: Talend Open Studio Data Integration

Talend Data Integration and Management

Page 2: Talend Open Studio Data Integration

www.robertomarchetto.com

Data Integration

Data Integration involves combining data residing in different sources and providing the user with a unified view of the data

Data Management combines different disciplines to manage data as a valuable resource

Page 3: Talend Open Studio Data Integration

Talend

● Talend is a company focused on Data Integration and Data Management solutions

● Named a „Cool Vendor“ by Gartner (2010)

● Present in more than 12 locations around the world

● Fast-growing company

Page 4: Talend Open Studio Data Integration

Talend Open Studio

Page 5: Talend Open Studio Data Integration

Talend Open Studio

● Open Source, professional tool

● Draw procedures by linking components; each component performs an operation

● DB vendor-specific optimized components

● Produces fully editable Java (or Perl) code

● Deployment as small and fast compiled Java or as a Web Service

● Eclipse-based IDE, excellent flexibility

● BI platform independent, DB vendor independent

Page 6: Talend Open Studio Data Integration

Automatic code generation, different deployment options

Page 7: Talend Open Studio Data Integration

Extraction, Transformation, Loading

● ETL is a common process in Data Integration

● Extract: reading data from different data sources (database, flat files, spreadsheet files, web services, etc.)

● Transform: converting data into a form that can be placed in another container (database, web services, files, etc.). Cleaning, computations and verifications are also performed

● Load: writing the data in the target format
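
The three stages above can be sketched in plain Java. This is only an illustration, not the code Talend generates: the class name EtlSketch, the in-memory CSV rows and the pipe-delimited target format are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Hypothetical sketch of Extract / Transform / Load as three plain methods.
// Not Talend-generated code; names and data layout are assumptions.
class EtlSketch {

    // Extract: read raw rows from a source (an in-memory stand-in for a CSV file).
    static List<String> extract() {
        return Arrays.asList("1, alice ,10.50", "2,BOB,", "3,carol,7");
    }

    // Transform: parse, clean, compute and verify; rows failing verification are skipped.
    static List<String[]> transform(List<String> rawRows) {
        List<String[]> out = new ArrayList<>();
        for (String raw : rawRows) {
            String[] f = raw.split(",", -1); // -1 keeps trailing empty fields
            if (f.length != 3 || f[2].trim().isEmpty()) {
                continue; // verification: the amount field is required
            }
            String id = f[0].trim();
            String name = f[1].trim().toUpperCase(Locale.ROOT); // cleaning
            String amount = String.format(Locale.ROOT, "%.2f",
                    Double.parseDouble(f[2].trim()));           // computation
            out.add(new String[] {id, name, amount});
        }
        return out;
    }

    // Load: write the rows in the target format (here pipe-delimited lines).
    static List<String> load(List<String[]> rows) {
        List<String> target = new ArrayList<>();
        for (String[] r : rows) {
            target.add(String.join("|", r));
        }
        return target;
    }
}
```

Calling EtlSketch.load(EtlSketch.transform(EtlSketch.extract())) chains the stages; the row with the missing amount is dropped during the transform step.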

Page 8: Talend Open Studio Data Integration

Tutorial, Source data

Page 9: Talend Open Studio Data Integration

Tutorial, Destination data (Datawarehouse)

Page 10: Talend Open Studio Data Integration

Tutorial, Metadata

● Talend requires a preliminary definition of the metadata

● As with strong typing in programming languages, a strong metadata definition often leads to fast, robust and maintainable applications

● ..demo..

Page 11: Talend Open Studio Data Integration

Tutorial, Talend jobs basics

● Place components on the designer

● Link components to build a transformation

● Main type of link: row flow

● Schema metadata is propagated and must be coherent

● ..demo..

Page 12: Talend Open Studio Data Integration

Tutorial, users_dimension

Page 13: Talend Open Studio Data Integration

Test the job

Page 14: Talend Open Studio Data Integration

Tutorial, accounts_dimension

Page 15: Talend Open Studio Data Integration

Tutorial, dates_dimension

Page 16: Talend Open Studio Data Integration

Tutorial, write a Java library
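
The slide itself carries no code, so the following is only a guess at the kind of helper such a library might contain. In Talend Open Studio, custom Java code of this sort usually lives in a routine, that is, a class exposing public static methods that jobs can call. The class name DateKeyRoutine and both methods are hypothetical, chosen because the previous step builds a dates dimension.

```java
import java.time.LocalDate;

// Hypothetical helper; the class and method names are assumptions,
// not taken from the original tutorial.
class DateKeyRoutine {

    // Converts a date to an integer surrogate key of the form yyyyMMdd.
    public static int toDateKey(LocalDate date) {
        return date.getYear() * 10000
                + date.getMonthValue() * 100
                + date.getDayOfMonth();
    }

    // Returns the calendar quarter (1 to 4) the date falls in.
    public static int quarterOf(LocalDate date) {
        return (date.getMonthValue() - 1) / 3 + 1;
    }
}
```

Static, side-effect-free methods like these are easy to call from an expression inside a job and easy to test outside of it.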

Page 17: Talend Open Studio Data Integration

Tutorial, opportunities_fact

Page 18: Talend Open Studio Data Integration

Tutorial, define a root job

Page 19: Talend Open Studio Data Integration

Deploy and run

Page 20: Talend Open Studio Data Integration

Extensibility, community plugins

● Many official components

● Components for every task released by the community

● Geospatial components, log analysis, Google analytics, data encryption, etc

Page 21: Talend Open Studio Data Integration

Scheduler

Page 22: Talend Open Studio Data Integration

And now.. reports, dashboards, OLAP, Geoanalysis, KPIs..

Page 23: Talend Open Studio Data Integration

Do you trust your data?

Page 24: Talend Open Studio Data Integration

What about data quality?

● Customer A is present 5 times with different names

● Null values can distort statistical measures such as the mean

● Duplicated records

● Blank values

● Some records can contain errors (e.g. field values of -1)

● Some records can be garbage
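
A few of these checks can be sketched in plain Java. This is not how Talend Open Profiler works internally; the class QualitySketch and its methods are hypothetical and only illustrate counting nulls, blanks, near-duplicates and sentinel values, and how nulls shift a mean.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical profiling helpers; all names are assumptions for illustration.
class QualitySketch {

    static long countNulls(List<?> values) {
        return values.stream().filter(v -> v == null).count();
    }

    static long countBlanks(List<String> values) {
        return values.stream().filter(v -> v != null && v.trim().isEmpty()).count();
    }

    // Counts values that repeat an earlier one after trimming and lower-casing.
    // Nulls and blanks are ignored here.
    static long countDuplicates(List<String> values) {
        Set<String> seen = new HashSet<>();
        long duplicates = 0;
        for (String v : values) {
            if (v == null || v.trim().isEmpty()) {
                continue;
            }
            if (!seen.add(v.trim().toLowerCase(Locale.ROOT))) {
                duplicates++;
            }
        }
        return duplicates;
    }

    // Counts occurrences of a placeholder value such as -1.
    static long countSentinels(List<Integer> values, int sentinel) {
        return values.stream().filter(v -> v != null && v == sentinel).count();
    }

    // Mean over the non-null values only.
    static double meanSkippingNulls(List<Integer> values) {
        return values.stream().filter(v -> v != null)
                .mapToInt(Integer::intValue).average().orElse(Double.NaN);
    }

    // Mean when nulls are silently treated as 0.
    static double meanNullsAsZero(List<Integer> values) {
        return values.stream().mapToInt(v -> v == null ? 0 : v)
                .average().orElse(Double.NaN);
    }
}
```

For the values 10, null and 20, the first mean is 15 and the second is 10, which is the kind of distortion the slide warns about. The duplicate check only catches case and whitespace variants; a customer stored under genuinely different names needs fuzzy matching, which this sketch does not attempt.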

Page 25: Talend Open Studio Data Integration

Talend Open Profiler

Page 26: Talend Open Studio Data Integration

What about data storage size?

● Some fields can be oversized for the data they contain

● Sometimes fields are related and can be calculated

● Some keys or values are never used

● When data grows, garbage grows

● Data storage is not free (disks, electricity, backups, DB licenses)
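
The first point, oversized fields, can be checked with a very small calculation. The sketch below is an assumption-laden illustration: FieldSizeSketch and its methods are hypothetical names, and the declared length stands in for something like a VARCHAR width read from the schema.

```java
import java.util.List;

// Hypothetical helper comparing a column's declared width with what is actually used.
class FieldSizeSketch {

    // Length of the longest value actually stored (nulls count as length 0).
    static int maxUsedLength(List<String> values) {
        int max = 0;
        for (String v : values) {
            if (v != null) {
                max = Math.max(max, v.length());
            }
        }
        return max;
    }

    // Share of the declared width that no value ever uses, between 0.0 and 1.0.
    static double unusedShare(int declaredLength, List<String> values) {
        if (declaredLength <= 0) {
            throw new IllegalArgumentException("declaredLength must be positive");
        }
        int used = Math.min(maxUsedLength(values), declaredLength);
        return 1.0 - (double) used / declaredLength;
    }
}
```

A column declared with length 4 that only ever holds two-letter country codes has an unused share of 0.5; whether that wastes real disk space depends on how the database stores the type.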

Page 27: Talend Open Studio Data Integration

Data is „the black gold“ that can produce knowledge

● Data is a resource from which you can extract knowledge

● A lot of data can be distilled into concise information

● Data storage is not free, and a lot of data can slow systems down

● Data cleansing is a central process in statistical analysis and Data Mining

Page 28: Talend Open Studio Data Integration

Talend Master Data Management