A Daimler AG company · Lecture @DHBW: Data Warehouse · 01 Introduction & Motivation · Andreas Buckenhofer

Lecture @DHBW: Data Warehouse buckenhofer/20192DWH/... · Daimler TSS GmbH Wilhelm-Runge-Straße 11, 89081 Ulm / Phone +49 731 505-06 / Fax +49 731 505-65 99 tss@daimler.com / Internet:

May 22, 2020


A Daimler AG company

Lecture @DHBW: Data Warehouse

01 Introduction & Motivation

Andreas Buckenhofer


Daimler TSS GmbH

Wilhelm-Runge-Straße 11, 89081 Ulm / Phone +49 731 505-06 / Fax +49 731 505-65 99

tss@daimler.com / Internet: www.daimler-tss.com

Registered office and court of registration: Ulm / HRB No.: 3844 / Management: Martin Haselbach (Chairman), Steffen Bäuerle


Andreas Buckenhofer

Senior DB Professional

Since 2009 at Daimler TSS

Department: Machine Learning Solutions

Business Unit: Analytics

Contact/Connect: DHBW, DOAG, xing, vcard

• Oracle ACE Associate

• DOAG responsible for InMemory DB

• Lecturer at DHBW

• Certified Data Vault Practitioner 2.0

• Certified Oracle Professional

• Certified IBM Big Data Architect

• Over 20 years of experience with database technologies

• Over 20 years of experience with data warehousing

• International project experience


Change Log

Date Changes

02.10.2019 Initial version


What you will learn today

• Data Warehousing is a major topic of computer science

• After the end of this lecture you will be able to

• Understand current challenges towards a data-driven future

• Understand the basic business and technology drivers for data warehousing and Big Data

• Describe the characteristics of a data warehouse

• Describe the differences between production and data warehouse systems


We’re entering a new world in which data may be more important than software. [Tim O’Reilly, Founder O’Reilly Media]

Data is a precious thing and will last longer than the systems themselves. [Tim Berners-Lee, Father of the World Wide Web]

Information is the oil of the 21st century [Peter Sondergaard, Gartner]

Everything we do in the digital realm ... creates a data trail. And if that trail exists, chances are someone is using it. [Douglas Rushkoff, Author]


Data creation is exploding [Gavin Belson, HBO's Silicon Valley]

Data is the new gold [Open Data Initiative, European Commission]

In a world deluged by irrelevant information, clarity is power. [Yuval Noah Harari, Author]

Big data is not about the data [Gary King, Harvard University]


Applications come, applications go. The data, however, lives forever. It is not about building applications; it really is about the data underneath these applications. (Tom Kyte, Oracle)


What do you think is the biggest challenge in data?

• Technology?

• People and their know-how?

• Privacy & ethics?

• Data quality?

• Data sharing culture?

• Transparency, trust and security?


Source: https://informationisbeautiful.net/visualizations/worlds-biggest-data-breaches-hacks/



Introduction



Data in, intelligence out?

Data producers vs data consumers

Internet of Everything

Industry 4.0

Artificial Intelligence

Connected Cars

Data Cybersecurity

Smart Cities

Digital twins

Data Ethics and Data Privacy

Robotics

Digitization

Social Media

Alexa, Cortana, Siri

Online Transaction Processing (OLTP)

Audio/Video Streaming


Data producers

Source: Barry Devlin: Business unIntelligence: Insight and Innovation beyond Analytics and Big Data, Technics Publications 2013, chapter 6.5


Information Technology (1960s – 1980s)

• Many systems throughout the enterprise for dedicated purposes

• Support daily transactions / day-to-day business

• Target: replace manual and time-consuming activities

• Data embedded in process-specific applications

• Process-orientation + dedicated purpose

• Customer data, order data, etc. spread over many systems in many variations and with contradictions


Sample applications for an airline

• Flight Reservation System

• Airline Frequent Flyer System

• Internal Human Resources System

• Inventory Purchasing Systems

• Operational Planning

• Maintenance Tracking

• Billing System

• CRM System, e.g. campaigns

Data labels attached to these systems in the diagram: customer data (four times), planes (four times), crews (twice), seats (three times), food / drinks


Need for decision support system / management information system

• Flight Reservation System

• Airline Frequent Flyer System

• Internal Human Resources System

• Inventory Purchasing Systems

• Operational Planning

• Maintenance Tracking

• Billing System

• CRM System, e.g. campaigns

Data labels attached to these systems in the diagram: customer data (four times), planes (four times), crews (twice), seats (three times), food / drinks

Additional element in the diagram: DCS / MIS


Early Decision Support Systems (1960s – 1980s)

Can be characterized as “Unplanned decision support” or “Unplanned Management Information Systems (MIS)”

• Management needs reports / combined data from different systems to make decisions for the company

• Reports are written manually by IT people

• Extract, combine, accumulate data

• It can take several days to write a report and to get the data

• Error-prone and labour-intensive

• Relevant information may be forgotten or combined in a wrong way

Did not really work


Information technology today - further requirements

• Data still spread across many applications, but additional requirements

• Data as Asset, getting more and more important across all industries

• Not only classical data-intensive companies like Google or Facebook

• Increasing interest e.g. in insurance, health care, automotive, …

• Connected cars, smart home, tailor-made insurance, etc.

• Hype technologies

• New database technologies like NoSQL and Big Data

• DWH still booming with additional stimuli coming from Big Data, Digitization, Internet of Things (IoT), Industry 4.0, Real Time, Time To Market, etc.


Exercise – data services/products

• What are data services that enrich or replace "hard" products?

• Where does data improve or influence the customer product experience?


Sample data services

• Driving style: car insurers offering dedicated products, e.g. cheaper for good drivers

• Health care

• Today: one drug fits all

• Tomorrow: personalized therapy based on patient profiles

• Connected, autonomous cars/vans/etc.: no driver required, fewer accidents

• Airbnb: does not own hotels – acts as broker

• 360-degree view of customer: dedicated offers e.g. on smartphone

Caution: privacy, ethics


OLTP: Online Transaction Processing

Exercise – OLTP systems

Outline at least 5 operational systems for a vehicle manufacturer

• Which data is stored by these systems?

• Which operations do they perform?

• Which questions can these systems answer, and which questions can they not answer (= major problems for decision support)?


Sample OLTP systems

• Vehicle production: Vehicle, Plant, Worker, Robot

• Car rentals: Driver, Booking, Vehicle, Route

• Parts Logistics: Part, Plant, Supplier, Route

• Financial Services: Credit, Customer, Bank account

• Workshop: Repair data, Parts, Vehicle, Diagnostic data

• Vehicle Sales: Customer, Seller, Vehicle, Production date


Sample OLTP systems

• Truck fleet management: Truck, Route, Driver

• Engineering, Research and development: Engineer, Prototype, Vehicle, Tests

• Website and Car configurator: Vehicle, CRM Lead, Interior

• etc.


Challenge

How to get an overall view across OLTP applications / functions that works?


Major problems for effective decision support

Distributed data

Different data structures

Historic data

System workload

Inadequate technology


Distributed data

Problem: Data resides on

• different systems / storages

• different applications

• different technologies

Solution: Data has to be accumulated on one system for further analysis

• Data is inhomogeneous, e.g. each system has its own customer number or order number, etc.

• How to combine the data?

• Data must be ingested regularly, e.g. daily, and not ad hoc
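
The key-matching problem above can be sketched as follows. This is only an illustration: it assumes two hypothetical source systems (reservation and billing) that each use their own customer number, plus a maintained mapping table to a shared warehouse key. All names and values are invented for the example.

```python
# Hypothetical extracts from two source systems; each uses its own customer number.
reservation_customers = [
    {"res_cust_no": "R-100", "name": "John Brown"},
    {"res_cust_no": "R-200", "name": "Jim Smith"},
]
billing_customers = [
    {"bill_cust_no": 7001, "open_amount": 120.0},
    {"bill_cust_no": 7002, "open_amount": 0.0},
]

# Assumed mapping table: (source system, local key) -> warehouse-wide customer key.
key_mapping = {
    ("reservation", "R-100"): 1,
    ("reservation", "R-200"): 2,
    ("billing", 7001): 1,
    ("billing", 7002): 2,
}


def combine_customers(reservations, billing, mapping):
    """Merge records from both systems under one warehouse customer key."""
    combined = {}
    for row in reservations:
        dwh_key = mapping[("reservation", row["res_cust_no"])]
        combined.setdefault(dwh_key, {})["name"] = row["name"]
    for row in billing:
        dwh_key = mapping[("billing", row["bill_cust_no"])]
        combined.setdefault(dwh_key, {})["open_amount"] = row["open_amount"]
    return combined


customers = combine_customers(reservation_customers, billing_customers, key_mapping)
```

In practice the hard part is building and maintaining the mapping table itself; the sketch simply assumes it exists.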


Different data structures

Problem: Systems developed independently from each other

• Different data types

• E.g.: zip-code as integer or character string

• Different encodings

• E.g.: kilometers or miles

• Different data modeling

• E.g.: last name / first name in different fields vs last name / first name (badly modelled) in one single field

Solution: Dedicated system required that harmonizes / standardizes the data
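
The three kinds of differences listed above could be harmonized roughly like this. The sketch assumes 5-digit zip codes, the unit labels "km" and "mi", and a combined name field in "Last, First" form; the function names are made up for illustration.

```python
KM_PER_MILE = 1.609344  # exact by definition of the international mile


def harmonize_zip(value):
    """Store zip codes as 5-character strings so leading zeros survive."""
    return str(value).strip().zfill(5)


def to_kilometers(distance, unit):
    """Convert a distance to kilometers; only 'km' and 'mi' are handled here."""
    if unit == "km":
        return float(distance)
    if unit == "mi":
        return float(distance) * KM_PER_MILE
    raise ValueError(f"unknown unit: {unit}")


def split_name(full_name):
    """Split a 'Last, First' field into (first_name, last_name)."""
    last, first = [part.strip() for part in full_name.split(",", 1)]
    return first, last
```

A zip code stored as an integer loses its leading zero, which is why the string form is chosen as the common target here.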


Issues with historic data

Problem: Data is updated and deleted or archived after max. 3 months

• daily transactions produce lots of data

• limited size of storage → high amounts of data fill up systems

Historic data is required for decision support

• e.g. how did sales figures develop compared to last month / last year / etc.

Solution: All data (changes) have to be stored in a system capable of dealing with huge amounts of data
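
Why history matters for a question like "how did sales develop compared to last month / last year" can be sketched as follows. The monthly figures are invented, and the dictionary merely stands in for a warehouse table that is never purged.

```python
from datetime import date

# Hypothetical monthly sales history kept in the warehouse (append-only).
sales_history = {
    date(2018, 9, 1): 950,
    date(2019, 8, 1): 1000,
    date(2019, 9, 1): 1100,
}


def previous_month(month):
    """Return the first day of the month before `month`."""
    if month.month == 1:
        return date(month.year - 1, 12, 1)
    return date(month.year, month.month - 1, 1)


def change_vs(history, month, reference):
    """Relative change of `month` against `reference`, or None if history is missing."""
    if month not in history or reference not in history:
        return None
    return (history[month] - history[reference]) / history[reference]


september = date(2019, 9, 1)
vs_last_month = change_vs(sales_history, september, previous_month(september))
vs_last_year = change_vs(sales_history, september, date(2018, 9, 1))
```

If the source system had archived everything older than 3 months, the year-over-year comparison would return None.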


Issues with system workload

Problem: Performance not optimized for new workloads

• Systems stressed by additional load (due to reports)

• Not optimized for this kind of workload

• Performance of daily transaction business jeopardized

• May lead to system failure!

• Imagine what happens if a system like Amazon gets slow

Solution: A dedicated system that handles complex (arithmetic) queries on huge amounts of data and is optimized for that kind of workload


Inadequate technology

Problem: Tooling and technology different from OLTP

• Inadequate tools for data integration and analysis

• Infrastructure configured for OLTP transactions and not for DWH load

• Storage systems and processors too weak to fulfill the requirements

Solution: Standard Tools and technology that help to increase productivity and solve such problems, e.g. Reporting Tools for Data Analysis or ETL tools for Data ingestion/load


Challenges are similar today, or even greater

• Large amounts of data

• Multiple technologies

• Multiple data formats

• Multiple data schemas

• Rapid changes in data schemas

• Complexity of legacy data

• Data quality challenges


Challenge

How to get an overall view across OLTP applications / functions that works?


Conclusion

Operative systems are not suitable for analytical evaluations

Need for a new, separate system

• fast answers, ad-hoc questions possible

• no interference with daily transaction business

Data Warehouse


Exercise: Data Warehouse user

List possible (functional and non-functional) requirements for a data warehouse end-user. Think of deficiencies of transactional systems like

• Distributed data

• Different data structures

• Problem with historic data

• Problem with system workload

• Inadequate technology

What are requirements from a Data Warehouse user perspective? (List at least 5 requirements)


Data Warehouse User

• Wants to trust the data: quality assured data

• Wants to access and analyze all data in a single database

• Wants to get a complete analysis including history, e.g. where did the customer live 5 years ago or how did bookings develop over the last 10 days?

• Wants fast data access for their queries

• Wants to understand the data model = one single and easy data model and not many different applications

• Wants to browse through combined data sets to identify correlations or new insights


Data Warehouse - summary

• Contains data from different systems

• Imports data from different systems on a regular basis

• Contains detailed data and summarized data

• Provides historic data

• Generates metadata

• OLTP applications remain, DWH is a completely new system

• Overcomes difficulties when using existing transaction systems for those tasks


Definition



Definitions … not always agreed on a single one

Data Warehouse

Business Intelligence

Big Data


First DWH architecture by Devlin/Murphy (1988)

• "Users can now focus on the use of the information rather than on how to obtain it" (p. 61)

• "Although data may reside in multiple locations, the appearance is of a single source" (p. 63)

• "Each user sees information from different company tables combined in a way that makes the data most meaningful" (p. 67)

• "Business Data Warehouse (BDW): the BDW is the single logical storehouse of all information used to report to the business" (p. 67)

• "For the first time, the end user is given the benefit of the information stored in the Data Dictionary" (p. 75)

Source: http://www.9sight.com/pdfs/EBIS_Devlin_&_Murphy_1988.pdf

http://www.9sight.com/1988/02/art-ibmsj-ebis/


Data Warehouse definitions by two “fathers” of the DWH

Ralph Kimball: "A data warehouse is a copy of transaction data specifically structured for querying and reporting"

William Harvey "Bill" Inmon: "A data warehouse is a subject-oriented, integrated, time-variant, nonvolatile collection of data in support of management’s decision-making process"


Subject-oriented

• A data warehouse is organized around the major subjects (business entities) of the enterprise like

• Customer

• Vendor

• Car

• Transaction or activity

• In contrast to the application/process/functional orientation such as

• Booking application

• Delivery handling


Subject-oriented - example

OLTP

• Flight Reservation System: Passengers, Bookings

• Flight Operation System: Crews, Planes

• Airline Frequent Flyer System: Customer, Points

DWH

• Subjects: Customer, Planes

• Marketing: Which destinations are popular, e.g. Paris? Make the customer an exclusive offer.

• Planning: How many flight kilometers and flight hours do planes have? When does a plane need maintenance?

• Capacity planning: What is the forecasted passenger demand for flights to London? Is a larger plane required on the route?


Integrated

Data contained in the warehouse are integrated.

Aspects of integration

• consistent naming conventions

• consistent measurement of variables

• consistent encoding structures

• consistent physical attributes of data (data types)


Integrated - example

• OLTP: System1: m,w; System2: male, female; System3: 1, 0 → DWH: m,w

• OLTP: System1: John Brown; System2: Brown, J.; System3: Brown, Jo → DWH: John Brown

• OLTP: System1: Varchar(5); System2: Number(8); System3: Char(12) → DWH: Varchar(12)
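
The encoding and data-type rows of this example can be sketched as a small translation step. Which source value maps to which warehouse code is an assumption here (in particular that System3 uses 1 for "m" and 0 for "w"), and the function names are hypothetical.

```python
# Per-system code tables. The direction of the System3 mapping is assumed.
GENDER_CODES = {
    "System1": {"m": "m", "w": "w"},
    "System2": {"male": "m", "female": "w"},
    "System3": {1: "m", 0: "w"},
}


def integrate_gender(system, value):
    """Translate a system-specific gender code into the warehouse encoding."""
    return GENDER_CODES[system][value]


def integrate_value(value, width=12):
    """Bring Varchar(5), Number(8) and Char(12) values into one Varchar(12) form."""
    text = str(value).strip()  # Char(12) values arrive blank-padded
    if len(text) > width:
        raise ValueError(f"value does not fit Varchar({width}): {text!r}")
    return text
```

The name row (John Brown vs Brown, J. vs Brown, Jo) is deliberately left out: abbreviated first names cannot be expanded by a simple rule.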


Nonvolatile

• Operations in an operational environment

• Insert

• Delete

• Update

• Select

• Operations in a data warehouse

• Insert: the initial and additional loading of data by (batch) processes

• Select: the access of data

• (almost) no updates and deletes (technical updates / deletes only)


Nonvolatile - example

OLTP (Flight Reservation System)

Passenger John flies from Stuttgart to London on 15.02. at 06:00

Insert into DB: Passenger John, From Stuttgart to London, 15.02. 06:00

Passenger John changes his mind and flies at 10:00

Update in DB: Passenger John, 15.02. 10:00

DWH

Insert into DB: Passenger John, From Stuttgart to London, 15.02. 06:00

Insert into DB: Passenger John, From Stuttgart to London, 15.02. 10:00
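
The same sequence can be replayed with Python's built-in sqlite3 module. This is a minimal sketch: the table and column names are invented, and a real warehouse would be loaded by a batch process rather than row by row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE oltp_booking (passenger TEXT PRIMARY KEY, origin TEXT,
                               destination TEXT, departure TEXT);
    CREATE TABLE dwh_booking (passenger TEXT, origin TEXT,
                              destination TEXT, departure TEXT);
""")

# Initial booking: both systems receive an insert.
first = ("John", "Stuttgart", "London", "15.02. 06:00")
conn.execute("INSERT INTO oltp_booking VALUES (?, ?, ?, ?)", first)
conn.execute("INSERT INTO dwh_booking VALUES (?, ?, ?, ?)", first)

# Rebooking: the OLTP row is overwritten, the DWH gets an additional row.
conn.execute("UPDATE oltp_booking SET departure = ? WHERE passenger = ?",
             ("15.02. 10:00", "John"))
conn.execute("INSERT INTO dwh_booking VALUES (?, ?, ?, ?)",
             ("John", "Stuttgart", "London", "15.02. 10:00"))

oltp_rows = conn.execute("SELECT departure FROM oltp_booking").fetchall()
dwh_rows = conn.execute(
    "SELECT departure FROM dwh_booking ORDER BY departure").fetchall()
```

After the rebooking, the OLTP table only knows the 10:00 flight, while the DWH table still holds both versions.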


Nonvolatile - example

• What happens in the OLTP system if the customer cancels his booking?

• Delete operation in OLTP

• Seat becomes available again and can be sold to another passenger

• What happens in the DWH?

• Insert operation in DWH with e.g. a flag indicating that the customer cancelled/deleted his booking

• Business can analyze cancelled bookings: why might the customer have cancelled? How to prevent this customer or other customers from cancelling next time?
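
Because the cancellation is stored as an additional flagged row, it stays available for analysis. The following sketch computes a cancellation rate per route from hypothetical booking events; the field names and the rate definition are assumptions for illustration.

```python
# Hypothetical DWH booking events; a cancellation is one more inserted row
# carrying a flag, not a delete.
booking_events = [
    {"passenger": "John", "route": "Stuttgart-London", "cancelled": False},
    {"passenger": "Jim", "route": "Hamburg-Munich", "cancelled": False},
    {"passenger": "Mike", "route": "Hamburg-Munich", "cancelled": False},
    {"passenger": "John", "route": "Stuttgart-London", "cancelled": True},
]


def cancellation_rate(events, route):
    """Share of booked passengers on `route` that later cancelled."""
    booked = {e["passenger"] for e in events if e["route"] == route}
    cancelled = {e["passenger"] for e in events
                 if e["route"] == route and e["cancelled"]}
    return len(cancelled) / len(booked) if booked else 0.0


london_rate = cancellation_rate(booking_events, "Stuttgart-London")
munich_rate = cancellation_rate(booking_events, "Hamburg-Munich")
```

In the OLTP system the deleted booking would simply be gone, so this figure could not be computed there.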


Time-variant

• All data in the data warehouse is accurate as of some moment in time

• Has to be associated with a time stamp

• Once data is correctly recorded in the data warehouse, it cannot be updated or deleted

• Data warehouse data is, for all practical purposes, a long series of snapshots

• In the operational environment data is accurate as of the moment of access

• Operational data, being accurate as of the moment of access, can be updated as the need arises


Time-variant - example

DWH

Insert into DB: Passenger John, From Stuttgart to London, 15.02. 06:00 (DB insert timestamp: 02.02. 15:03:21)

Insert into DB: Passenger John, From Stuttgart to London, 15.02. 10:00 (DB insert timestamp: 02.02. 15:04:29)

Insert into DB: Passenger Jim, From Hamburg to Munich, 18.02. 15:00 (DB insert timestamp: 05.02. 12:15:03)

Insert into DB: Passenger Mike, From Hamburg to Munich, 15.02. 10:00 (DB insert timestamp: 05.02. 12:15:11)

Insert into DB: Passenger John, From Stuttgart to London, 15.02. 10:00, Cancel Flag (DB insert timestamp: 08.02. 09:52:33)
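
With an insert timestamp on every row, the warehouse can answer "what did we know at a given moment?". The sketch below assumes that each insert pairs with the timestamps in order of appearance and that the year is 2019, since the example only gives day and month; the tuple layout is made up for illustration.

```python
from datetime import datetime

YEAR = 2019  # assumed; the example only gives day and month


def ts(day, month, hh, mm, ss):
    """Build a timestamp in the assumed year."""
    return datetime(YEAR, month, day, hh, mm, ss)


# (passenger, route, departure, cancelled, DB insert timestamp)
dwh_rows = [
    ("John", "Stuttgart-London", "15.02. 06:00", False, ts(2, 2, 15, 3, 21)),
    ("John", "Stuttgart-London", "15.02. 10:00", False, ts(2, 2, 15, 4, 29)),
    ("Jim", "Hamburg-Munich", "18.02. 15:00", False, ts(5, 2, 12, 15, 3)),
    ("Mike", "Hamburg-Munich", "15.02. 10:00", False, ts(5, 2, 12, 15, 11)),
    ("John", "Stuttgart-London", "15.02. 10:00", True, ts(8, 2, 9, 52, 33)),
]


def state_as_of(rows, moment):
    """Latest known row per passenger, using only rows inserted up to `moment`."""
    latest = {}
    for row in sorted(rows, key=lambda r: r[4]):
        if row[4] <= moment:
            latest[row[0]] = row
    return latest


on_feb_3 = state_as_of(dwh_rows, ts(3, 2, 0, 0, 0))
on_feb_9 = state_as_of(dwh_rows, ts(9, 2, 0, 0, 0))
```

As of 3 February only John's rebooked 10:00 flight is known; as of 9 February all three passengers are known and John's booking carries the cancel flag.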


Data warehouse definition update by Inmon (WWDVC conference 2018)

Source: https://twitter.com/lecyberax/status/996723448092266497

Integrated got a different meaning (storing raw data due to various reasons)


Exercise - DWH

You outlined OLTP systems for a vehicle manufacturer in an earlier exercise.

Now start designing a Data Warehouse:

• Describe what data can be stored in it. Define at least 5 subject-areas!

• Which questions can/should be answered with this information?


DWH – Subject areas

• Customer: Driver, Bank account, CRM Lead, Individual or company?

• Part: Supplier, Color, Part number, Description

• Vehicle: Truck, Prototype, Car, Formula-1 car

• Car Rental: GPS data, Rental start time, Bill, Rental end time

• Plant: Robots, Cars built, Location


Exercise – sample questions

Which customers own a car and use car rental regularly?

Which parts have the most defects? Can diagnostic data be used to predict potential defects and warn customers?

Which areas and times are popular for car rentals? Does it make sense to relocate cars to these areas? (e.g. cinema in the evening/night)


Data Warehouse (DWH) or Business Intelligence (BI)?

• DWH or BI: Often used as synonyms

• DWH more technical focus

• central repository containing data from many sources: subject-oriented, integrated, nonvolatile, time-variant

• BI more business / process oriented with a broader focus

• “Business intelligence is a set of methodologies, processes, architectures, and technologies that transform raw data into meaningful and useful information used to enable more effective strategic, tactical, and operational insights and decision making.” (Boris Evelson, Forrester Research, 2008)

This lecture has a broader focus – not just DWH as a central repository


OLTP vs OLAP – oversimplified view

[Figure: OLTP applications (could be microservices), databases (DB), and decision management]


OLTP vs OLAP

Operational system vs DWH

| OLTP (Online Transaction Processing) | OLAP (Online Analytical Processing) |
| --- | --- |
| Transaction-oriented system | Query-oriented system |
| Optimized for insert and update consistency | Optimized for complex queries with short response times; ad-hoc queries |
| Many users change data | Only ETL process writes data |
| Selective queries on the data | Evaluations of all data including history (complex queries) |
| Avoid redundancy | Redundant data storage |
| Normalized data management (3NF) | De-normalized data management |
| Relational Data Modeling | Several layers with different data models, one model usually Dimensional Data Modeling |
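
The contrast between the two columns can be sketched with Python's built-in sqlite3 module. This is only an illustration: the table `orders`, its columns, and the sample rows are invented here, and a real DWH would hold its data in separate, de-normalized tables loaded by an ETL process rather than query the operational table directly.

```python
import sqlite3

# In-memory database with a hypothetical "orders" table (names are assumptions).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders ("
    "order_id INTEGER PRIMARY KEY, customer TEXT, order_year INTEGER, amount REAL)"
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?, ?)",
    [
        (1, "alice", 2018, 120.0),
        (2, "bob", 2018, 80.0),
        (3, "alice", 2019, 200.0),
        (4, "carol", 2019, 50.0),
    ],
)

# OLTP style: a short transaction that changes a single row.
with conn:
    conn.execute("UPDATE orders SET amount = 90.0 WHERE order_id = 2")

# OLTP style: a selective query that touches one row via its key.
one_order = conn.execute(
    "SELECT customer, amount FROM orders WHERE order_id = 2"
).fetchone()

# OLAP style: an evaluation over all rows, including history.
revenue_by_year = conn.execute(
    "SELECT order_year, SUM(amount) FROM orders "
    "GROUP BY order_year ORDER BY order_year"
).fetchall()

print(one_order)        # ('bob', 90.0)
print(revenue_by_year)  # [(2018, 210.0), (2019, 250.0)]
```

The first two statements read or write a single row by key, while the aggregate has to scan the complete table, which is the access pattern a DWH is optimized for.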

Operative vs Integrated data

| | Operative data | Integrated data |
| --- | --- | --- |
| Handling | Structured, parallel processes with short and isolated ("atomic") transactions | Information for management (decision support) |
| Modeling | Process- and function-oriented, individual for each application | Different data models in one DWH; historic, stable and summarized data |
| # users | Many | Few(er) but increasing user base |
| System return time | Milliseconds | Seconds to minutes (even hours) |

Operative vs analytical databases

| | Operative DBs | Analytical DBs |
| --- | --- | --- |
| Purpose | Processing of daily business transactions | Information for management (decision support) |
| Content | Detailed, complete, most recent data | Historic, stable and summarized data |
| Data amount | Small amount of data per transaction. Nested Loop Joins | Large amount of data for load, and often per query. Hash Joins common |
| Data structure | Suitable for operational transactions | Several models; suitable for long term storage and business analyses |
| Transactions | ACID; very short read/write transactions | Long load operations, longer read transactions |
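
The "Data amount" row mentions nested loop joins and hash joins. The following is a minimal sketch of both ideas on in-memory lists of dictionaries, not a description of how any particular database implements them. The sample tables `customers` and `orders` are invented; a real nested loop join would usually look up the inner rows through an index instead of scanning them.

```python
from collections import defaultdict


def nested_loop_join(left, right, key):
    """For every left row, scan the right input for rows with the same key."""
    return [
        {**l_row, **r_row}
        for l_row in left
        for r_row in right
        if l_row[key] == r_row[key]
    ]


def hash_join(left, right, key):
    """Build a hash table on the right input once, then probe it per left row."""
    buckets = defaultdict(list)
    for r_row in right:
        buckets[r_row[key]].append(r_row)
    return [
        {**l_row, **r_row}
        for l_row in left
        for r_row in buckets.get(l_row[key], [])
    ]


# Hypothetical sample data.
customers = [
    {"customer_id": 1, "name": "alice"},
    {"customer_id": 2, "name": "bob"},
]
orders = [
    {"order_id": 10, "customer_id": 1},
    {"order_id": 11, "customer_id": 2},
    {"order_id": 12, "customer_id": 1},
]

nl_result = nested_loop_join(orders, customers, "customer_id")
hj_result = hash_join(orders, customers, "customer_id")
print(nl_result == hj_result)  # True
```

Both functions return the same rows. The nested loop variant compares every pair of rows, which is fine when a transaction touches only a few rows; the hash variant reads each input once, which pays off when large parts of both tables take part in the join.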

What happens in an internet minute?

Source: https://www.allaccess.com/merge/archive/28030/2018-update-what-happens-in-an-internet-minute#sthash.IKyiTou1.uxfs

Big Data characteristics

Volume

• The amount of data

Velocity

• The speed at which data is generated

Variety

• The different types of data

Veracity

• The trustworthiness/accuracy of data

Volume 1(2)

What is a large amount of data?

• Walmart handles more than 1 million customer transactions every hour, which are imported into databases estimated to contain more than 2.5 petabytes (2560 terabytes) of data — the equivalent of 167 times the information contained in all the books in the US Library of Congress

• Internet: Google processed about 24 petabytes of data per day in 2009

Volume 2(2)

What is a large amount of data?

• Telecommunications (usage): AT&T transfers about 30 petabytes of data through its networks each day.

• As of January 2013, Facebook users had uploaded over 240 billion photos, with 350 million new photos every day. For each uploaded photo, Facebook generates and stores four images of different sizes, which translated to a total of 960 billion images and an estimated 357 petabytes of storage
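
The Facebook figures can be checked with a few lines of Python. The numbers are taken from the bullet above; the average size per stored image is only a back-of-the-envelope value derived here, assuming decimal petabytes (10^15 bytes).

```python
photos = 240 * 10**9           # uploaded photos (from the slide)
images_per_photo = 4           # stored sizes per photo (from the slide)
storage_bytes = 357 * 10**15   # 357 PB, assuming decimal petabytes

total_images = photos * images_per_photo
avg_bytes_per_image = storage_bytes / total_images

print(total_images)          # 960000000000
print(avg_bytes_per_image)   # 371875.0
```

Under these assumptions each stored image takes roughly 370 kB on average.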

Small data / smart data

Data vs information

1 Kilobyte (kB) = 1,000 Bytes = 10^3 Bytes

1 Megabyte (MB) = 1,000,000 Bytes = 10^6 Bytes

1 Gigabyte (GB) = 1,000,000,000 Bytes = 10^9 Bytes

1 Terabyte (TB) = 10^12 Bytes

1 Petabyte (PB) = 10^15 Bytes

1 Exabyte (EB) = 10^18 Bytes

1 Zettabyte (ZB) = 10^21 Bytes

1 Yottabyte (YB) = 10^24 Bytes

Source: https://www.cmswire.com/cms/information-management/big-data-smart-data-and-the-fallacy-that-lies-between-017956.php#null
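
The decimal unit ladder can be turned into a small conversion helper. This is a sketch; the function names `to_bytes` and `human_readable` are made up for this example. It also shows that the Walmart figure quoted earlier, 2.5 petabytes (2560 terabytes), uses the binary factor 1024 rather than the decimal factor 1000 listed on this slide.

```python
# Decimal (SI) units in ascending order; each step is a factor of 1000.
UNITS = ["kB", "MB", "GB", "TB", "PB", "EB", "ZB", "YB"]


def to_bytes(value, unit):
    """Convert a value in a decimal unit (e.g. 2.5, 'PB') to bytes."""
    exponent = 3 * (UNITS.index(unit) + 1)
    return value * 10**exponent


def human_readable(n_bytes):
    """Express a byte count in the largest decimal unit with a value >= 1."""
    value = float(n_bytes)
    unit = "Byte"
    for name in UNITS:
        if value < 1000:
            break
        value /= 1000
        unit = name
    return f"{value:g} {unit}"


print(human_readable(357 * 10**15))              # 357 PB
print(to_bytes(2.5, "PB") / to_bytes(1, "TB"))   # 2500.0 (decimal factor)
print(2.5 * 1024)                                # 2560.0 (binary factor)
```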

Velocity

What is high velocity?

• The Large Hadron Collider experiments represent about 150 million sensors delivering data 40 million times per second. There are nearly 600 million collisions per second. Filtering discards more than 99.99995% of these streams without recording them, which leaves about 100 collisions of interest per second

• Internet of Things

• Connected, autonomous cars
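
The Large Hadron Collider figures in the first bullet can be cross-checked with simple arithmetic. The sketch below assumes that the quoted percentage refers to the share of collisions that is not recorded; the slide itself speaks of "streams", so this mapping is an interpretation.

```python
collisions_per_second = 600_000_000   # "nearly 600 million" (from the slide)
kept_per_second = 100                 # collisions of interest (from the slide)

# Share of collisions that is filtered out and never recorded.
discarded_fraction = 1 - kept_per_second / collisions_per_second

print(discarded_fraction > 0.9999995)  # True
```

With these numbers only about one collision in six million is kept, which is consistent with "more than 99.99995%" being discarded.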

Variety

• Structured data, like tables, is typically stored in relational databases

• Unstructured data is usually generated by humans, e.g. natural language, voice, Wikipedia, Twitter posts, video, images

• Semi-structured data has some structure in tags, but the structure varies from document to document, e.g. HTML, XML, JSON files, server logs

"Unstructured data" is a misleading term: tweets, for example, are structured, too.

Better: data has low information density.
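
The three categories can be made concrete with a small Python sketch. All values below are invented for illustration, including the field names of the tweet, which do not follow the real Twitter API.

```python
import json

# Structured: a row with a fixed set of columns (order_id, customer, amount).
structured_row = ("4711", "alice", 120.0)

# Semi-structured: JSON documents carry their own field names,
# and the set of fields differs from document to document.
docs = [
    json.loads('{"user": "alice", "tags": ["dwh"]}'),
    json.loads('{"user": "bob", "location": {"city": "Ulm"}}'),
]
field_sets = [sorted(doc.keys()) for doc in docs]
print(field_sets)  # [['tags', 'user'], ['location', 'user']]

# "Unstructured": the free text itself has no schema, but a tweet still
# arrives wrapped in structured metadata (hypothetical field names).
tweet = {
    "id": 1,
    "created_at": "2019-01-01T12:00:00Z",
    "text": "Loading the data warehouse again ...",
}
print(sorted(tweet.keys()))  # ['created_at', 'id', 'text']
```

Only the `text` field has low information density; the surrounding record is as structured as a table row.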

Veracity

• Data involves some uncertainty and ambiguities

• Mistakes can be introduced by humans and machines

• #FakeNews

Data Quality is vital!

Garbage In – Garbage Out

Garbage data + perfect model => garbage results
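
A tiny example of "garbage in, garbage out": the calculation below is correct, yet one faulty input value ruins the result. The temperature readings and the plausibility range are invented for illustration.

```python
from statistics import mean

# Hypothetical temperature readings; 9999.0 is a faulty sensor value.
readings_celsius = [21.0, 22.0, 20.0, 9999.0, 21.0]

# A correct computation on garbage data still yields a garbage result.
naive_average = mean(readings_celsius)

# A simple data quality check: keep only physically plausible values.
plausible = [r for r in readings_celsius if -50.0 <= r <= 60.0]
checked_average = mean(plausible)

print(naive_average)    # 2016.6
print(checked_average)  # 21.0
```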

Big Data definition 1(2)

• Still no agreed definition

• Original and most widely used definition:

• Volume +

• Velocity +

• Variety

• Big data is a term used to refer to systems that are too complex for traditional data processing (it is often said that an RDBMS no longer suffices). Big data challenges include capturing data, data storage, data analysis, search, etc.

• Part of this lecture

Big Data definition 2(2)

• Another usage of the term "big data" refers to advanced data analytics or data science methods that extract value from data

• Not part of this lecture

Big Data landscape

Source: http://mattturck.com/wp-content/uploads/2018/07/Matt_Turck_FirstMark_Big_Data_Landscape_2018_Final.png
