Enterprise data warehouse optimization

Explore the key building blocks to reduce costs and performance strain


Introduction

The enterprise data warehouse (EDW) has been the cornerstone of enterprise data strategies for more than 20 years. EDW systems have traditionally been built on relatively costly hardware infrastructure. But ever-growing data volumes and increasingly complex processing have raised the cost of EDW software and hardware licenses while straining the performance needed for analytic insights. Organizations can now use EDW offloading and optimization techniques to reduce the cost of storing, processing and analyzing large volumes of data.

Getting data governance right is critical to your business success. That means ensuring your data is clean, of high quality and of verifiable lineage. These governance principles can also be applied in Hadoop-like environments. Hadoop is designed to store, process and analyze large volumes of data at significantly lower cost than a data warehouse. But to realize a return on that investment, you must build data governance processes into the offloading effort.

Data quality + Governance = ROI


A modular solution

IBM® has identified eight key requirements for maximizing the return from enterprise data warehouse offloading projects. Some of these requirements focus on moving and transforming data, while others involve people, processes and architecture.

Required capability | Why it is important | IBM solution
1. Move data | Low-cost, efficient movement of data | IBM InfoSphere® DataStage® / IBM BigIntegrate®
2. Transform and integrate | Reduce costs while using existing assets | InfoSphere DataStage / BigIntegrate
3. Improve data quality | Poor data quality means garbage in, garbage out | IBM InfoSphere QualityStage® / IBM BigQuality®; IBM InfoSphere Information Analyzer
4. Govern your data | Ungoverned Hadoop means unmanageable Hadoop | IBM Information Governance Catalog
5. Replicate | Deliver data where and when needed | IBM InfoSphere Data Replication
6. Augment and enrich | Increase ROI from EDW analytics | InfoSphere DataStage / BigIntegrate
7. Reference architecture | Reduce project costs and risks | IBM Enterprise Analytics Reference Architecture
8. Implementation patterns | Reduce project costs and risks | IBM Analytics Implementation Patterns



Move data

Extracting, moving and ingesting large amounts of data from the data warehouse into Hadoop requires a shared-nothing, massively parallel platform that places no limit on throughput and performance. IBM provides a fully scalable data integration platform that supports extraction, movement and ingestion through an easy-to-use drag-and-drop interface. You can apply a different degree of parallelism to each phase of the process, depending on your requirements.

[Figure: EDW → File_Connector → Hadoop. Extract EDW data X-way parallel; move data Y-way parallel with data repartitioning; load Hadoop Z-way parallel.]

15 TB/hr in an IBM HDFS loading test; 30 TB/hr by simply doubling the hardware.
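The idea of giving each phase its own degree of parallelism can be illustrated with a small Python sketch. It is not DataStage code and not an IBM API: the function names are hypothetical, the Y-way "move" phase is folded into a single round-robin repartitioning step, Python threads stand in for the separate processes or nodes of a real shared-nothing engine, and an in-memory dict stands in for HDFS.

```python
from concurrent.futures import ThreadPoolExecutor


def extract(rows, part, x):
    """Extract worker `part` of `x`: reads a disjoint, strided slice of the source."""
    return rows[part::x]


def repartition_round_robin(chunks, z):
    """Redistribute the X extract outputs into Z balanced load partitions."""
    out = [[] for _ in range(z)]
    i = 0
    for chunk in chunks:
        for row in chunk:
            out[i % z].append(row)
            i += 1
    return out


def load(target, part, part_rows):
    """Stand-in for writing one file per partition into HDFS (here: a dict entry)."""
    target[part] = part_rows
    return len(part_rows)


def offload(rows, x, z):
    """Extract X-way parallel, repartition, then load Z-way parallel."""
    with ThreadPoolExecutor(max_workers=x) as pool:
        chunks = list(pool.map(lambda p: extract(rows, p, x), range(x)))
    parts = repartition_round_robin(chunks, z)
    target = {}
    with ThreadPoolExecutor(max_workers=z) as pool:
        loaded = sum(pool.map(lambda p: load(target, p, parts[p]), range(z)))
    return target, loaded


if __name__ == "__main__":
    target, loaded = offload(list(range(1_000)), x=4, z=3)
    print(loaded, {p: len(r) for p, r in sorted(target.items())})
```

Because X and Z are independent parameters, the extract side can be sized to what the source warehouse tolerates while the load side is sized to the target cluster; the repartitioning step is what decouples the two.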


Transform and integrate data

Often you must transform data or integrate different data sources as you move data from the EDW to the Hadoop infrastructure. IBM solutions provide a zero-coding environment: you can build a job once and run it virtually anywhere (in the EDW, the ETL grid or the Hadoop cluster) without having to modify it. This approach delivers ten times the performance of hand-coding while reusing existing developer skills and ETL assets.

[Figure: EDW → Extract → Transform → Aggregate → Load, with data repartitioning between stages.]
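Repartitioning matters most right before an aggregation: if every row with the same key lands in the same partition, each partition can aggregate independently with no cross-partition traffic. The sketch below shows that pattern in plain Python under stated assumptions; the field names (`region`, `qty`, `price`) and the transformation are hypothetical, and it is not how DataStage implements its partitioners. Python's built-in `hash` is used for routing; string hashes are randomized per process but stable within one run, which is all a single job needs.

```python
from collections import defaultdict


def transform(row):
    """Hypothetical transformation: standardize the region code and derive revenue."""
    return {"region": row["region"].strip().upper(), "revenue": row["qty"] * row["price"]}


def hash_partition(rows, key, n):
    """Route every row with the same key value to the same partition."""
    parts = [[] for _ in range(n)]
    for row in rows:
        parts[hash(row[key]) % n].append(row)
    return parts


def aggregate(partition):
    """Sum revenue per region within a single partition."""
    totals = defaultdict(float)
    for row in partition:
        totals[row["region"]] += row["revenue"]
    return dict(totals)


def run_job(rows, n=4):
    transformed = [transform(r) for r in rows]
    # Repartition on the aggregation key so each region lands in exactly one partition.
    parts = hash_partition(transformed, "region", n)
    result = {}
    for part in parts:
        result.update(aggregate(part))  # keys are disjoint across partitions, so nothing is overwritten
    return result
```

Note that the transformation runs before repartitioning here on purpose: partitioning on the raw `region` value would send " east" and "EAST" to different partitions and split one group in two.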


Improve data quality

In many traditional data warehouse environments, organizations aren’t able to implement data quality processing. Many organizations use EDW offloading as an opportunity to eliminate garbage-in, garbage-out reporting and analytics by implementing comprehensive, scalable data quality processing. If you don’t put high-quality data into the Hadoop infrastructure, the resulting analytics are of limited value.

Integrated data quality: a single user experience for data integration and for designing and running data validation, standardization and matching rules.
– Discover: discovery of business entities across heterogeneous sources. Understand the quality of your data sources.
– Cleanse: business-driven data standardization and matching. Create and maintain 360-degree views of customers, suppliers, products, locations and events.
– Assess: data classification and validation rules linked to business rules for impact analysis. Create user-defined data classification and validation.
– Monitor and remediate: enterprise-wide data quality exception monitoring and collaborative remediation. Monitor data quality and automatically send exceptions for collaborative remediation.
– Validate: data classification and validation rules linked to business rules for impact analysis. Apply data validation rules to ensure that only valid information is loaded into Hadoop.
– Life cycle governance: ownership and management of policies and rules. Specify consistent and reusable data quality rules driven by business users and integrated with data governance.
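The cleanse, validate and remediate steps can be sketched as a small rule engine: standardize each record, run named validation rules, load only records that pass, and route the rest to an exception list for remediation. This is an illustrative Python sketch, not the QualityStage or Information Analyzer API; the rule names, fields (`customer_id`, `email`, `country`) and the deliberately simple email pattern are hypothetical.

```python
import re
from dataclasses import dataclass
from typing import Callable

EMAIL = re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+")  # simplified pattern, for illustration only


@dataclass(frozen=True)
class Rule:
    name: str
    check: Callable[[dict], bool]


# Hypothetical business rules; in practice these would be owned and versioned under governance.
RULES = [
    Rule("customer_id_present", lambda r: r.get("customer_id") is not None),
    Rule("email_format", lambda r: EMAIL.fullmatch(r["email"]) is not None),
    Rule("country_iso2", lambda r: len(r["country"]) == 2 and r["country"].isalpha()),
]


def standardize(row):
    """Cleanse step: trim whitespace and normalize case before validation."""
    out = dict(row)
    out["email"] = str(out.get("email") or "").strip().lower()
    out["country"] = str(out.get("country") or "").strip().upper()
    return out


def validate(rows, rules=RULES):
    """Split rows into loadable records and exceptions that name every failed rule."""
    valid, exceptions = [], []
    for row in map(standardize, rows):
        failed = [rule.name for rule in rules if not rule.check(row)]
        if failed:
            exceptions.append({"row": row, "failed": failed})
        else:
            valid.append(row)
    return valid, exceptions
```

Recording every failed rule, rather than stopping at the first, gives the people doing remediation the full picture of what is wrong with a record.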


Govern data

Data governance helps eliminate unmanageable data lakes and improves collaboration between business users and IT. It includes establishing clearly defined business terms across your organization for the data used in reporting and business analytics. True data governance supports data lineage reporting, which lets users trace the history of data from source to report: where the underlying data elements originated, which transformations were performed on them, when the data was last refreshed in the Hadoop infrastructure, and more. IBM governance solutions also help support the role that people and processes play in an effective governance program.

Integrated data governance: a single user interface for governance activities, including a common understanding of data for enterprise users, asset management and data lineage functionality.
– Discover: automate the discovery of metadata assets, including data and data integration processes.
– Data glossary: establish a data glossary of business terms, governance policies and rules.
– Classify: automate the classification of data assets and the assignment of business terms.
– Data lineage: automatically generate full data lineage from the EDW to Hadoop.
– Shop for data: explore data assets easily and understand key information about them and their relationships.
– Manage the metadata repository: manage metadata assets through a central repository that supports metadata analysis and reporting.
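Lineage reporting is, at its core, a walk over a graph of "built from" relationships. The Python sketch below assumes a hypothetical lineage map (asset names and transformation labels are invented for illustration, not Information Governance Catalog output) and traces a report back to its original EDW sources breadth-first, collecting each hop and the transformation applied along it.

```python
from collections import deque

# Hypothetical lineage metadata: each asset maps to the (source asset, transformation) pairs it was built from.
LINEAGE = {
    "report.revenue_by_region": [("hdfs.sales_agg", "aggregate by region")],
    "hdfs.sales_agg": [("hdfs.sales_clean", "sum revenue per order")],
    "hdfs.sales_clean": [("edw.sales", "standardize region codes"), ("edw.regions", "lookup join")],
}


def upstream(asset, lineage=LINEAGE):
    """Breadth-first walk from an asset back toward its sources; returns (source, target, transformation) edges."""
    seen, edges, queue = {asset}, [], deque([asset])
    while queue:
        target = queue.popleft()
        for source, how in lineage.get(target, []):
            edges.append((source, target, how))
            if source not in seen:  # guard against revisiting shared upstream assets
                seen.add(source)
                queue.append(source)
    return edges


def origins(asset, lineage=LINEAGE):
    """Original sources: upstream assets that have no recorded inputs of their own."""
    return sorted({src for src, _, _ in upstream(asset, lineage) if src not in lineage})
```

For example, `origins("report.revenue_by_region")` yields the two EDW tables the report ultimately depends on, and `upstream(...)` lists every intermediate step with its transformation.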


Replicate data

Replication moves changes in data from a source system to a target system, as close to real time as possible, without taxing source-system processing. For example, inventory changes in a data warehouse could be captured, moved to Hadoop and made available for reporting and analytics in near real time.

[Figure: Changes are captured from the source log; a push engine sends them to an apply engine, which applies them to the targets.]
Sources:
– IBM DB2® (z/OS, LUW, iSeries)
– IMS
– VSAM
– Oracle
– SQL Server
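The apply side of log-based replication can be sketched as follows: each captured change carries its position in the source log, and the apply engine replays changes in log order, remembering the last position it applied so that a batch redelivered after a restart is not applied twice. This is a hypothetical Python model (the `Change` and `ApplyEngine` names and the "I"/"U"/"D" codes are assumptions), not the InfoSphere Data Replication API.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class Change:
    lsn: int                        # position of the change in the source log
    op: str                         # "I" insert, "U" update, "D" delete
    key: str                        # primary key of the changed row
    values: Optional[dict] = None   # changed columns (unused for deletes)


@dataclass
class ApplyEngine:
    target: dict = field(default_factory=dict)
    last_lsn: int = 0               # highest log position applied so far

    def apply(self, batch):
        """Apply a pushed batch in log order, skipping changes that were already applied."""
        for ch in sorted(batch, key=lambda c: c.lsn):
            if ch.lsn <= self.last_lsn:
                continue  # redelivered change, e.g. after a restart
            if ch.op in ("I", "U"):
                self.target[ch.key] = {**self.target.get(ch.key, {}), **(ch.values or {})}
            elif ch.op == "D":
                self.target.pop(ch.key, None)
            else:
                raise ValueError(f"unknown operation {ch.op!r}")
            self.last_lsn = ch.lsn
        return self.last_lsn
```

Ordering by log position is the design choice that matters: applying an update before the insert it depends on, or a delete before a late update, would leave the target inconsistent with the source.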


Augment and enrich data

One benefit of Hadoop is that it offers a low-cost way to store many new types of unstructured and semi-structured data. You can use this data to augment and enrich your analytics by combining it with traditional structured data from transaction processing. Preparing these massive volumes of structured, semi-structured and unstructured data for enriched analytics, machine learning and artificial intelligence requires the same data integration, data cleansing and data governance capabilities.

[Figure: Sources: EDW, unstructured data, other sources.]
– Move, transform and integrate data with InfoSphere DataStage and BigIntegrate
– Improve data quality with InfoSphere QualityStage, BigQuality and InfoSphere Information Analyzer
– Govern your data with Information Governance Catalog
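As a minimal sketch of enrichment, assuming hypothetical data: semi-structured JSON events (for example, web clickstream records stored in Hadoop) are joined to structured customer records from an EDW extract on a shared key. Events with no matching customer are returned separately instead of being dropped, so they can go through the same data quality remediation as any other exception. Field names such as `customer_id`, `type` and `meta.page` are assumptions for illustration.

```python
import json


def enrich(customers, raw_events):
    """Join semi-structured JSON events to structured customer records by customer_id."""
    enriched, unmatched = [], []
    for line in raw_events:
        event = json.loads(line)
        customer = customers.get(event.get("customer_id"))
        if customer is None:
            unmatched.append(event)  # keep for remediation rather than dropping silently
            continue
        enriched.append({
            **customer,
            "customer_id": event["customer_id"],
            "event": event.get("type"),
            # nested fields are optional in semi-structured data, so read them defensively
            "page": (event.get("meta") or {}).get("page"),
        })
    return enriched, unmatched


if __name__ == "__main__":
    customers = {101: {"name": "Acme", "segment": "Enterprise"}}  # hypothetical EDW extract
    events = [
        '{"customer_id": 101, "type": "page_view", "meta": {"page": "/pricing"}}',
        '{"customer_id": 999, "type": "click"}',
    ]
    print(enrich(customers, events))
```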


Enterprise analytics reference architecture

Typically, EDW offloading is a key step toward a larger objective: modernizing the enterprise analytics architecture. Each organization values modernization differently; priorities can include support for advanced, self-service analytics, improved operational efficiency and the introduction of artificial intelligence capabilities. But all of them require a similar foundation. IBM has introduced a proven, flexible reference architecture that reduces the risks, costs and time required for modernization projects.

[Figure: Analytics reference architecture: components and personas.
Personas: business analyst, data engineer, chief data officer, data scientist, app developer.
Components: data sources; ingestion and integration; analytical data lake storage; data access; actionable insight; discovery and exploration; enhanced applications; analytics in motion; analytics operating system; information management and governance; security; platform.
Layers: data sources, ingestion, analytical data repositories, advanced analytics and self-service data.]


Implementation patterns

IBM has developed four analytics implementation patterns that extend the effectiveness of the reference architecture: self-service analytics, governed data lakes, private cloud-secured data lakes, and hybrid cloud or public applications.

These implementation patterns can be used to identify key team members for specific projects, set roles and responsibilities for users, and determine how to successfully integrate diverse commercial and open-source software components.

[Figure: IBM DataFirst private or hybrid cloud implementation patterns: governed data lake; private cloud-secured data lake; self-service analytics; hybrid cloud or public applications.]


Why IBM

IBM can address your offloading and optimization projects by providing complete solutions at massive scale. IBM offers design, methodology, software and hardware components and uses tool-based automation to decrease time-to-value, provide repeatable processes and increase quality while lowering your risk and cost.

To learn more about the path to trusted, business-ready data, visit ibm.com/offloading-edw.


© Copyright IBM Corporation 2018

IBM Corporation New Orchard Road Armonk, NY 10504

Produced in the United States of America September 2018

IBM, the IBM logo, ibm.com are trademarks of International Business Machines Corp., registered in many jurisdictions worldwide. Other product and service names might be trademarks of IBM or other companies. A current list of IBM trademarks is available on the Web at “Copyright and trademark information” at www.ibm.com/legal/copytrade.shtml.

The content in this document is current as of the initial date of publication and may be changed by IBM at any time. Not all offerings are available in every country in which IBM operates.

THE INFORMATION IN THIS DOCUMENT IS PROVIDED “AS IS” WITHOUT ANY WARRANTY, EXPRESS OR IMPLIED, INCLUDING WITHOUT ANY WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND ANY WARRANTY OR CONDITION OF NON-INFRINGEMENT. IBM products are warranted according to the terms and conditions of the agreements under which they are provided.

The client is responsible for ensuring compliance with laws and regulations applicable to it. IBM does not provide legal advice or represent or warrant that its services or products will ensure that the client is in compliance with any law or regulation.

Statement of Good Security Practices: IT system security involves protecting systems and information through prevention, detection and response to improper access from within and outside your enterprise. Improper access can result in information being altered, destroyed, misappropriated or misused or can result in damage to or misuse of your systems, including for use in attacks on others. No IT system or product should be considered completely secure and no single product, service or security measure can be completely effective in preventing improper use or access. IBM systems, products and services are designed to be part of a lawful, comprehensive security approach, which will necessarily involve additional operational procedures, and may require other systems, products or services to be most effective. IBM DOES NOT WARRANT THAT ANY SYSTEMS, PRODUCTS OR SERVICES ARE IMMUNE FROM, OR WILL MAKE YOUR ENTERPRISE IMMUNE FROM, THE MALICIOUS OR ILLEGAL CONDUCT OF ANY PARTY.