Sponsored by: Google

DATA WAREHOUSING IN THE CLOUD

OPPORTUNITIES, BENEFITS, AND BEST PRACTICES

Data Warehousing Strategy in the Public Cloud Paradigm: Opportunities for the Enterprise

Why Data Warehouse Modernization Must Be Coordinated with Other Modernization Projects

Migrating Your Data Warehouse to the Cloud: Changes Ahead

About Our Sponsor

MARCH 2018

TDWI E-BOOK DATA WAREHOUSING IN THE CLOUD: OPPORTUNITIES, BENEFITS, AND BEST PRACTICES

DATA WAREHOUSING STRATEGY IN THE PUBLIC CLOUD PARADIGM: OPPORTUNITIES FOR THE ENTERPRISE

Moving to the cloud need not be complex if you carefully consider two possible architectures: data warehouses in the cloud and cloud-native data warehouses.

If you’re considering moving your data warehouse to the cloud, there are two approaches you need to evaluate. To learn more about the differences between data warehouses in the cloud and cloud-native data warehouses, we turned to Dan McClary, product manager at Google. Dan works on BigQuery and Google’s internal data warehouse, Dremel. Before joining Google, he served as senior principal product manager for big data at Oracle and as director of business intelligence at Red Robot Labs in Palo Alto, CA. Dan has served on the faculty of UC Berkeley’s Master’s in Data Science program and earned his Ph.D. in Computer Science from Arizona State University.

TDWI: What’s driving enterprises to consider the cloud for their data warehouse strategy?

Dan McClary: Although the financial parts of an organization may see cloud migration merely as a means to optimize capital and operating expenditures, architects are wisely using the shift to the cloud to reimagine how the enterprise architecture expresses the business’s strategy. From a technology perspective, enterprises are looking at cloud data warehousing to handle larger data volumes, accommodate new sources of high-velocity data, and provide greater visibility into all of the organization’s data assets. This is one way IT can actively contribute to broader organizational strategies (for example, shifting to mobile technology or adopting digital marketing) and help the organization leverage the true competitive power of its data.

Indeed, cloud data warehouses are often appealing to teams that wish to meet these technology and business challenges while increasing their overall agility. In particular, cloud-native data warehouses often substantially reduce operational burdens. Within the data warehousing space, trends such as serverless computing are emerging that allow IT organizations to focus more on enabling the business to meet its challenges than on maintaining an estate of specifically provisioned servers.

What is the difference between a data warehouse in the cloud and a cloud-native data warehouse? Why is this distinction important?

Cloud-native data warehouses are designed with the public cloud’s architecture and constraints in mind. They often leverage serverless computing, in which you don’t need to manage nodes or cluster configurations. Cloud-native data warehouses may also separate storage resources, compute resources, and the state of query execution from one another. This lets users simply load data into the data warehouse and query it immediately. The cloud-native data warehouse automatically manages concurrency, data growth, query performance, and core data operations such as disaster recovery, backups, and levels of availability. Because of this underlying architecture, you can scale a cloud-native data warehouse as data volumes and query loads grow without needing to provision additional resources yourself. It’s an “auto-everything” paradigm.

On the other hand, although many massively parallel processing (MPP) data warehouses hosted in the cloud were designed to handle large data volumes, they often need careful administration and expert users to manage concurrency, real-time data streams, and query optimization for better performance. Database administrators must also carry out operational work such as capacity planning, disaster recovery planning, and data backup.

Moving an MPP data warehouse architecture to the cloud still requires developers and DBAs to perform similar tasks there. A traditional cluster-and-node architecture in the cloud doesn’t provide the horizontal scalability and responsiveness an organization may need to meet its business objectives, which can have unexpected negative effects on business agility and total cost of ownership.
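To make the separation of storage and compute more concrete, here is a toy Python model. It is not how BigQuery or any other product is implemented. Storage is a shared list of partitions that exists independently of any worker, and each query chooses its own degree of parallelism at run time through a hypothetical `workers` parameter. Compute can therefore grow or shrink per query without moving the stored data.

```python
from concurrent.futures import ThreadPoolExecutor

# Storage layer: partitions of (region, amount) rows. The partitions exist
# independently of any compute worker; no worker "owns" a partition.
STORAGE = [
    [("east", 10), ("west", 5)],
    [("east", 7), ("north", 3)],
    [("west", 8), ("east", 1)],
]


def scan_partition(partition, region):
    """Compute step: sum the amounts for one region within a single partition."""
    return sum(amount for r, amount in partition if r == region)


def query_total(region, workers):
    """Answer one query with a compute pool sized for that query alone.

    The pool is created and released per query, so the degree of parallelism
    can change from query to query without moving or copying STORAGE.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = pool.map(lambda partition: scan_partition(partition, region), STORAGE)
        return sum(partials)


print(query_total("east", workers=1))  # 18
print(query_total("east", workers=3))  # 18
```

Threads only stand in for independently provisioned compute in this sketch. The point is that the answer does not depend on how much compute a query was given, because the data never had to be redistributed.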

What are the advantages of a cloud-native data warehouse? What are the drawbacks?

Cloud-native data warehouses have evolved within organizations that built their own businesses on exactly the technology and business trends enterprise architects are considering today: bigger, faster-moving data volumes and the shift to a more connected, more mobile world. Cloud-native data warehouses take advantage of emerging concepts such as serverless computing as well as separated storage and compute. These allow organizations of any size to realize reduced operational burdens as well as often-favorable changes in total cost of ownership. For example, Google’s BigQuery evolved from the company’s internal data warehouse (known as Dremel) and was designed specifically to simplify the analysis and management of the massive, fast-moving data that powers Google’s various businesses.

In many ways, enterprises that move to cloud-native data warehouses find they can better cope with the challenges that motivate a move to the cloud. However, there is the burden of the unfamiliar. Unlike simply deploying a cloud-hosted version of the business’s current data warehouse, an organization often must undergo some amount of retraining and retooling to adapt current processes to the best practices of a cloud-native system.

What factors should an enterprise evaluate when considering a move to the cloud? What are the factors that help you make the right choice?

You need to evaluate moving your data warehouse to the cloud along both technical and economic dimensions. You should compare the total cost of ownership of running your data warehouse on premises with that of running it as a cloud-native data warehouse.
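At its simplest, that comparison is arithmetic over a planning horizon. In the Python sketch below, every cost category and dollar figure is a hypothetical placeholder, not a benchmark. A real evaluation would use your own figures and would likely add items such as retraining and data transfer costs.

```python
def total_cost_of_ownership(upfront, annual_costs, years):
    """One-time costs plus the sum of recurring annual costs over `years`."""
    return upfront + years * sum(annual_costs.values())


YEARS = 3  # hypothetical planning horizon

# Hypothetical on-premises figures (USD): hardware purchase plus yearly upkeep.
on_prem = total_cost_of_ownership(
    upfront=500_000,
    annual_costs={"maintenance": 60_000, "dba_staff": 150_000, "power_space": 40_000},
    years=YEARS,
)

# Hypothetical cloud-native figures (USD): a one-time migration effort,
# then recurring storage, query, and administration costs.
cloud_native = total_cost_of_ownership(
    upfront=100_000,
    annual_costs={"storage": 30_000, "query_compute": 120_000, "admin_staff": 50_000},
    years=YEARS,
)

print(on_prem, cloud_native)  # 1250000 700000
```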

Data security and governance are two key areas to consider before you move your data warehouse to the cloud.

• Is your data always encrypted?

• Does the data warehouse support fine-grained, role-based access control?

• Does it support transparent audit logging for activity, data access, and billing?
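As a rough illustration of what fine-grained, role-based access control with audit logging means, the toy Python model below grants each role per-table permissions and records every access attempt, whether allowed or denied. The role names, table names, and log fields are hypothetical. A real cloud warehouse enforces these controls in the service itself, not in application code.

```python
from datetime import datetime, timezone

# Hypothetical role -> table -> allowed actions mapping (table-level granularity).
ROLE_GRANTS = {
    "analyst": {"sales": {"read"}},
    "engineer": {"sales": {"read", "write"}, "staging": {"read", "write"}},
}
USER_ROLES = {"ana": "analyst", "eli": "engineer"}
AUDIT_LOG = []


def check_access(user, table, action):
    """Return True if the user's role grants `action` on `table`; audit every attempt."""
    role = USER_ROLES.get(user)  # unknown users have no role and get no grants
    allowed = action in ROLE_GRANTS.get(role, {}).get(table, set())
    AUDIT_LOG.append({
        "time": datetime.now(timezone.utc).isoformat(),
        "user": user,
        "table": table,
        "action": action,
        "allowed": allowed,
    })
    return allowed


print(check_access("ana", "sales", "read"))   # True
print(check_access("ana", "sales", "write"))  # False
print(len(AUDIT_LOG))                         # 2
```

Logging denied attempts as well as successful ones is a deliberate choice in this sketch: refused requests are often the most interesting entries for an auditor.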

If the system provides sufficient security and governance, architects should consider how the system increases IT’s ability to contribute to the business’s strategic initiatives. Specifically:

• Does the data warehouse scale seamlessly and simplify your operational burdens?

• Does the data warehouse allow teams to share and collaborate easily, across both data artifacts and analyses?

• Does the data warehouse automate the delivery of data from new sources the business demands?

• Does the data warehouse help your organization lay a strong foundation for predictive analytics and machine intelligence?

What changes do enterprises overlook when considering a move to the cloud?

Although most architects will ensure that a cloud data warehouse provides parity for the business’s most critical functions, they often overlook how the organization’s relationship with data will change in the coming decade. Increasingly, more people in more parts of the business want access to data, and often that data is not managed by the IT organization.

Enterprise architects would be wise to ask the following questions of the cloud data warehouses they evaluate:

• How does the data warehouse allow us to share and control data access within and beyond our organization?

• Does the warehouse provide interfaces and features ready to serve the growing number of employees who want access to data but are not database experts?

• Does the warehouse provide simple paths to capture the data users will come to demand: marketing sources, analytics from Web and mobile applications, and SaaS services?

• Does the warehouse reside in an ecosystem of tools that allow the business to better achieve its strategic goals? Are there accessible reporting tools and mobile analytics frameworks as well as stream processing and machine learning capabilities that can be integrated immediately?

Where do you think the future of cloud data warehousing is headed?

As organizational strategies take further advantage of digital advertising, mobile devices, and machine learning, cloud data warehousing forms an important bridge between the enterprise architectures that enabled the first generation of business intelligence and those that will serve AI-empowered businesses. Cloud data warehouses that truly accelerate organizations will display three important characteristics:

• Serverless computing to free IT to contribute more meaningfully to strategic business initiatives because time and expertise spent on data management can be redirected to leveraging data.

• Separation of storage and compute resources to allow organizations to achieve a full, actionable view of their data estates without sacrificing performance. This full view will reduce risk and increase the overall success rate of analytically driven business initiatives.

• Strong integration with emerging machine intelligence systems to allow organizations to quickly prove the value of predictive analytics.

WHY DATA WAREHOUSE MODERNIZATION MUST BE COORDINATED WITH OTHER MODERNIZATION PROJECTS

By Philip Russom

One of the hottest trends in data warehousing (DW) is modernization, in which DW professionals upgrade, redesign, and re-implement warehouses to give them future-facing capacity, speed, interoperability, and analytics.

We talk about (and even perform) data warehouse modernization as if it were an isolated project with isolated goals, but the reality is just the opposite. Data warehouse modernization is, in fact, usually one of many modernization efforts that occur concurrently and have project dependencies. Here are examples of dependent modernizations you must coordinate with your data warehouse modernization.

Business Modernization

In an ideal world, upper management leads the way by deciding how to modernize the business to keep pace and stay relevant with evolving customers, partners, marketplaces, and economies. Business modernization and its goals are, in turn, articulated “down the org chart.”

At some point in that process, people in IT and similar groups (such as a data warehouse group) should collaborate with business managers to determine how data, applications, and technology can support the stated business modernization by thinking globally but acting locally. Even if you do not work in an ideal world, some semblance of that process should still be present to guide your alignment of warehouse modernization with business modernization.

Analytics Modernization

Online analytical processing (OLAP) continues to be the most common analytics method, and it’s too valuable to replace or abandon. Instead, analytics modernization tends to introduce additional analytics methods that an organization has not deployed before, typically so-called advanced analytics. These are based on technologies for data mining, clustering, graph analytics, statistics, and natural language processing (NLP).

Often, new analytics are needed to support business modernization, such as when your organization wants to compete using analytics, improve operational excellence via analytics, and make decisions based on facts and analyses (whether the decisions are strategic, tactical, or operational).

Similarly, data warehouse modernization can be driven by analytics modernization because most warehouses were built for reporting and OLAP and therefore need to be extended or redesigned to accommodate the new data requirements of advanced analytics.

Data Platform Modernization

Technologies come, technologies go, but the data and the warehouse carry on. In TDWI’s definition, a data warehouse has three characteristics: it is (1) a data architecture with attendant data models and similar structures that is (2) populated with data and (3) organized via metadata, indices, and other semantic mechanisms. By this definition, the data warehouse and its underlying server platforms are separate and can be modernized separately.

Warehouse professionals have repeatedly migrated warehouse data and related pieces from symmetric multiprocessing (SMP) to MPP hardware, from 16-bit to 32-bit to 64-bit CPUs, from one vendor brand to another, and from server boxes to racks, grids, and clusters. Whether you realize it or not, these are data platform modernizations, driven by new requirements for scale, speed, price, and future-proofing.

More often than not, modernizing warehouse data (to embrace dimensionality, real-time and unstructured data, and detailed sources for analytics) depends on data platform modernization for appropriate storage, capacity, interfaces, in-place processing, and multistructured data support. This is why modern data warehouses are still logical data architectures at heart, even though the data is physically distributed across an increasing number of platform types, including newer ones based on columnar storage, clouds, appliances, graph databases, complex event processing, and Hadoop.
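One way to picture a warehouse as a logical architecture that outlives its platforms is a logical layer that depends only on an interface, with interchangeable physical back ends behind it. In the Python sketch below, `Platform`, `RowStore`, and `ColumnStore` are hypothetical stand-ins for any two platform types. The logical query is written once and does not change when the platform is swapped.

```python
from typing import Protocol


class Platform(Protocol):
    """Physical storage platform; the logical warehouse depends only on this."""
    def load(self, rows: list[dict]) -> None: ...
    def column(self, name: str) -> list: ...


class RowStore:
    """Keeps whole rows together, as a row-oriented database might."""
    def __init__(self) -> None:
        self.rows: list[dict] = []

    def load(self, rows: list[dict]) -> None:
        self.rows.extend(rows)

    def column(self, name: str) -> list:
        return [row[name] for row in self.rows]


class ColumnStore:
    """Keeps each column in its own list, as a columnar platform might."""
    def __init__(self) -> None:
        self.columns: dict[str, list] = {}

    def load(self, rows: list[dict]) -> None:
        for row in rows:
            for key, value in row.items():
                self.columns.setdefault(key, []).append(value)

    def column(self, name: str) -> list:
        return list(self.columns.get(name, []))


def total_revenue(platform: Platform) -> float:
    """Logical query: unchanged across platform modernizations."""
    return sum(platform.column("revenue"))


orders = [{"id": 1, "revenue": 120.0}, {"id": 2, "revenue": 80.5}]
for platform in (RowStore(), ColumnStore()):
    platform.load(orders)
    print(type(platform).__name__, total_revenue(platform))  # 200.5 for both
```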

Report Modernization

The style of reports has evolved dramatically since the early 1990s. Back then, reports were printed on paper and consisted of one giant table of numbers after another. Because a single report served dozens of user constituencies, most of its content was irrelevant to any individual report consumer.

Luckily, waves of modernization have greatly improved reports, bringing them online (for greater distribution and ease of use, as well as drill-down), giving them a visual presentation (for interpretation at a glance), organizing them around metrics and KPIs (in support of performance management methods), and personalizing them so users go straight to what they need (for productivity and relevance).

The majority of data warehouses continue to be designed by users and deployed mostly in support of reporting and OLAP. As the style of reporting has evolved, warehouse data structures have had no trouble modernizing to keep pace with changes in reporting. More dramatic change is seen in users’ portfolios of tools for reporting, which still include older enterprise reporting platforms but are now augmented with newer tools for dashboarding, data visualization, and data exploration.
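Because these modernizations depend on one another, it can help to write the dependencies down and derive a coordinated order from them. The sketch below uses Python's standard `graphlib.TopologicalSorter` (Python 3.9 or later). The edges are one hypothetical reading of this article: business modernization guides the others, and warehouse modernization follows analytics, data platform, and report modernization. Your organization's graph will differ.

```python
from graphlib import TopologicalSorter

# Each project maps to the set of projects it depends on (hypothetical edges).
dependencies = {
    "analytics": {"business"},
    "data_platform": {"business"},
    "reports": {"business"},
    "data_warehouse": {"analytics", "data_platform", "reports"},
}


def plan_waves(deps):
    """Group projects into waves; projects in the same wave can proceed concurrently."""
    sorter = TopologicalSorter(deps)
    sorter.prepare()  # raises graphlib.CycleError if the dependencies are circular
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # every prerequisite is already finished
        waves.append(ready)
        sorter.done(*ready)
    return waves


for number, wave in enumerate(plan_waves(dependencies), start=1):
    print(f"wave {number}: {wave}")
```

With these edges, the business work comes first, then analytics, data platform, and report modernization can run in parallel, and data warehouse modernization comes last.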

MIGRATING YOUR DATA WAREHOUSE TO THE CLOUD: CHANGES AHEAD

Greater speed, scale, flexibility, modern technology, and business innovation await enterprises that move their data warehouse to the cloud. However, the transition may bring additional, possibly unexpected, changes to your environment.

Organizations continue to modernize their data warehouses to keep pace with new technology, business, and end-user requirements. Technology requirements include the need to effectively handle big data, multistructured data, streaming data, and external data. Meanwhile, the business increasingly requires advanced analytics, cost control, agility, and self-service data access for a wider range of end users.

To satisfy these diverse requirements, many users are migrating their data warehouse and related systems to cloud-based data warehouse platforms. To understand the drivers and success factors for migrating a data warehouse to the cloud, TDWI recently spoke with Lak Lakshmanan, a machine learning and analytics practice lead at Google Cloud.

TDWI: Once an enterprise has decided to move to a cloud-based data warehouse, what are the first steps it needs to take?

Lak Lakshmanan: Instead of a risky “big bang” project, in which you try to do too much too early in your relationship with a new platform, a data warehouse migration should be organized as an easily managed multiphase project.

Because a modern data warehouse is a conglomerate of several components, technical users and their business colleagues can prioritize high-value components that should be migrated during the early phases.

For example, business analytics is a high priority for many organizations. An enterprise might first migrate its large analytics datasets out of its existing data warehouse environment, along with related components for analytics sandboxes and data labs. This well-bounded phase can demonstrate a solid technical success coupled with immediate business value, which in turn reinvigorates everyone’s excitement and resolve to drive the project forward.

Lessons learned in analytics (or whatever first-phase target you select) can then be applied to migrating other components of the data warehouse environment. Common early-phase foci include data landing and staging (which are notoriously in need of modern ingestion methods, especially for external and real-time data), high-value data domains (e.g., customer data for multichannel marketing or partner data for supply chain optimization), and data consolidations (which take ownership of data marts and other rogue departmental datasets so they have higher quality, tighter governance, and stronger security). Over time, users can migrate the rest of the data warehouse to the cloud or choose to leave some components on premises.
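A phased plan like this can be drafted by scoring each component. The Python sketch below is a hypothetical heuristic, not a prescribed method. Each component receives made-up business-value and effort scores, and components are packed into phases in order of value per unit of effort, with a per-phase effort budget. The component names come from the examples above.

```python
from dataclasses import dataclass


@dataclass
class Component:
    name: str
    value: int   # hypothetical business-value score
    effort: int  # hypothetical effort score (e.g., team-weeks)


def plan_phases(components, budget_per_phase):
    """Greedy sketch: highest value-per-effort first; start a new phase when the budget is used up.

    A component whose effort alone exceeds the budget still gets a phase of its own.
    """
    ranked = sorted(components, key=lambda c: c.value / c.effort, reverse=True)
    phases, current, spent = [], [], 0
    for comp in ranked:
        if current and spent + comp.effort > budget_per_phase:
            phases.append(current)
            current, spent = [], 0
        current.append(comp.name)
        spent += comp.effort
    if current:
        phases.append(current)
    return phases


components = [
    Component("analytics sandboxes", value=9, effort=3),
    Component("landing and staging", value=7, effort=4),
    Component("customer data domain", value=8, effort=5),
    Component("partner data domain", value=6, effort=6),
    Component("departmental data marts", value=5, effort=6),
]
for number, phase in enumerate(plan_phases(components, budget_per_phase=8), start=1):
    print(f"phase {number}: {phase}")
```

A greedy ordering is easy to explain to business colleagues. In practice, teams would also weigh dependencies and risk, which this sketch ignores.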

Once the data warehouse is in the cloud, how does an enterprise change?

If a cloud data warehouse drives innovation in an enterprise, then we consider the new platform a success. For example, advanced analytics and even machine learning are typically early priorities for data warehouse migrations and cloud adoption because they enable a business to operate more innovatively: to compete, grow, adapt to change, increase profitability, and achieve greater efficiencies.

Other innovative business practices demand that data be shared with little friction for collaboration across global business units; a consolidated cloud data warehouse can help break down silos and speed up cross-functional activities. Innovation assumes large numbers of end users working concurrently and autonomously; a cloud data warehouse with self-service tools can make this happen with good performance for ad hoc queries, data exploration, and data prep.

Turning batch processing analytics into more timely analytics is another desirable business innovation. A cloud data warehouse with powerful ingestion methods can collect data frequently (even streaming external data) so end users can monitor business performance throughout the business day via modern data pipelines. On the cutting edge, innovative businesses want predictive machine learning so algorithms can make decisions at run time based on incoming data, to speed up business processes and personalize offerings. This requires modern tools and platforms, such as those found in cloud data warehouses.
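The shift from nightly batch totals to more timely analytics can be illustrated with tumbling windows. Instead of producing one total per day, the pipeline emits a total for each fixed interval as soon as that interval closes. The stdlib Python sketch below assumes events arrive in timestamp order and uses a hypothetical 60-second window. Production streaming engines also handle late and out-of-order events, which this sketch does not.

```python
from collections.abc import Iterable, Iterator


def tumbling_totals(
    events: Iterable[tuple[int, float]], window_seconds: int
) -> Iterator[tuple[int, float]]:
    """Yield (window_start, total) whenever an in-order event stream crosses a window boundary."""
    current_start = None
    total = 0.0
    for timestamp, amount in events:
        start = timestamp - timestamp % window_seconds  # align to the window boundary
        if current_start is not None and start != current_start:
            yield current_start, total  # window closed: emit it right away
            total = 0.0
        current_start = start
        total += amount
    if current_start is not None:
        yield current_start, total  # flush the final, still-open window


# (seconds since start of day, sale amount), arriving in order
sales = [(5, 10.0), (42, 2.5), (61, 4.0), (130, 1.0), (170, 3.0)]
for window_start, total in tumbling_totals(sales, window_seconds=60):
    print(window_start, total)
```

Because the function is a generator, a dashboard consuming it would see the first window's total as soon as the event at second 61 arrives, without waiting for the end of the day.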

Some cloud-native data warehouses are serverless and fully managed. They significantly reduce the need for management and operation of the data warehouse, so data analysts can focus entirely on analyzing data. In this scenario, IT teams need not spend time on capacity planning, load balancing, backup, disaster recovery, and so on. Instead, they can focus on end-user productivity and on driving the adoption of data-driven decision making within the organization.

Let’s talk about some of the changes to specific roles within an organization, especially the database administrator or DBA.

One of the greatest benefits of serverless cloud architecture is that it automatically manages the systems layer of the data warehouse architecture with little or no human intervention. This amounts to far less administrative work than in traditional on-premises systems. On clouds optimized for data management, even indexing and backups are handled by the cloud. No one’s losing a job due to cloud automation. Instead, the cloud’s automation frees up DBAs, systems analysts, and other IT personnel to do higher-value work, such as data modeling, building data pipelines, orchestrating loads and flows, assisting with data quality and integration, and analyzing data.

Are there other impacts to an enterprise, such as security or governance?

In today’s world, regulatory compliance, security, and governance of data are more important than ever, no matter where data physically resides. Cloud data is no different: you still need to have processes in place for security and compliance. However, more data now resides off premises, and data travels among systems more often, whether on premises, in the cloud, or in some hybrid combination. Some users still feel uneasy with this reality. Security through obscurity was never a great option, and now it is no longer an option at all.

However, the good news is that, for the most part, your existing best practices and tools for security and governance also work with cloud and hybrid environments. For example, when an organization has existing data governance or stewardship programs, those policies can be expanded to govern cloud platforms and their interfaces.

Furthermore, the leading cloud providers have taken security seriously; they now support data-specific forms of security such as data encryption, masking, and tokenization. Frankly, many users still haven’t implemented encryption and masking for their data warehouses, so migrating the warehouse to a cloud can actually modernize their security significantly.
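To show how masking and tokenization differ, here is a stdlib Python sketch. Masking hides most of a value for display. Tokenization, in this sketch, replaces a value with a keyed HMAC-SHA256 digest, so equal inputs map to equal tokens and joins on the tokenized column still work. The key and the 16-character truncation are hypothetical choices. This keyed-hash approach is not reversible, whereas some tokenization services keep a secure vault so authorized users can recover originals. In practice, the key would live in a key management service, not in code.

```python
import hashlib
import hmac

SECRET_KEY = b"replace-with-a-managed-key"  # hypothetical; keep real keys in a KMS


def mask_email(email: str) -> str:
    """Show only the first character of the local part, e.g., for reports."""
    local, _, domain = email.partition("@")
    return local[:1] + "*" * max(len(local) - 1, 0) + "@" + domain


def tokenize(value: str) -> str:
    """Deterministic keyed token: the same input and key always give the same token."""
    digest = hmac.new(SECRET_KEY, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return digest[:16]  # truncated for readability; keep the full digest if collisions matter


print(mask_email("jane.doe@example.com"))  # j*******@example.com
print(tokenize("jane.doe@example.com") == tokenize("jane.doe@example.com"))  # True
```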

A modern public cloud provider (Google, for example) employs hundreds of computer security researchers and practitioners who deal with security threats far more often than most individual organizations do and who have processes in place to respond quickly and repeatably. Thus, the security on a public cloud is typically better than what many users can manage by themselves.

Tell us about Google’s BigQuery, your data warehouse platform that’s optimized for advanced analytics and designed from the bottom up for the cloud.

BigQuery is Google’s serverless, highly scalable, low-cost enterprise data warehouse designed to make all data analysts productive. Because there is no infrastructure to manage, enterprises can focus on analyzing data to find meaningful insights using familiar SQL.

Our goal with BigQuery has been to build a data platform that leverages to the hilt the many great technologies now available in cloud environments, while also supporting—in the most modern way possible—older data technologies still relevant today and therefore required by many data warehouse professionals.

For example, on the leading edge, BigQuery uses a serverless compute architecture that decouples compute from storage. This enables the different layers of the architecture to perform and scale independently, and it gives data developers flexibility in design and deployment. We’ve included deep support for old-school ANSI-standard SQL, columnar optimization, and federated queries, which are key to the self-service, ad hoc data exploration that many users demand.

Other BigQuery functions mix old and new, as in our centralized metadata management, which is augmented by modern data cataloging. Likewise, we have ingestion methods for over a thousand tools, applications, and interfaces to accommodate both traditional internal and modern external sources and targets, at any latency imaginable.
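Columnar optimization is one reason ad hoc queries over wide tables can stay fast: a query reads only the columns it references. The toy Python comparison below counts how many stored values each layout touches when summing one column of a hypothetical 1,000-row, 50-column table. It illustrates the general idea and does not describe BigQuery's internal storage format.

```python
# A hypothetical wide table: 1,000 rows x 50 columns.
NUM_ROWS, NUM_COLS = 1_000, 50
rows = [{f"c{j}": i * j for j in range(NUM_COLS)} for i in range(NUM_ROWS)]

# Column layout: one list per column, built from the same data.
columns = {name: [row[name] for row in rows] for name in rows[0]}


def sum_row_layout(col):
    """Row layout: every row is read in full, even though only one column is needed."""
    touched = 0
    total = 0
    for row in rows:
        touched += len(row)  # the whole row comes off storage
        total += row[col]
    return total, touched


def sum_column_layout(col):
    """Column layout: only the referenced column is read."""
    values = columns[col]
    return sum(values), len(values)


print(sum_row_layout("c3"))     # (1498500, 50000)
print(sum_column_layout("c3"))  # (1498500, 1000)
```

Both layouts return the same answer. The column layout reads 50 times fewer values here, and the gap grows with table width.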

cloud.google.com

Google Cloud Platform (GCP) makes business insights available on demand via a set of serverless data analytics services that surpass conventional limitations on scale, performance, and cost-efficiency.

You can leave the complexities of data analytics behind and

• Use Google BigQuery, a cloud-native serverless data warehouse that executes queries in seconds instead of minutes, at any scale, for accelerated time to insight

• Ingest and analyze up to millions of events per second in real time with Cloud Pub/Sub and Cloud Dataflow

• Get value faster from data processing on Apache Spark and Apache Hadoop with Cloud Dataproc

• Visualize and explore data, and publish dashboards and reports to share insights, using Google Data Studio and existing third-party BI tools

• Bring predictive analytics into your applications by adopting machine learning at your own pace using Cloud Machine Learning Engine or pre-trained machine learning APIs

Please visit https://cloud.google.com/solutions/big-data/ for more information.

tdwi.org

TDWI is your source for in-depth education and research on all things data. For 20 years, TDWI has been helping data professionals get smarter so the companies they work for can innovate and grow faster. TDWI provides individuals and teams with comprehensive business and technical education and research that allow them to acquire the knowledge and skills they need, when and where they need them.

TDWI advances the art and science of realizing business value from data by providing an objective forum where industry experts, solution providers, and practitioners can explore and enhance data competencies, practices, and technologies.

TDWI offers four major conferences, topical seminars, onsite education, a worldwide membership program, business intelligence certification, live webinars, resource-filled publications, industry news, an in-depth research program, and a comprehensive website at tdwi.org.

© 2018 by TDWI, a division of 1105 Media, Inc. All rights reserved. Reproductions in whole or in part are prohibited except by written permission. Email requests or feedback to [email protected].

Product and company names mentioned herein may be trademarks and/or registered trademarks of their respective companies.
