White Paper
Improving Analytics Economics with Cray
Comparisons to the Cray Urika-GX Agile Analytics Platform
By Nik Rouda, ESG Senior Analyst, and Mike Leone, ESG Senior Lab Analyst
August 2016
This ESG White Paper was commissioned by Cray Inc. and is distributed under license from ESG.
Enterprise Strategy Group | Getting to the bigger truth.™
Contents
A New Focus on the Power of Analytics
Analytics Initiatives Are Necessarily Interdisciplinary
Finding the Best Fit Model for Analytics in the Data Center
Modeling Costs of a Big Data Infrastructure with Hadoop
The Cost of a Big Data Infrastructure Using a Do-it-yourself, Build-your-own Approach
Lowering Capital and Operational Expenses with the Cray Urika-GX Platform
Comparing the Cray Urika-GX Platform with a DIY, Build-your-own Approach
The Bigger Truth
A New Focus on the Power of Analytics

Anyone paying attention to recent IT trends is well aware that analytics have become a top-tier priority. Indeed, there has
been much written in the technology industry and even popular media about the power of big data and analytics, with
truly amazing applications and outcomes spanning all industries and lines of business. It is possible to understand the world
at the macro and micro levels, in real time and over decades, in ways that simply weren’t feasible or affordable before
now. New technologies are changing the rules, with data environments like Hadoop, Spark, Mesos, and graph analytics
rapidly growing in popularity.
Sadly, many also now realize that achieving their own ambitions for analytics is often harder (and costlier) than expected.
While there is massive potential to glean more actionable insights from more data than ever before, the sheer scope of the
effort can be daunting. ESG research has shown that 77% of those responsible for their organizations’ big data and
analytics strategies and new projects believe it will typically take more than six months before meaningful business value is
seen.1 Finding ways to shorten this delay and have an impact sooner is imperative to satisfy the needs of the business.
Otherwise, there is the likelihood that big data analytics will be seen as over-hyped and under-productive, and investment
will be withdrawn from worthy initiatives.
When leveraging a pre-built, pre-integrated, and pre-tested big data analytics platform, organizations can significantly
reduce the time to insight by cutting the time and cost of researching, testing, procuring, deploying, and managing a
complete analytics solution, though that convenience usually comes at a price. With that in mind, ESG quantified the economic advantages of
the Cray Urika-GX analytics platform when compared to a do-it-yourself solution over a three-year period.
Analytics Initiatives Are Necessarily Interdisciplinary
A significant challenge in this space is that analytics is only one piece of a much broader environment. The applications
that a business analyst or data scientist will directly employ in their work are intricately dependent on the underlying
technology stack. The dynamic and expanding set of emerging tools means that everything below will continuously
change and grow. There will be no static state, and that points to the need for openness and versatility in a
solution. The stack spans a range of technologies, often including, but not limited to:
• Big data platforms (e.g., Hadoop and Spark), analytics engines, and programming languages (like Python and R).
• Business intelligence (BI), visualization, and reporting applications.
• Data warehouses and databases (for example, Cassandra).
• Data ingestion, pipeline, integration, ETL, and governance software (including Kafka).
• Security frameworks.
• Virtualization and containerization of workloads (with Mesos, OpenStack, and Docker).
• Servers, storage, and networking infrastructure.
The problem is that this list covers a surprisingly wide span of IT disciplines, domains, and skills, meaning that a very
diverse team must collaborate for analytics initiatives to be successful. This is reflected in ESG survey results
showing how many different teams must be involved to build a whole solution; as shown in Figure 1, seven different areas
of competence were seen as crucial or important to have engaged. Just getting this many people into a conference room
or on a call isn’t easy, and it’s even harder to work through each group’s unique concerns and interests to arrive at a
consensus. Making matters worse, every area covered will likely have to go through its own process of defining specific
requirements, identifying possible vendors, evaluating products, negotiating price, and receiving, deploying, and
integrating the entire system. This leads to an increased likelihood of component mismatches, if not outright functional
gaps or compatibility conflicts, unless the entire process is managed very deliberately with careful attention to detail. No
wonder most new analytics initiatives get bogged down and take longer than six months to show results, as previously
noted.
1 Source: ESG Research Report, Enterprise Big Data, Business Intelligence, and Analytics Trends: Redux, July 2016. All ESG research references and charts in this white paper have been taken from this research report.
Figure 1. Many IT Disciplines Required for a Successful Analytics Initiative
Source: Enterprise Strategy Group, 2016
One potential shortcut is the utilization of public cloud infrastructure-as-a-service (IaaS) offerings, which make sense, for
example, when applications and data are already hosted in the cloud, or when resource elasticity is critical. Yet clouds
don’t always meet enterprise requirements for predictability and performance. Many select cloud-based analytics hoping
to reduce the effort of provisioning a hardware environment, yet there are still demands on the wide-area network,
security requirements, and the complete software stack to be managed. While the elasticity sounds promising, the full
costs of cloud services are often less predictable, particularly for large-scale big data and analytics environments. In many
regulated businesses, using cloud may also make it harder to meet externally defined requirements for security, privacy,
and governance. Accordingly, current ESG research shows that less than 20% of organizations expect cloud to be their
primary deployment model for analytics.2 For a large majority, the question will be about how to efficiently design and
deploy a capable and economical on-premises solution.
Finding the Best Fit Model for Analytics in the Data Center
Many interested in analytics might not see why the infrastructure matters all that much. Surely, tuning of the database and
the analytics models will be sufficient to improve performance, right? The reality is that a number of factors are relevant
here. While poorly written analytics will definitely slow response, increase resource demands, and reduce concurrency, so
will poorly matched hardware limitations around system processors, memory, and storage. And that’s just around
performance—there are additional requirements for enterprise operational quality like scalability, availability, reliability,
recoverability, and supportability. Most organizations will define “success” as meeting needs in all of these areas. Cost
itself has many considerations: the capital cost of acquisition, the opportunity cost of delays, the manpower cost of effort, and
ongoing costs of operation. The complete technology stack—hardware and software alike—is going to define the overall
outcomes.
2 Ibid.
[Figure 1 data: “How important is the involvement of the following IT disciplines for new initiatives and projects in the area of big data and analytics to be successful?” (Percent of respondents, N=475.) Respondents rated the storage, applications, networking, infrastructure/cloud architect, server/virtualization, database/BI/analytics, and security/risk/governance teams on a scale of crucial, important, nice-to-have but not required, completely unnecessary, or don’t know/no opinion; a large majority rated every discipline as crucial or important.]
While using commodity hardware and open source software may sound like an effective strategy for preserving choice
and controlling the cost of the combined infrastructure, it does nothing to reduce the inherent complexity. In fact, the total
cost of ownership may be higher than with vendor “proprietary” alternatives that offer simplified deployment,
management, and support. This is reflected in big data buying preferences; for example, only 24% of organizations
expect to use purely open source Apache Hadoop, with a majority using at least some vendor-backed distributions for the
additional advantages they bring.
Given that 1) analytics is a top priority, 2) time-to-value is generally too long, 3) quality will depend on having a well-defined and tightly integrated stack, and 4) significant effort and expense may be incurred, how should enterprises proceed? One popular answer is to explore the possibilities of a pre-integrated analytics platform (or “engineered system,” if you prefer the term). Nearly a quarter of enterprises (23%) are indeed planning to use purpose-built, pre-integrated systems as their primary deployment model.3

There are a number of motivations for this practice, but one negative assumption about this approach is worth deeper exploration here: a widespread impression that appliances are too expensive and therefore only intended for the most intensive analytics at the biggest companies and government labs. There is another common perception that appliances are “locked down” to a vendor-defined set of software and configurations, and therefore not adaptable for anything but a specific need. That belief in itself may drive people to DIY, so they can customize, and keep customizing, as needs change.
To explore this belief, ESG will now examine two approaches: developing your own environment versus selecting a ready-
made system. For the purpose of this comparison, we’ll look at the Cray Urika-GX platform versus an equivalent
commodity kit.
Modeling Costs of a Big Data Infrastructure with Hadoop
When modeling the cost of a big data infrastructure over a three-year period, both capital expenses (CapEx) and
operational expenses (OpEx) should be examined. A large portion of CapEx comes in the first year, due to the initial cost of
acquisition, which includes paying for all of the hardware and software required to make up a big data infrastructure. This
not only includes costs for compute blades (CPU, memory), storage, and networking, but also software licensing and
infrastructure support. Within the licensing and support category, subcategories must be included to factor in costs for
hardware support, core software licensing and support (OS and management software), and big data analytics software
licensing and support. For years two and three of the modeling exercise, additional capital expenses must be accounted for
to address continued licensing and support requirements across the whole infrastructure, including hardware, core
software, and analytics software.
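To make the structure of this CapEx roll-up concrete, the short Python sketch below totals a hypothetical three-year capital outlay. Every component name and dollar figure is an invented placeholder for illustration only, not ESG's or Cray's actual pricing input.

# Illustrative three-year CapEx roll-up for a big data infrastructure.
# All figures are hypothetical placeholders, not ESG or Cray pricing.

YEAR_ONE_ACQUISITION = {
    "compute_blades_cpu_memory": 250_000,
    "storage": 120_000,
    "networking": 60_000,
}

ANNUAL_LICENSING_AND_SUPPORT = {
    "hardware_support": 40_000,
    "core_software_os_and_mgmt": 35_000,
    "big_data_analytics_software": 90_000,
}

def three_year_capex() -> int:
    """Year-one hardware acquisition plus three years of licensing/support."""
    acquisition = sum(YEAR_ONE_ACQUISITION.values())
    recurring = 3 * sum(ANNUAL_LICENSING_AND_SUPPORT.values())
    return acquisition + recurring

print(f"Three-year CapEx: ${three_year_capex():,}")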
For OpEx, there are two modeling phases. The first focuses on preparation, which includes technology research, shopping,
evaluating, procuring, and testing. This phase is difficult to model because the quantitative variables related to time are
numerous and fall within a wide range. Available technology, personnel expertise and competency, and budget are just a few of the
factors that impact this phase. The second phase focuses on deployment and management of the system. Hard costs can
be assigned to infrastructure deployment times, based on full-time employee salaries and expected deployment and
integration times of hardware, core software, and big data software. The full-time employee salaries can then be applied
to management and maintenance costs of the big data infrastructure based on the overall size of the infrastructure.
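The OpEx side of the model can be sketched the same way. The salary, deployment hours, per-node management hours, and node count below are all assumptions chosen only to show how deployment and management labor roll up over three years.

# Illustrative OpEx model for the deployment-and-management phase.
# Salaries, hours, and cluster size are assumed placeholders.

FULLY_LOADED_SALARY = 120_000               # assumed annual FTE cost, USD
HOURLY_RATE = FULLY_LOADED_SALARY / 2_080   # ~2,080 working hours per year

EXPECTED_DEPLOYMENT_HOURS = {               # deployment and integration time
    "hardware": 160,
    "core_software": 120,
    "big_data_software": 200,
}

ANNUAL_MGMT_HOURS_PER_NODE = 40   # maintenance effort scales with cluster size
NODE_COUNT = 48                   # assumed size of the infrastructure

def three_year_opex() -> float:
    """One-time deployment labor plus three years of management labor."""
    deployment = sum(EXPECTED_DEPLOYMENT_HOURS.values()) * HOURLY_RATE
    management = ANNUAL_MGMT_HOURS_PER_NODE * NODE_COUNT * HOURLY_RATE * 3
    return deployment + management

print(f"Three-year OpEx: ${three_year_opex():,.0f}")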
General Configuration Details and Assumptions
ESG completed a three-year, total cost of ownership (TCO) analysis of a Cray Urika-GX platform and compared it to a
similarly configured infrastructure using a do-it-yourself (DIY), build-your-own approach. Both CapEx and OpEx costs were
factored into the model, excluding the preparation phase (research, shop, evaluate, procure, test). The model of the Cray
Urika-GX platform was completed using internal pricing provided by Cray, while DIY pricing was determined by averaging
the cost of industry-leading vendor offerings at a component level configured to match the Cray Urika-GX offering. This not
only included the cost of core components, but also accounted for common discount pricing and associated support and
licensing costs over three years from leading vendors.
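As a rough illustration of this pricing methodology, the sketch below averages invented vendor quotes for a single matching component and applies an assumed common discount; the modeled TCO then combines the CapEx and OpEx totals sketched earlier.

# Illustrative DIY pricing step: average several hypothetical vendor
# quotes for a matching component, then apply an assumed common discount.

VENDOR_LIST_PRICES = [265_000, 240_000, 255_000]   # assumed quotes, USD
COMMON_DISCOUNT = 0.30                             # assumed street discount

average_list = sum(VENDOR_LIST_PRICES) / len(VENDOR_LIST_PRICES)
diy_component_cost = average_list * (1 - COMMON_DISCOUNT)
print(f"Modeled DIY component cost: ${diy_component_cost:,.0f}")

# The modeled three-year TCO then combines the earlier sketches while
# excluding the preparation phase:
# tco = three_year_capex() + three_year_opex()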
3 Ibid.
All trademark names are property of their respective companies. Information contained in this publication has been obtained by sources The
Enterprise Strategy Group (ESG) considers to be reliable, but is not warranted by ESG. This publication may contain opinions of ESG, which are subject
to change from time to time. This publication is copyrighted by The Enterprise Strategy Group, Inc. Any reproduction or redistribution of this
publication, in whole or in part, whether in hard-copy format, electronically, or otherwise to persons not authorized to receive it, without the express
consent of The Enterprise Strategy Group, Inc., is in violation of U.S. copyright law and will be subject to an action for civil damages and, if applicable,
criminal prosecution. Should you have any questions, please contact ESG Client Relations at 508.482.0188.
Enterprise Strategy Group is an IT analyst, research, validation, and strategy firm that provides actionable insight and intelligence to the global IT community.