Rich Dill, Solutions Engineer, SnapLogic. Top 10 challenges of making big data real – and tips to overcome them

Top 10 Challenges of Making Big Data Real and Tips to Overcome Them

Jan 15, 2015



SnapLogic, Inc.

This workshop presentation was given by Rich Dill, Solutions Engineer at SnapLogic, at the GigaOm Structure Data Conference, March 20-21, 2013, in New York City, NY.

What are the Top Ten Challenges?

1. A miracle occurs here - Of course we can connect to it…
2. There is always more data than you expected - Unless there is not enough data to be meaningful
3. Never mistake a memo for reality - Did you hear what I said or what I meant?
4. It is logically impossible to schedule for the unknown
5. There is life beyond American English - Eventually you will have to deal with other languages
6. Of course the data is accurate, clean and ready - Data quality issues can kill project schedules
7. Dealing with unstructured data is fun - Somewhere buried inside is your delimiter where you least expect it
8. The data and process is subject to… - Pick your acronym: PCI, FIX, HIPAA, SOX
9. The requirements once defined are set in stone - Requirements almost always evolve
10. The most critical data will be on the most difficult platform to access - “a good deal of our case data is on Notes running on AS400”
Transcript
Page 1: Top 10 Challenges of Making Big Data Real and Tips to Overcome Them

Rich Dill, Solutions Engineer, SnapLogic

Top 10 challenges of making big data real – and tips to overcome them

Page 2: Top 10 Challenges of Making Big Data Real and Tips to Overcome Them

A play on Dave Letterman’s top 10

• 1. A miracle occurs here - Of course we can connect to it…
• 2. There is always more data than you expected - Unless there is not enough data to be meaningful
• 3. Never mistake a memo for reality - Did you hear what I said or what I meant?
• 4. It is logically impossible to schedule for the unknown - Or the relationship between developers and weathermen
• 5. There is life beyond American English - Eventually you will have to deal with other languages


Page 3: Top 10 Challenges of Making Big Data Real and Tips to Overcome Them

A play on Dave Letterman’s top 10

• 6. Of course the data is accurate, clean and ready - Data quality issues can kill project schedules
• 7. Dealing with unstructured data is fun - Somewhere buried inside is your delimiter where you least expect it
• 8. The data and process is subject to… - Pick your acronym: PCI, FIX, HIPAA, SOX
• 9. The requirements once defined are set in stone - Requirements almost always evolve
• 10. The most critical data will be on the most difficult platform to access - “a good deal of our case data is on Notes running on AS400”

Page 4: Top 10 Challenges of Making Big Data Real and Tips to Overcome Them

A miracle occurs here

• Of course we can connect to it…


Page 5: Top 10 Challenges of Making Big Data Real and Tips to Overcome Them

And we know the image resonates, v2…


Page 6: Top 10 Challenges of Making Big Data Real and Tips to Overcome Them

SnapLogic Solution

[Figure: diagram with the labels Users, Mobile, Enterprise, Cloud, Big Data, Data Center, ESB, RDBMS, Amazon Redshift]

Page 7: Top 10 Challenges of Making Big Data Real and Tips to Overcome Them

There is always more data than you expected

• Unless there is not enough data to be meaningful
- It’s feast or famine
- Distributed systems replicate data
  • At the site level and at the network level
  - 3x at the data center in Houston and 3x in Chicago
- Replicated data can increase the cost of hardware, network and software
- We are far from normal
  • Data is organized for performance and reliability, not space efficiency
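The replication point on this slide is simple arithmetic, sketched below in Python. The 3x factor and the two sites (Houston, Chicago) come from the slide; the 10 TB dataset size and the function name `raw_storage_tb` are assumptions chosen only for illustration.

```python
def raw_storage_tb(logical_tb, copies_per_site, sites):
    """Raw capacity needed when every site keeps its own set of replicas."""
    return logical_tb * copies_per_site * sites


# Hypothetical 10 TB of logical data, replicated 3x at each of two sites.
total = raw_storage_tb(logical_tb=10, copies_per_site=3, sites=2)
print(f"{total} TB raw for 10 TB logical")  # six copies in total
```

Hardware, network and software costs tend to scale with the raw figure rather than the logical one, which is why the replication factor belongs in the sizing estimate from the start.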

Page 8: Top 10 Challenges of Making Big Data Real and Tips to Overcome Them

It is logically impossible to schedule for the unknown

• Or my theory of the relationship between developers and weathermen

• The accuracy of an estimate is a function of the number of variables and the length of the project


Page 9: Top 10 Challenges of Making Big Data Real and Tips to Overcome Them

Never mistake a memo for reality

• Did you hear what I said or what I meant?
• Are you a literal listener?
- Psycholinguistics should be required reading for project managers
• Waterfall process
- Allows you to build something the user wants today that you deliver in 9 months or two years
• Iterative process
- We’ll figure it out as we go along
- Not really suited for deep architectural designs
• Process
- Listen
- Process
- Repeat back “this is what I heard you say”
• Nothing beats showing a functioning prototype, demo or wireframe


Page 10: Top 10 Challenges of Making Big Data Real and Tips to Overcome Them

There is life beyond American English
• Eventually you will have to deal with other languages
- German will test your user interface spacing
- Cyrillic will add to the character set
• Middle Eastern languages
- Read right to left
- Some languages don’t have consistent spelling
• Far Eastern languages
- There is no such thing as Chinese
  • Mandarin is the “Speech of Officials”
  • Cantonese is used in Hong Kong
  • Hangul is used in Korea
  • Japanese
    - Kanji are adopted Chinese characters
    - Kana is a combination of Hiragana & Katakana
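A minimal Python sketch of why these points matter in practice, using only the standard `unicodedata` module. The sample labels (the word "Settings" in several languages) and the helper names `has_rtl` and `utf8_size` are assumptions added for illustration; they are not from the talk.

```python
import unicodedata


def has_rtl(text):
    """True if any character has a right-to-left bidirectional class."""
    return any(unicodedata.bidirectional(ch) in ("R", "AL") for ch in text)


def utf8_size(text):
    """Number of bytes the text occupies when encoded as UTF-8."""
    return len(text.encode("utf-8"))


# Hypothetical UI label in several languages.
samples = {
    "English": "Settings",
    "German": "Einstellungen",
    "Russian": "Настройки",
    "Arabic": "إعدادات",
    "Japanese": "設定",
}

for language, label in samples.items():
    # characters on screen, bytes in storage, and reading direction
    print(language, len(label), utf8_size(label), has_rtl(label))
```

The character count affects layout (the German label is longer than the English one), the byte count affects column sizing in storage, and the direction flag affects rendering.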


Page 11: Top 10 Challenges of Making Big Data Real and Tips to Overcome Them

Of course the data is accurate, clean and ready

• How good is the data?
- Profiling the data is key to accurate project estimates
- What percentage of the data is null, blank, invalid?
• Data lifecycle includes
- Acquisition or creation
- Validation
  • Business rules
  • Which may result in…
• Data cleansing
- Zip code tables, barcodes, D&B credit ratings
- Public data resources: www.data.gov
• Storage in an accessible format/location
• Archiving
- Industry or legal rules for archiving
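The profiling question on this slide ("What percentage of the data is null, blank, invalid?") can be answered with a few lines of Python. This is a sketch under stated assumptions: the `profile` function, the US ZIP code pattern used as the validity rule, and the sample column are all hypothetical.

```python
import re

ZIP_RE = re.compile(r"^\d{5}(-\d{4})?$")  # assumed rule: US ZIP or ZIP+4


def profile(values, is_valid):
    """Return the percentage of null, blank, invalid and valid values."""
    counts = {"null": 0, "blank": 0, "invalid": 0, "valid": 0}
    for value in values:
        if value is None:
            counts["null"] += 1
        elif not str(value).strip():
            counts["blank"] += 1
        elif not is_valid(str(value).strip()):
            counts["invalid"] += 1
        else:
            counts["valid"] += 1
    total = len(values) or 1  # avoid dividing by zero on an empty column
    return {kind: 100.0 * n / total for kind, n in counts.items()}


# Made-up sample column.
zip_codes = ["10001", None, "", "1000", "94105-1234", "   ", "ABCDE", "60601"]
report = profile(zip_codes, lambda v: ZIP_RE.match(v) is not None)
for kind, pct in report.items():
    print(f"{kind}: {pct:.1f}%")
```

Running a profile like this per column before estimating gives an early read on how much cleansing work the project will need.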


Page 12: Top 10 Challenges of Making Big Data Real and Tips to Overcome Them

Dealing with unstructured data is fun
• Somewhere buried inside is your delimiter where you least expect it
• Email is one of the most complex formats to handle
• Hierarchical data structures must be mapped or navigated
• XML is not the be-all and end-all of structured data formatting
- JSON
- BSON
- SomethingImissedSON
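The buried-delimiter problem is easy to reproduce. The sketch below compares a naive comma split with Python's standard `csv` module on a made-up record whose fields contain commas and quotes; the sample data is an assumption for illustration.

```python
import csv
import io

# Made-up record: the name and the comment both contain commas.
raw = 'id,name,comment\n1,"Smith, Jane","said ""connect to it"", then left"\n'

# Naive approach: split on every comma, including the ones inside quotes.
naive = raw.splitlines()[1].split(",")

# csv.reader honours the quoting rules, so embedded commas and quotes survive.
parsed = list(csv.reader(io.StringIO(raw)))[1]

print(len(naive), naive)
print(len(parsed), parsed)
```

The naive split yields five fragments for a three-field record, which is the kind of error that only shows up once real data arrives.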


Page 13: Top 10 Challenges of Making Big Data Real and Tips to Overcome Them

Big Data Reference Architecture

[Figure: three-stage pipeline: 1. Collect, 2. Translate & Enrich, 3. Distribute. Other labels: Structured Data, Unstructured Data, DB, DataView, DB]
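The three stages named in this diagram (Collect, Translate & Enrich, Distribute) can be sketched as a chain of Python generators. This is only an illustration of the pattern, not SnapLogic's implementation: the function names, the sample records, and the use of JSON text to stand in for less structured input are all assumptions.

```python
import json


def collect(sources):
    """Stage 1: pull raw records from every source into one stream."""
    for name, records in sources.items():
        for record in records:
            yield name, record


def translate_and_enrich(stream):
    """Stage 2: normalise each record to a dict and tag where it came from."""
    for name, record in stream:
        if isinstance(record, str):  # text input: parsed as JSON in this sketch
            record = json.loads(record)
        yield {**record, "source": name}


def distribute(stream, targets):
    """Stage 3: hand every enriched record to each target."""
    for record in stream:
        for target in targets:
            target.append(record)


# Hypothetical inputs and outputs (plain lists stand in for DB and DataView).
sources = {
    "structured": [{"id": 1, "amount": 9.5}],
    "unstructured": ['{"id": 2, "note": "call back"}'],
}
database, data_view = [], []
distribute(translate_and_enrich(collect(sources)), [database, data_view])
print(database)
```

Because each stage is a generator, records flow through one at a time, so the same shape works for a one-off batch or a continuous stream.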

Page 14: Top 10 Challenges of Making Big Data Real and Tips to Overcome Them

The data and process is subject to…
• Pick your acronym: PCI, FIX, HIPAA, SOX
• Almost every industry has some form of data handling protocol that must be addressed
• These protocols are a combination of
- Data creation
- Data access
- Technology and workflow
- It is not just encryption and access
• Know your customer’s requirements!


Page 15: Top 10 Challenges of Making Big Data Real and Tips to Overcome Them

The requirements once defined are set in stone

• What your users know today is not what they will know tomorrow…

• Requirements evolve
• Why do you think they call them users?
- If you are successful they will want more
• Things change
- Economy
- Budgets
- Timeframe
- Management

• Feature creep is not a bad thing if budgets and timelines also creep


Page 16: Top 10 Challenges of Making Big Data Real and Tips to Overcome Them

The most critical data will be on the most difficult platform to access

• “A good deal of our case data is on Notes running on AS400”

• Discover where the data is first
• When can you access it?
- 24x7, after hours, on demand
• Throughput is key
- Either during business hours or afterwards
• What conditions?
- One-time download
- Scheduled
- Event based
- Stream
• What about security requirements?
- There is a performance impact of encryption during transmission
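Whether a transfer fits the available access window is a quick calculation. The Python sketch below is a rough model under stated assumptions: the data volume, link speed, window length, and the 20% throughput loss attributed to encryption are made-up numbers, and the function names are hypothetical.

```python
def transfer_hours(size_gb, throughput_mbps, encryption_overhead=0.0):
    """Hours needed to move size_gb at throughput_mbps (megabits per second).

    encryption_overhead is the fraction of throughput lost to encryption,
    e.g. 0.2 means the link delivers 80% of its nominal rate.
    """
    effective_mbps = throughput_mbps * (1.0 - encryption_overhead)
    megabits = size_gb * 8 * 1000  # 1 GB = 8000 megabits (decimal units)
    return megabits / effective_mbps / 3600


def fits_window(size_gb, throughput_mbps, window_hours, encryption_overhead=0.0):
    """True if the transfer completes within the access window."""
    hours = transfer_hours(size_gb, throughput_mbps, encryption_overhead)
    return hours <= window_hours


# Hypothetical: 2 TB over a 500 Mbps link, 10-hour after-hours window.
plain = transfer_hours(2000, 500)
encrypted = transfer_hours(2000, 500, encryption_overhead=0.2)
print(f"plain: {plain:.2f} h, encrypted: {encrypted:.2f} h")
print(fits_window(2000, 500, 10), fits_window(2000, 500, 10, 0.2))
```

With these assumed numbers the unencrypted transfer fits the window and the encrypted one does not, which is why the security requirement and the schedule need to be discussed together.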

Page 17: Top 10 Challenges of Making Big Data Real and Tips to Overcome Them

Containerization with Snaps

BUY
• SnapStore
• Certified and supported by SnapLogic

BUILD
• SDK + API
• Java, Python
• Customer, Partner or SnapLogic

Page 18: Top 10 Challenges of Making Big Data Real and Tips to Overcome Them

The eleventh rule

• Free software sometimes is worth the cost
- Or the money you save on licenses is multiplied by the cost of training and consultants
- In most cases labor is one of the biggest costs of a software project
• Open source is NOT the same as free!
- Subscription vs. perpetual licenses
- Does the customer need to expense or capitalize software licenses?
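The cost argument on this slide can be made concrete with a simple additive total-cost comparison in Python. Every figure below is a made-up placeholder, and the `total_cost` function and its cost categories are assumptions for illustration, not data or a model from the talk.

```python
def total_cost(license_fee, training, consulting, labor_hours, hourly_rate):
    """Total cost of a software option over the period being compared."""
    return license_fee + training + consulting + labor_hours * hourly_rate


# All figures are hypothetical placeholders.
free_option = total_cost(license_fee=0, training=20_000, consulting=60_000,
                         labor_hours=1_500, hourly_rate=100)
licensed_option = total_cost(license_fee=50_000, training=5_000,
                             consulting=10_000, labor_hours=800,
                             hourly_rate=100)
print(free_option, licensed_option)
```

With these assumed inputs the zero-license option ends up costing more overall, because labor dominates both totals. Real numbers could of course point the other way.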
