Data Science Innovation: Systems of insight & Machine Engineering
@Soody
linkedin.com/in/sureshsood
http://www.slideshare.net/ssood/systemof-insight
Systemof insight

Apr 14, 2017

Suresh Sood
Transcript
Page 1: Systemof insight

Page 2
Page 3

The Future of the Professions (Susskind & Susskind 2015)

• Tax and audit work replaced by computer-assisted techniques

• Technology automating and innovating

• Accounting work reconfiguring

• New business models

• Move from bespoke to “off the peg”

• Mastery of data with new tools and techniques - Big Data

• Diversification

• Shift to proactivity from reactivity

• Professionals replaced by less expert people and high-performing systems

• Post-professional society: expertise available online

Page 4

The Future of the Professions: How Technology Will Transform the Work of Human Experts, Richard Susskind and Daniel Susskind (2015)

Page 5

‘The Predictive Accountant’ Persona

1. CA SMP Practice and Member

2. Data savvy

3. Focus shifts from being reactive to proactive and predictive

4. Leverages accounting data and predictive analytics software to find patterns and insights in data

5. Uses the tools and dashboards to predict client scenarios ahead of time: maximising opportunity, limiting risks and proactively advising.

6. CA ANZ SMPs benefit from analytics by adding value when connecting SME client challenges and opportunities to identified customer patterns. Sharing these insights delivers more value in accounting conversations and helps tackle the real business problems facing clients.

Page 6

Key Drivers Informing Our Thinking

1. New ways of looking at traditional accounting & client data

2. Innovation from new data sources built on democratisation of data

3. Democratisation of data science - Predictive capability of big data (correlations & data science)

4. Systems of Insight achieve machine engineering (insight to process or application)

5. Embedded analytics, messaging and mobile impact the client experience

Page 7

Open Source R

• A great NZ invention!

• Powerful statistical programming language

• Most widely used data analysis software

• 2M+ data scientists, statisticians and analysts

• Creates unique data visualizations: New York Times, Twitter and Flowing Data

• Thriving open-source community

• Leading edge of analytics research

• Fill talent gap with new grads

• Highest paid IT skill (Dice.com, Jan 2014)

• Most-used data science language after SQL (O’Reilly, Jan 2014)

• Used by 70% of data miners (Rexer, Sep 2013)

• #15 of all programming languages (RedMonk, Jan 2014)

• Growing faster than any other language (KDnuggets, Aug 2013)

Page 8

‘The Predictive Accountant’ Portal

The Predictive Accountant

• Data Sources

• Predictive Analytics: Excel-style dashboard

• Connected Practice: Digital Marketing / eNewsletters / Integrated business tools software

• Apps Marketplace: Accounting Analytic Apps

• Education: Analytic Training

Page 9

Areas for Discussion

1.) Data Science Innovation

2.) Systems of Insight

3.) Machine Engineering

Page 10

2020 Global Data Forecast (Bytes)

Estimates for 2020 suggest four times more bytes of digital data than there are grains of sand on Earth

Source: Pg. 4, Building a Digital Analytics Organization: Create Value by Integrating Analytical Processes, Technology, and People into Business Operations by Judah Phillips, FT Press, 30 Jul 2013

Page 11

Data Science Innovation

Data science innovation is using data to do something an organization or individual has not done before. The innovation focuses on discovery, using new or nontraditional data sources to solve new problems.

Adapted from: Franks, B. (2012) Taming the Big Data Tidal Wave, p. 255, John Wiley & Sons

Page 12

Variety of Data Types & Big Data Challenge

1. Astronomical
2. Documents
3. Earthquake
4. Email
5. Environmental sensors
6. Fingerprints
7. Health (personal) Images
8. Graph data (social network)
9. Location
10. Marine
11. Particle accelerator
12. Satellite
13. Scanned survey data
14. Sound
15. Text
16. Transactions
17. Video

Big Data consists of extensive datasets primarily in the characteristics of volume, variety, velocity, and/or variability that require a scalable architecture for efficient storage, manipulation, and analysis.

Computational portability is the movement of the computation to the location of the data.
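The idea of computational portability can be illustrated with a toy, single-process Python sketch. The node names, partition contents and the helpers `local_summary`, `run_near_data` and `combine` are all hypothetical; a real cluster framework would ship the function over the network to each node rather than call it in a loop. The point is that each partition is summarised where it lives and only a small (sum, count) pair travels back to the coordinator.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical cluster: each node keeps its own partition of the data locally.
partitions: Dict[str, List[float]] = {
    "node-1": [12.0, 7.5, 3.25],
    "node-2": [8.0, 1.5],
    "node-3": [4.75, 10.0, 2.0, 6.0],
}

Summary = Tuple[float, int]


def local_summary(records: List[float]) -> Summary:
    """Runs next to the data and returns only a tiny (sum, count) summary."""
    return sum(records), len(records)


def run_near_data(
    parts: Dict[str, List[float]], task: Callable[[List[float]], Summary]
) -> Dict[str, Summary]:
    # Move the computation to each partition instead of copying the records out.
    return {node: task(records) for node, records in parts.items()}


def combine(summaries: Dict[str, Summary]) -> float:
    # Only the small per-node summaries travel back to the coordinator.
    total = sum(s for s, _ in summaries.values())
    count = sum(c for _, c in summaries.values())
    return total / count


summaries = run_near_data(partitions, local_summary)
print(f"global mean = {combine(summaries):.3f}")
```

The same pattern of local partial results plus a cheap global combine is what lets a global statistic be computed without moving the raw records off the nodes.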

Page 13

Hadoop Configurations (Single and Multi-Rack)

Adapted from: http://stackiq.com/

Cluster manager, e.g. Apache Ambari, Apache Mesos or Rocks

3 TB drives, 18 data nodes: this configuration represents 648 TB of raw storage. With the standard HDFS replication factor of 3, that gives 216 TB of usable storage.

Name/secondary/data nodes – 6 cores, 96 GB
Management node – 4 cores, 16 GB
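The storage figures on this slide can be checked with a short calculation. This is a minimal sketch that assumes all raw disk is given to HDFS and ignores space reserved for the operating system, temporary files and other non-DFS data. The drives-per-node figure is derived from the slide's numbers rather than stated on it.

```python
# Figures from the slide.
DRIVE_TB = 3
DATA_NODES = 18
RAW_TB = 648
HDFS_REPLICATION = 3  # HDFS default replication factor

# Implied number of 3 TB drives per data node (derived, not stated on the slide).
drives_per_node = RAW_TB // (DATA_NODES * DRIVE_TB)

# Each block is stored HDFS_REPLICATION times, so usable capacity is raw / replication.
usable_tb = RAW_TB / HDFS_REPLICATION

print(f"{drives_per_node} drives per node, {usable_tb:.0f} TB usable")
```

This gives 12 drives per data node and 216 TB of usable capacity, matching the slide.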

Page 14

Data Science Workflows & Business Data Discovery

Page 15
Page 16

http://tacocopter.com/

New Sources of Information (Big Data): Social Media + Internet of Things Innovations

Figure: 51 Gridded Data Sources

Page 17

8. Oil reserves shipment monitoring

Ras Tanura Najmah compound, Saudi Arabia

Source: http://www.skyboximaging.com/blog/monitoring-oil-reserves-from-space

Page 18

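The following BigQuery query selects Global Knowledge Graph (GKG) records whose themes mention the Islamic State (under any of its names) together with terrorism or suicide-attack themes (note that the wildcard on "TAX_WEAPONS_SUICIDE_" catches suicide vests, suicide bombers, suicide bombings, suicide jackets, and so on):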

SELECT DATE, DocumentIdentifier, SourceCommonName, V2Themes, V2Locations, V2Tone, SharingImage, TranslationInfo
FROM [gdelt-bq:gdeltv2.gkg]
WHERE (V2Themes LIKE '%TAX_TERROR_GROUP_ISLAMIC_STATE%'
    OR V2Themes LIKE '%TAX_TERROR_GROUP_ISIL%'
    OR V2Themes LIKE '%TAX_TERROR_GROUP_ISIS%'
    OR V2Themes LIKE '%TAX_TERROR_GROUP_DAASH%')
  AND (V2Themes LIKE '%TERROR%TERROR%'
    OR V2Themes LIKE '%SUICIDE_ATTACK%'
    OR V2Themes LIKE '%TAX_WEAPONS_SUICIDE_%')
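To make the WHERE clause concrete, here is a small Python sketch that evaluates the same pattern logic against a few strings. It assumes standard SQL `LIKE` semantics, where `%` matches any run of characters and `_` matches exactly one character. Whether BigQuery's legacy dialect treats `_` the same way is an assumption, not something the slide shows. The sample `V2Themes` strings, their assumed `THEME,offset;` layout, and the helper names `like` and `matches_query` are hypothetical.

```python
import re

def like(value: str, pattern: str) -> bool:
    """Standard SQL LIKE: '%' matches any run of characters, '_' exactly one."""
    regex = "".join(
        ".*" if ch == "%" else "." if ch == "_" else re.escape(ch)
        for ch in pattern
    )
    return re.fullmatch(regex, value, flags=re.DOTALL) is not None

ISIS_PATTERNS = [
    "%TAX_TERROR_GROUP_ISLAMIC_STATE%",
    "%TAX_TERROR_GROUP_ISIL%",
    "%TAX_TERROR_GROUP_ISIS%",
    "%TAX_TERROR_GROUP_DAASH%",
]
ATTACK_PATTERNS = ["%TERROR%TERROR%", "%SUICIDE_ATTACK%", "%TAX_WEAPONS_SUICIDE_%"]

def matches_query(v2themes: str) -> bool:
    # Mirrors the WHERE clause: (any ISIS pattern) AND (any attack pattern).
    return any(like(v2themes, p) for p in ISIS_PATTERNS) and any(
        like(v2themes, p) for p in ATTACK_PATTERNS
    )

# Hypothetical V2Themes values in an assumed "THEME,offset;" layout.
rows = [
    "TAX_TERROR_GROUP_ISIS,120;TAX_WEAPONS_SUICIDE_VEST,340;",
    "TAX_TERROR_GROUP_ISIL,50;ECON_TAXATION,90;",
    "TERROR,10;TAX_TERROR_GROUP_DAASH,44;TERROR,200;",
]
print([matches_query(r) for r in rows])
```

Under these semantics the underscores in `TAX_WEAPONS_SUICIDE_` are themselves single-character wildcards. The trailing `_%` therefore requires at least one character after `SUICIDE`, which is what lets the pattern catch themes such as `..._VEST`. The `%TERROR%TERROR%` pattern only matches when "TERROR" occurs at least twice.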

The GDELT Project pushes the boundaries of “big data,” weighing in at over a quarter-billion rows with 59 fields for each record, spanning the geography of the entire planet, and covering a time horizon of more than 35 years. The GDELT Project is the largest open-access database on human society in existence. Its archives contain nearly 400M latitude/longitude geographic coordinates spanning over 12,900 days, making it one of the largest open-access spatio-temporal datasets as well.

GDELT + BigQuery = Query The Planet

Page 19

Internet of Things “trillion sensors”

Source: www.tsensorssummit.org

Page 20

Black Box Insurance

• Big data transforms actuarial insurance from using probability methods to estimate premiums into dynamic risk management, using real data to generate individually tailored premiums

• Estimate: a 20 km work or home journey, with a data point acquired every minute, captures 12 points per km. Assuming 1,000 km of driving per month, this generates 12,000 points per month, or 144,000 points per car per annum. Hence, 1,000 cars produce 144 million points per annum.

• Telematics (black box) monitoring helps assess driving behavior and price the policy on true driver-centric premiums by capturing:
– Number of journeys
– Distances travelled
– Types of roads
– Speed
– Time of travel
– Acceleration and braking
– Any accidents
– Location?

• Benefits low mileage, smooth and safe drivers

• Privacy vs. saving money on insurance (Canada; http://bit.ly/Black_box)
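The data-volume estimate in the black box bullets can be reproduced directly. This sketch restates the slide's assumptions (12 points per km, 1,000 km per month, 1,000 cars) as constants. The per-journey figure applies the same 12 points per km to the 20 km journey and is derived, not quoted.

```python
# Assumptions restated from the slide's estimate.
POINTS_PER_KM = 12
KM_PER_MONTH = 1_000
MONTHS_PER_YEAR = 12
FLEET_SIZE = 1_000
JOURNEY_KM = 20

points_per_month = POINTS_PER_KM * KM_PER_MONTH        # per car
points_per_year = points_per_month * MONTHS_PER_YEAR   # per car
fleet_points_per_year = points_per_year * FLEET_SIZE   # whole fleet

# Derived: one 20 km journey at the same sampling density.
points_per_journey = POINTS_PER_KM * JOURNEY_KM

print(points_per_month, points_per_year, fleet_points_per_year, points_per_journey)
```

Because every term is a simple product, scaling the fleet or mileage changes the annual total linearly. For example, 10,000 cars would produce 1.44 billion points per year.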

Page 21

The ANZ Heavy Traffic Index comprises flows of vehicles weighing more than 3.5 tonnes (primarily trucks) on 11 selected roads around NZ. It is contemporaneous with GDP growth.

The ANZ Light Traffic Index is made up of light or total traffic flows (primarily cars and vans) on 10 selected roads around the country. It gives a six-month lead on GDP growth in normal circumstances (but cannot predict sudden adverse events such as the Global Financial Crisis).

http://www.anz.co.nz/about-us/economic-markets-research/truckometer/

ANZ TRUCKOMETER
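One common way to test a claim such as "gives a six-month lead on GDP growth" is to correlate the indicator with the target shifted by increasing lags and see which lag correlates best. The sketch below is a generic cross-correlation check, not ANZ's published methodology. The monthly series are synthetic, built so that GDP growth simply echoes light traffic six periods later, and `pearson` and `lead_correlations` are hypothetical helpers.

```python
import random
from typing import List, Sequence

def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5

def lead_correlations(
    indicator: Sequence[float], target: Sequence[float], max_lag: int
) -> List[float]:
    """Correlation of indicator[t] with target[t + lag] for lag = 0..max_lag."""
    n = len(indicator)
    return [pearson(indicator[: n - lag], target[lag:]) for lag in range(max_lag + 1)]

# Synthetic monthly data: GDP growth echoes light traffic six months later.
rng = random.Random(42)
light_traffic = [rng.gauss(0, 1) for _ in range(120)]
gdp_growth = [rng.gauss(0, 1) for _ in range(6)] + light_traffic[:-6]

corrs = lead_correlations(light_traffic, gdp_growth, max_lag=12)
best_lag = max(range(len(corrs)), key=corrs.__getitem__)
print(best_lag, round(corrs[best_lag], 3))
```

With real traffic and GDP data the peak would be much noisier than in this constructed example. As the slide notes, a lead that holds in normal times may still fail to anticipate sudden shocks.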

Page 22

What is Machine Learning?

Machine learning is a scientific discipline that deals with the construction and study of algorithms that can learn from data. Such algorithms operate by building a model based on inputs and using that to make predictions or decisions, rather than following only explicitly programmed instructions.

http://en.wikipedia.org/wiki/Machine_learning

Page 23

Traditional Computing Paradigm: Data + Program → Computer → Output

Machine Learning: Data + Output → Computer → Program
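The contrast between the traditional computing paradigm and machine learning can be shown with a deliberately tiny example. In the traditional case a person writes the rule. In the machine learning case the computer is given inputs together with the desired outputs and infers the rule. Here the learned "program" is just the slope and intercept of a straight line, fitted by ordinary least squares. The function names and data are hypothetical.

```python
from typing import List, Tuple

# Traditional paradigm: a human writes the program; the computer applies it to data.
def handwritten_program(x: float) -> float:
    return 2.0 * x + 1.0

# Machine learning paradigm: given data AND outputs, the computer derives the program.
def learn_line(xs: List[float], ys: List[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum(
        (x - mx) ** 2 for x in xs
    )
    intercept = my - slope * mx
    return slope, intercept

data = [0.0, 1.0, 2.0, 3.0, 4.0]
outputs = [handwritten_program(x) for x in data]   # Data + Program -> Output
slope, intercept = learn_line(data, outputs)       # Data + Output -> Program

def learned_program(x: float) -> float:
    # The "program" produced by learning, applied to new input.
    return slope * x + intercept

print(slope, intercept, learned_program(10.0))
```

Because the outputs here were generated by an exact straight line, the fit recovers the original rule. With noisy real-world data the learned parameters would only approximate it.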

Page 24

Netflix – A Picture of a Data-Driven Company

• ~75 million users

• 8.5 million events per second

• Zero loss?

• 550 billion events per day

• Hundreds of event types

• 1.3 PB/day

• 21GB /sec (peak)

• 37% of peak US internet bandwidth

• Operates on Amazon Web Services

Source: http://techblog.netflix.com/2016/02/evolution-of-netflix-data-pipeline.html
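Comparing the daily totals with the quoted peaks is a quick sanity check on the Netflix figures. This sketch uses only the numbers on the slide. It assumes decimal units (1 PB = 10^15 bytes, 1 GB = 10^9 bytes) and spreads each daily total evenly over 86,400 seconds.

```python
SECONDS_PER_DAY = 24 * 60 * 60

events_per_day = 550e9      # slide: 550 billion events per day
bytes_per_day = 1.3e15      # slide: 1.3 PB/day (decimal units assumed)
peak_events_per_sec = 8.5e6 # slide: 8.5 million events per second
peak_bytes_per_sec = 21e9   # slide: 21 GB/sec (peak)

# Average rates if the daily totals were spread evenly across the day.
avg_events_per_sec = events_per_day / SECONDS_PER_DAY
avg_bytes_per_sec = bytes_per_day / SECONDS_PER_DAY

print(f"{avg_events_per_sec / 1e6:.2f}M events/s average vs {peak_events_per_sec / 1e6:.1f}M peak")
print(f"{avg_bytes_per_sec / 1e9:.1f} GB/s average vs {peak_bytes_per_sec / 1e9:.0f} GB/s peak")
```

The averages come out at roughly 6.37 million events/s and 15.0 GB/s. Both are below the stated peaks, so the daily and peak figures are mutually consistent.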

Page 25

Square Kilometer Array (SKA)

• Data collected in a single day would take nearly two million years to play back on an MP3 player.

• The central computer has the processing power of about one hundred million PCs.

• The SKA will use enough optical fiber linking up all the radio telescopes to wrap twice around the Earth.

• When fully operational, the SKA dishes will produce 10 times the global internet traffic as of 2013.

• Aperture arrays in the SKA could produce more than 100 times the global internet traffic as of 2013.

• The SKA will generate enough raw data to fill 15 million 64 GB MP3 players every day.

• The SKA supercomputer will perform 10^18 operations per second (equivalent to the number of stars in three million Milky Way galaxies) in order to process all the data the SKA will produce.

• It is so sensitive that it will be able to detect an airport radar on a planet 50 light years away.

• Thousands of antennas provide a collecting area of about one square kilometer (that's 1,000,000 square meters).

• Previous mapping of the Centaurus A galaxy took a team 12,000 hours of observations, or several years. SKA ETA: 5 minutes!

• In its first six hours of operation, the SKA will generate more information than all previous radio telescopes in the world combined.

• The Square Kilometer Array will link 250,000 radio telescopes together, creating the most sensitive telescope.

To the scientists involved, however, the SKA is no testbed; it’s a transformative instrument which, according to Luijten, will lead to “fundamental discoveries of how life and planets and matter all came into existence. As a scientist, this is a once in a lifetime opportunity.”

Sources: http://bit.ly/amazin-facts & http://bit.ly/astro-ska

Centaurus A
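Two of the SKA comparisons can be turned back into plain numbers. This is a minimal sketch that assumes decimal gigabytes (64 GB = 64 × 10^9 bytes) and takes the "three million Milky Way galaxies" comparison at face value.

```python
PLAYERS_PER_DAY = 15_000_000
PLAYER_CAPACITY_BYTES = 64e9  # 64 GB, decimal gigabytes assumed

# Raw data per day implied by "15 million 64 GB MP3 players every day".
raw_bytes_per_day = PLAYERS_PER_DAY * PLAYER_CAPACITY_BYTES

OPS_PER_SEC = 1e18   # 10^18 operations per second
GALAXIES = 3_000_000

# Stars per galaxy implied by "the number of stars in three million Milky Way galaxies".
stars_per_galaxy = OPS_PER_SEC / GALAXIES

print(f"{raw_bytes_per_day / 1e15:.0f} PB/day, {raw_bytes_per_day / 1e18:.2f} EB/day")
print(f"implied stars per galaxy: {stars_per_galaxy:.2e}")
```

This works out to 9.6 × 10^17 bytes (about 960 PB) of raw data per day and an implied ~3.3 × 10^11 stars per galaxy. That star count is in line with the few hundred billion usually quoted for the Milky Way.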

Page 26

• Next generation radio telescope

• 100x more sensitive & 1,000,000x faster

• 5 square km of dish over 3,000 km

• Two sites: Western Australia & Karoo Desert, RSA

• World's most ambitious IT project

• First real exascale-ready application

• Largest global big-data challenge

• SKA SDP exascale systems:
– 100,000 nodes
– 800 cabinets
– consume 20 MW

• Expected failure rates of 300 nodes per week

Square Kilometre Array http://www.ska.gov.au/
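Normalising the SDP figures per node makes their engineering implications clearer. This sketch assumes the 20 MW is shared evenly across all 100,000 nodes, although in practice part of it would go to cooling, storage and networking. It also assumes failures are spread evenly across the week.

```python
NODES = 100_000
CABINETS = 800
POWER_W = 20e6           # 20 MW
FAILURES_PER_WEEK = 300

nodes_per_cabinet = NODES / CABINETS
watts_per_node = POWER_W / NODES                 # assumes an even power split
weekly_failure_fraction = FAILURES_PER_WEEK / NODES
failures_per_day = FAILURES_PER_WEEK / 7         # assumes failures spread evenly

print(f"{nodes_per_cabinet:.0f} nodes/cabinet, {watts_per_node:.0f} W/node")
print(f"{weekly_failure_fraction:.1%} of nodes fail per week, ~{failures_per_day:.0f} per day")
```

That is 125 nodes per cabinet, about 200 W per node, and roughly 0.3% of the machine failing each week, or about 43 nodes a day. At this scale the software has to treat node failure as routine rather than exceptional.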

Page 27

Caution!

“Children never put off till tomorrow what will keep them from going to bed tonight”

ADVERTISING AGE

Page 28

8 Steps Towards Building the Data-Centric Business

1. Put digital service (Vargo & Lusch) at the centre of the business, blurring the distinction with physical products via sensors and apps

2. Identify data and monetisation opportunities using business model canvas

3. Select unique sources of data to help drive innovation

4. Use data to drive interactions and customer experiences

5. Understand the data lifecycle from creation to storage

6. Extract value from data (economic or social)

7. Review patterns of big data businesses

8. Get on top of big data technology trends and analytics software