Beating Information Mess (without SOA). Adam Cooper, FSD STG, Jan 25th 2010. England and Wales 2.0
Page 1: Beating Information Mess (without SOA)

Beating Information Mess (without SOA)

Adam Cooper, FSD STG, Jan 25th 2010

England and Wales 2.0

Page 2: Beating Information Mess (without SOA)


Introduction

Examples and analysis from outside education

Based on published accounts

NOT my/our original work

Page 3: Beating Information Mess (without SOA)

Without SOA?

A heresy?

SOA is “a given”, but:
  It may not be optimal for all situations
  It may not be suited to dealing with problems quickly enough
  Modelling behaviour is harder than modelling data

SOA stakes are high: need to “get it right” with service definition

SOA can be a struggle with some legacy architectures

… really it's not “without” but “alongside” SOA

Page 4: Beating Information Mess (without SOA)

Choosing the right tool for the job

Page 5: Beating Information Mess (without SOA)

So what are we talking about?

Page 6: Beating Information Mess (without SOA)

Semantic Web

Page 7: Beating Information Mess (without SOA)

Semantic Web

Machine-readable resources on the network
  Things and concepts identified by HTTP URI
  And data about them accessed via that URI
  And linked together

Blends into REST
  To maintain the data

We are really talking about “Web Architecture”
  Not necessarily the way techies/architects think.
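
As a minimal sketch of the idea on this slide (things identified by HTTP URIs, data about them, and links between them), the following Python holds subject/property/value statements as plain tuples. All example.org URIs and property names are hypothetical, chosen only for illustration; a real deployment would serve this data over HTTP rather than hold it in a list.

```python
# Hypothetical URIs: none of these identifiers come from the presentation.
BOOK = "http://example.org/book/1"
AUTHOR = "http://example.org/person/7"
HAS_AUTHOR = "http://example.org/terms/author"
HAS_NAME = "http://example.org/terms/name"
HAS_TITLE = "http://example.org/terms/title"

# Each statement is (subject URI, property URI, literal value or another URI).
statements = [
    (BOOK, HAS_TITLE, "An Example Book"),
    (BOOK, HAS_AUTHOR, AUTHOR),          # a link from one resource to another
    (AUTHOR, HAS_NAME, "A. N. Author"),
]

def describe(uri):
    """Return the property/value pairs stated about a URI.

    Simplification: assumes at most one value per property.
    """
    return {prop: value for subj, prop, value in statements if subj == uri}

def follow(uri, prop):
    """Follow a link: describe the resource that `prop` points to."""
    return describe(describe(uri)[prop])

author_data = follow(BOOK, HAS_AUTHOR)
print(author_data[HAS_NAME])
```

Because the author is identified by its own URI, anything else that refers to the same person can link to it without copying the data.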

Page 8: Beating Information Mess (without SOA)

Semantic Web

Concept-oriented descriptions

Query by meaning rather than fields

Extends into machine reasoning, AI territory (out of scope for this presentation and NOT a pre-requisite for value)

Page 9: Beating Information Mess (without SOA)

Broader Term: Semantic Technologies

Includes Natural Language Processing techniques such as Latent Semantic Analysis

These might produce Semantic Web resources

NOT in scope for this presentation.

Page 10: Beating Information Mess (without SOA)

Story 1: O’Reilly Media

http://oreilly.com/

Page 11: Beating Information Mess (without SOA)
Page 12: Beating Information Mess (without SOA)

“Any time someone had a new idea or a new product to launch that didn’t quite fit into existing systems, we found some way to shoehorn it in, with a quick Perl script or some clever custom SQL. As we did this, more and more of our work became preventing our systems from collapsing under the weight of those one-off ETLs and scripts. The cost of simply keeping track of which scripts were using what bit of transformed data and where that data came from had became so high as to become unsustainable. We'd accrued so much design debt that only the most radical of approaches could save us from being crushed by the weight of our inherited code.”

Quotes from Carothers & Greer (O’Reilly staff) taken from Talis Nodalities March/April 2009

Page 13: Beating Information Mess (without SOA)

We are not alone. (In good company?)

Page 14: Beating Information Mess (without SOA)

The O’Reilly Approach

Two Strands:
  Managing the products, the published books
  Managing metadata and its meaning

Driven by the problem: Semantic Web was not the driver but the solution

“Today we have a Linked Data, Semantic, RESTful, URI-based … solution mostly by accident and through ruthless pragmatism. Instead of embracing the ideas of the Semantic Web at the outset, we arrived at the Semantic Web because it was the only solution.”

Page 15: Beating Information Mess (without SOA)

Strand 1: Managing the Product

Continuous stream of new books, errata, manifestations…

Problematic to answer questions like:
  Where is the latest definitive markup for book XXXX?

“…REST provides the tools to model and maintain ideas like ‘a canonical document’ and ‘synchronization over time.’”

Cf. Curriculum Management from initial design through validation to public offering

Page 16: Beating Information Mess (without SOA)

Curriculum Management: workflow, process, transformation, events

Services

Resources

HTTP GET, POST, PUT, DELETE (maybe Atom/AtomPub)

Structured Curriculum Data

URI
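
The resource layer in this diagram can be illustrated with a small in-memory sketch. It is not an HTTP server and not O’Reilly's implementation; the class name, the `/curriculum/...` paths and the record fields are assumptions made for illustration. The four methods mirror the semantics of HTTP GET, POST, PUT and DELETE applied to URI-identified resources.

```python
import itertools

class ResourceStore:
    """In-memory stand-in for a collection of URI-addressed resources."""

    def __init__(self, base):
        self.base = base                  # e.g. "/curriculum" (hypothetical)
        self.items = {}                   # URI path -> representation
        self._ids = itertools.count(1)

    def get(self, uri):
        """GET: return the current representation, or None if absent."""
        return self.items.get(uri)

    def post(self, data):
        """POST: create a new resource and return the URI assigned to it."""
        uri = f"{self.base}/{next(self._ids)}"
        self.items[uri] = dict(data)
        return uri

    def put(self, uri, data):
        """PUT: replace the whole representation at a known URI."""
        self.items[uri] = dict(data)

    def delete(self, uri):
        """DELETE: remove the resource; deleting twice is harmless."""
        self.items.pop(uri, None)

store = ResourceStore("/curriculum")
uri = store.post({"title": "Example Module", "status": "draft"})
store.put(uri, {"title": "Example Module", "status": "validated"})
print(uri, store.get(uri))
```

The URI stays the same as the module moves from draft to validated, which is one reading of “a canonical document” that is synchronised over time.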

Page 17: Beating Information Mess (without SOA)

Strand 2: Managing Metadata

Problem 1: Answering questions no-one imagined would be asked.

“Can our PDFs have the same branding and colors as the printed books?”

- Marketing Person

“Sure! How hard can it be?”

- Innocent Developer

Page 18: Beating Information Mess (without SOA)

Problem 2: Brittle, ad-hoc models

“Our definitive source of book and product information is the Product Database (67,000+ lines of Perl, C++, SQL, and a dozen other languages). The database and web application has its own home-rolled “XML Format,” as I’m sure many other companies have had. Based directly on the column names from the SQL database, our Book XML was a quick and very dirty way of getting our centralized relational data out into the world as XML.”

The down-side of Web 2.0

Page 19: Beating Information Mess (without SOA)

Problem 3: Differing interpretations, subjective views

“… Publication Date for a book… The value was computed independently by each of the ETL hydras... most people were confident that one of five dates was the right date, but disagreed on which of the five it was. Retail Availability Date, Actual In Stock Date, Estimated In Stock Date, etc each had its backers. What was really going on was that we discovered the subtle different needs that each business unit had.”
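
One way to read this quotation: rather than forcing a single “Publication Date”, each date keeps its own name and each business unit states which one it means. The sketch below is only an illustration under that assumption; the property names echo the quotation, while the date values, the unit names and their choices are invented.

```python
from datetime import date

# Invented values for one book; the keys echo the dates named in the quotation.
book_dates = {
    "retail_availability_date": date(2009, 3, 1),
    "actual_in_stock_date": date(2009, 2, 20),
    "estimated_in_stock_date": date(2009, 2, 15),
}

# Hypothetical: which date each business unit means by "publication date".
meaning_by_unit = {
    "marketing": "retail_availability_date",
    "warehouse": "actual_in_stock_date",
}

def publication_date(unit):
    """Resolve "publication date" in the context of one business unit."""
    return book_dates[meaning_by_unit[unit]]

print(publication_date("marketing"), publication_date("warehouse"))
```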

Page 20: Beating Information Mess (without SOA)
Page 21: Beating Information Mess (without SOA)

Solution

Map the mess of concepts onto an idiosyncratic ontology: the “Product Database Legacy Ontology”

Move obvious concepts to public standard definitions (like Dublin Core)

Wait for real application pull before researching, defining, cleaning, and moving concepts into a modern, public ontology
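
The three steps above can be sketched as a mapping from legacy column names to property URIs. The Dublin Core terms namespace (`http://purl.org/dc/terms/`) is real; the legacy namespace, the column names and the sample row are hypothetical and do not come from the O’Reilly account.

```python
DCTERMS = "http://purl.org/dc/terms/"
LEGACY = "http://example.org/legacy-ontology/"   # hypothetical namespace

# Step 2: only the obvious concepts get a public, standard definition.
PUBLIC_TERMS = {
    "book_title": DCTERMS + "title",
    "author_name": DCTERMS + "creator",
}

def property_uri(column):
    """Map a legacy column to a public term if one has been agreed,
    otherwise to the idiosyncratic legacy ontology (step 1)."""
    return PUBLIC_TERMS.get(column, LEGACY + column)

def to_statements(subject_uri, row):
    """Turn one database row into (subject, property, value) statements."""
    return [(subject_uri, property_uri(col), val) for col, val in row.items()]

# Invented sample row with made-up column names.
row = {"book_title": "An Example Book", "author_name": "A. N. Author",
       "est_in_stock_dt": "2009-02-15"}
for statement in to_statements("http://example.org/book/1", row):
    print(statement)
```

Under this sketch, step 3 amounts to adding an entry to `PUBLIC_TERMS` only when an application actually needs that concept cleaned up.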

Page 22: Beating Information Mess (without SOA)

Ontology Example

Ack: Chris Chamberlain, TEC NZ Govt

Page 23: Beating Information Mess (without SOA)

“Since Gavin’s first frenzied port of product metadata to an RDF model, we’ve been able to negotiate changing requirements, establish data validation and control rules, and bring on new applications with little time spent on data modeling. In other words, meeting our immediate need of a centralized, validated data store of high agility and performance has paid off several times over in deploying new software systems for the rapidly changing company.”

Page 24: Beating Information Mess (without SOA)

A CIO’s strategy for rethinking “messy BI” (Story 2)

© PwC, Spring 2009

Page 25: Beating Information Mess (without SOA)

“Messy BI”

Mainstream BI tools “weren’t intended to meet an increasingly common need: to reuse the data in combination with other internal and external information. Business users seek mashup capabilities because they derive insights from such explorations and analyses that internal, purpose-driven systems were never designed to achieve. PwC calls this ‘messy BI’”

Page 26: Beating Information Mess (without SOA)

Is this Relevant?

Do our business users seek “mashup capabilities”?

Is the underlying BI need there?
  Internal info sources
  External info sources

Is the external data there?

Page 27: Beating Information Mess (without SOA)

Stereotypes

Conventional BI
  Data warehousing
  Specified reports
  Dashboard metrics
  Normalisation
  Single source of truth
  DB column focus
  Taxonomies
  What was understood last year

Semantic BI
  Distributed
  What-if scenarios
  Ad-hoc measures
  Contextual data
  Information mediation
  Concept focus
  Ontologies bridge differences
  What is understood today

Page 28: Beating Information Mess (without SOA)

Source: PwC, 2009

Page 29: Beating Information Mess (without SOA)

A Business Ontology…

Accepts and recognises diversity (within limits)
  NOT hiding diversity
  NOT forcing false equivalence

=> progress in some areas can be made without full up-front agreement

Page 30: Beating Information Mess (without SOA)

What is the relationship between the different understandings of a person and their attributes in each silo?

How do you manage this?

How much pain for a single source of truth?

Directory Service

Finance

HR

Student Records

Library

VLE

Page 31: Beating Information Mess (without SOA)

Access via mediation layer

Directory Service

Finance

HR

Student Records

Library

VLE

Ontology

Mapping Recognises Context
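
A minimal sketch of the mediation idea, assuming each silo keeps its own field names and the ontology is reduced to a per-silo mapping onto shared concepts. The silo names follow the diagram; every field name, concept name and record is invented for illustration.

```python
# Per-silo mapping from local field names to shared concepts (all hypothetical).
MAPPINGS = {
    "HR": {"surname": "familyName", "dept_code": "department"},
    "Library": {"last_name": "familyName", "borrower_cat": "role"},
}

# Invented records, one per silo, describing the same person.
SILOS = {
    "HR": {"p1": {"surname": "Smith", "dept_code": "PHYS"}},
    "Library": {"p1": {"last_name": "Smith", "borrower_cat": "staff"}},
}

def mediated_view(person_id):
    """Collect what each silo says about a person, expressed in shared
    concepts, keeping the source silo as context instead of merging it away."""
    view = []
    for silo, records in SILOS.items():
        record = records.get(person_id, {})
        for field, value in record.items():
            concept = MAPPINGS[silo].get(field)
            if concept is not None:      # unmapped fields stay in the silo
                view.append((concept, value, silo))
    return view

for concept, value, source in mediated_view("p1"):
    print(f"{concept} = {value!r} (according to {source})")
```

The silos are left untouched; only the mapping has to be agreed, and it can grow one concept at a time.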

Page 32: Beating Information Mess (without SOA)

Does departmental student satisfaction correlate with publications per staff member?

Access via mediation layer

Institutional Repository

HR

Student Records

National Student Survey

Ontology

Mapping Recognises Context
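
To make the question on this slide concrete, the sketch below joins three sources on a shared department concept and computes a Pearson correlation. All figures are invented placeholders, not real survey, HR or repository data, and the data structures are assumptions; the point is only that a mediated view turns such an ad-hoc measure into a few lines of code.

```python
from math import sqrt

# Invented placeholder data, keyed by a shared "department" concept.
satisfaction = {"PHYS": 4.1, "HIST": 3.8, "CHEM": 4.4}    # survey score
staff_count = {"PHYS": 20, "HIST": 10, "CHEM": 25}         # from HR
publications = {"PHYS": 60, "HIST": 15, "CHEM": 100}       # from repository

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Only departments present in every source can be compared.
departments = sorted(set(satisfaction) & set(staff_count) & set(publications))
pubs_per_staff = [publications[d] / staff_count[d] for d in departments]
scores = [satisfaction[d] for d in departments]
r = pearson(scores, pubs_per_staff)
print(f"r = {r:.2f} over {len(departments)} departments")
```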

Page 33: Beating Information Mess (without SOA)

“Applying the Linked Data approach complements architectural approaches such as service-oriented architecture (SOA), inline operational analytics, and event driven architectures that allow various functions to interact as needed… within the specified bounds [of the ontology].”

“PwC recommends CIOs begin to rethink their information strategy with Linked Data in mind… not… a big-bang initiative”

Page 34: Beating Information Mess (without SOA)


References/Reading

Linking Data and Semantics at O’Reilly, Talis Nodalities Issue 6: http://www.talis.com/nodalities/pdf/nodalities_issue6.pdf

A CIO’s Strategy for “Messy BI”, PwC Tech Forecast: http://www.pwc.com/us/en/technology-forecast/spring2009/index.jhtml

See also: Financial Industries, Talis Nodalities Issue 3: http://www.talis.com/nodalities/pdf/nodalities_issue3.pdf

Feel free to contact me: [email protected]