Beating Information Mess (without SOA)


Adam Cooper, FSD STG, Jan 25th 2010

England and Wales 2.0


Introduction

Examples and analysis from outside education

Based on published accounts

NOT my/our original work

Without SOA?

A heresy?

SOA is “a given”, but:

It may not be optimal for all situations

It may not be suited to dealing with problems quickly enough

Modelling behaviour is harder than modelling data

SOA stakes are high: need to “get it right” with service definition

SOA can be a struggle with some legacy architectures

… really it’s not “without” but “alongside” SOA

Choosing the right tool for the job

So what are we talking about?

Semantic Web

Semantic Web

Machine-readable resources on the network

Things and concepts are identified by HTTP URIs

Data about them is accessed via those URIs

And linked together

Blends into REST, to maintain the data

We are really talking about “Web Architecture”

Not necessarily the way techies/architects think

Semantic Web

Concept-oriented descriptions

Query by meaning rather than fields

Extends into machine reasoning, AI territory (out of scope for this presentation and NOT a pre-requisite for value)
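The ideas on these two slides (things named by HTTP URIs, data about them linked together, query by concept rather than by field) can be illustrated with a very small in-memory sketch. It is not a real triple store and is not taken from either story; the `example.org` URIs and the sample facts are hypothetical, while the Dublin Core and RDF type URIs are the public ones.

```python
# Minimal sketch of linked data: every fact is a (subject, predicate, object)
# triple, and things and concepts are named by HTTP URIs.
# All example.org URIs and the sample facts are hypothetical.

EX = "http://example.org/"
DC_TITLE = "http://purl.org/dc/terms/title"
DC_CREATOR = "http://purl.org/dc/terms/creator"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

triples = {
    (EX + "book/1", RDF_TYPE, EX + "concept/Book"),
    (EX + "book/1", DC_TITLE, "An Example Book"),
    (EX + "book/1", DC_CREATOR, EX + "person/7"),
    (EX + "person/7", RDF_TYPE, EX + "concept/Person"),
    (EX + "person/7", EX + "concept/name", "A. N. Author"),
}


def match(s=None, p=None, o=None):
    """Return all triples matching the pattern; None acts as a wildcard."""
    return sorted(
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    )


def things_of_type(concept_uri):
    """Query by concept: which things are instances of this concept?"""
    return [s for s, _, _ in match(p=RDF_TYPE, o=concept_uri)]


def follow(subject_uri, predicate_uri):
    """Follow a link from one thing to whatever it points at."""
    return [o for _, _, o in match(s=subject_uri, p=predicate_uri)]
```

Asking for `things_of_type(EX + "concept/Book")` and then calling `follow` on the creator link walks from a book to its author without knowing which table or column either lived in.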

Broader Term: Semantic Technologies

Includes Natural Language Processing techniques such as Latent Semantic Analysis

These might produce Semantic Web resources

NOT in scope for this presentation.

Story 1: O’Reilly Media

http://oreilly.com/

“Any time someone had a new idea or a new product to launch that didn’t quite fit into existing systems, we found some way to shoehorn it in, with a quick Perl script or some clever custom SQL. As we did this, more and more of our work became preventing our systems from collapsing under the weight of those one-off ETLs and scripts. The cost of simply keeping track of which scripts were using what bit of transformed data and where that data came from had became so high as to become unsustainable. We'd accrued so much design debt that only the most radical of approaches could save us from being crushed by the weight of our inherited code.”

Quotes from Carothers & Greer (O’Reilly staff) taken from Talis Nodalities March/April 2009

We are not alone. (In good company?)

The O’Reilly Approach

Two Strands:

Managing the products, the published books

Managing metadata and its meaning

Driven by the problem: Semantic Web was not the driver but the solution

“Today we have a Linked Data, Semantic, RESTful, URI-based … solution mostly by accident and through ruthless pragmatism. Instead of embracing the ideas of the Semantic Web at the outset, we arrived at the Semantic Web because it was the only solution.”

Strand 1: Managing the Product

Continuous stream of new books, errata, manifestations…

Problematic to answer questions like:

Where is the latest definitive markup for book XXXX?

“…REST provides the tools to model and maintain ideas like ‘a canonical document’ and ‘synchronization over time.’”

Cf. Curriculum Management from initial design through validation to public offering

Curriculum Management: workflow, process, transformation, events

Services

Resources

HTTP GET, POST, PUT, DELETE (maybe Atom/AtomPub)

Structured Curriculum Data

URI
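The picture above (structured curriculum data as resources, each with a URI, handled through the uniform HTTP verbs) can be sketched without any network at all. This is only an illustration under stated assumptions: the `ResourceStore` class, the `/curriculum/modules` path and the `stage` field are hypothetical, and a real deployment would sit behind a web server (possibly with Atom/AtomPub).

```python
# Sketch: curriculum data as resources, each with one canonical URI,
# manipulated only through the four uniform verbs. The in-memory store
# stands in for a web server; paths and fields are hypothetical.
import itertools


class ResourceStore:
    def __init__(self):
        self._resources = {}
        self._ids = itertools.count(1)

    def post(self, collection, representation):
        """Create a new resource in a collection; the store picks the URI."""
        uri = f"{collection}/{next(self._ids)}"
        self._resources[uri] = dict(representation)
        return 201, uri

    def get(self, uri):
        """Return the current, canonical representation at this URI."""
        if uri not in self._resources:
            return 404, None
        return 200, dict(self._resources[uri])

    def put(self, uri, representation):
        """Replace the whole representation at a known URI (idempotent)."""
        created = uri not in self._resources
        self._resources[uri] = dict(representation)
        return (201 if created else 200), uri

    def delete(self, uri):
        """Remove the resource; a second delete reports 404."""
        if self._resources.pop(uri, None) is None:
            return 404, None
        return 204, None
```

A module could be created with `post("/curriculum/modules", {...})` at initial design and then updated with `put` as it moves through validation to public offering; because the URI never changes, "where is the latest definitive version?" always has the same answer.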

Strand 2: Managing Metadata

Problem 1: Answering questions no-one imagined would be asked.

“Can our PDFs have the same branding and colors as the printed books?”

- Marketing Person

“Sure! How hard can it be?”

- Innocent Developer

Problem 2: Brittle, ad-hoc models

“Our definitive source of book and product information is the Product Database (67,000+ lines of Perl, C++, SQL, and a dozen other languages). The database and web application has its own home-rolled “XML Format,” as I’m sure many other companies have had. Based directly on the column names from the SQL database, our Book XML was a quick and very dirty way of getting our centralized relational data out into the world as XML.”

The down-side of Web 2.0

Problem 3: Differing interpretations, subjective views

“… Publication Date for a book… The value was computed independently by each of the ETL hydras... most people were confident that one of five dates was the right date, but disagreed on which of the five it was. Retail Availability Date, Actual In Stock Date, Estimated In Stock Date, etc each had its backers. What was really going on was that we discovered the subtle different needs that each business unit had.”

Solution

Map the mess of concepts onto an idiosyncratic ontology: the “Product Database Legacy Ontology”

Move obvious concepts to public standard definitions (like Dublin Core)

Wait for real application pull before researching, defining, cleaning, and moving concepts into a modern, public ontology
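The three steps above can be sketched as follows. This is an illustration of the idea, not O’Reilly’s actual code: the legacy namespace, the column names and the sample row are hypothetical (the date names echo the quotation in Problem 3), and only the two Dublin Core term URIs are real public identifiers.

```python
# Sketch of "legacy ontology first, public terms only where obvious".
# The legacy namespace, column names and sample row are hypothetical.

LEGACY = "http://example.org/product-database-legacy-ontology/"
DCTERMS = "http://purl.org/dc/terms/"

# Step 2: only the obvious concepts are moved to a public standard.
PUBLIC_TERMS = {
    "title": DCTERMS + "title",
    "author": DCTERMS + "creator",
}


def predicate_for(column):
    """Step 1: every column gets a URI; unmapped ones stay in the legacy
    ontology until an application needs them cleaned up (step 3)."""
    return PUBLIC_TERMS.get(column, LEGACY + column)


def row_to_triples(subject_uri, row):
    """Turn one legacy database row into (subject, predicate, object) triples."""
    return [
        (subject_uri, predicate_for(column), value)
        for column, value in row.items()
        if value is not None
    ]


# The competing dates stay distinct concepts instead of being forced
# into a single "publication date".
row = {
    "title": "An Example Book",
    "author": "A. N. Author",
    "retail_availability_date": "2009-03-01",
    "actual_in_stock_date": "2009-03-09",
    "estimated_in_stock_date": None,
}
book_triples = row_to_triples("http://example.org/book/1", row)
```

Moving a concept to a public ontology later is then a one-line change to `PUBLIC_TERMS`, made only when a real application asks for it.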

Ontology Example

Ack: Chris Chamberlain, TEC NZ Govt

“Since Gavin’s first frenzied port of product metadata to an RDF model, we’ve been able to negotiate changing requirements, establish data validation and control rules, and bring on new applications with little time spent on data modeling. In other words, meeting our immediate need of a centralized, validated data store of high agility and performance has paid off several times over in deploying new software systems for the rapidly changing company.”

A CIO’s strategy for rethinking “messy BI” (Story 2)

© PwC, Spring 2009

“Messy BI”

Mainstream BI tools “weren’t intended to meet an increasingly common need: to reuse the data in combination with other internal and external information. Business users seek mashup capabilities because they derive insights from such explorations and analyses that internal, purpose-driven systems were never designed to achieve. PwC calls this ‘messy BI’”

Is this Relevant?

Do our business users seek “mashup capabilities”?

Is the underlying BI need there?

Internal info sources

External info sources

Is the external data there?

Stereotypes

Conventional BI                   Semantic BI
Data warehousing                  Distributed
Specified reports                 What-if scenarios
Dashboard metrics                 Ad-hoc measures
Normalisation                     Contextual data
Single source of truth            Information mediation
DB Column focus                   Concept focus
Taxonomies                        Ontologies bridge differences
What was understood last year     What is understood today

Source: PwC, 2009

A Business Ontology…

Accepts and recognises diversity (within limits)

NOT hiding diversity

NOT forcing false equivalence

=> progress in some areas can be made without full up-front agreement

What is the relationship between the different understandings of a person and their attributes in each silo?

How do you manage this?

How much pain for a single source of truth?

Directory Service

Finance

HR

Student Records

Library

VLE

Access via mediation layer

Directory Service

Finance

HR

Student Records

Library

VLE

Ontology

Mapping Recognises Context

Does departmental student satisfaction correlate with publications per staff member?

Access via mediation layer

Institutional Repository

HR

Student Records

National Student Survey

Ontology

Mapping Recognises Context
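A minimal sketch of what such a mediation layer might do for the question on this slide. Everything here is an assumption made for illustration: the field names, the department codes, the mapping table and all the values are invented, and a real mapping would be expressed in an ontology rather than a Python dictionary.

```python
# Sketch of a mediation layer: each silo keeps its own field names and its
# own way of naming a department; a mapping relates them to shared concepts.
# All records, codes and values below are invented for illustration.
from collections import Counter

hr = [
    {"staff_id": 1, "dept_code": "PHY"},
    {"staff_id": 2, "dept_code": "PHY"},
    {"staff_id": 3, "dept_code": "HIS"},
]
repository = [
    {"author_staff_id": 1},
    {"author_staff_id": 1},
    {"author_staff_id": 2},
    {"author_staff_id": 3},
]
survey = [
    {"subject_area": "Physics", "satisfaction": 4.0},
    {"subject_area": "History", "satisfaction": 3.5},
]

# The mapping recognises context: the same concept has a different local
# name in each silo, and neither silo is forced to change.
DEPT_CONCEPT = {
    ("hr", "PHY"): "dept/physics",
    ("hr", "HIS"): "dept/history",
    ("survey", "Physics"): "dept/physics",
    ("survey", "History"): "dept/history",
}


def department_view():
    """Combine the three silos into one row per shared department concept."""
    staff_dept = {r["staff_id"]: DEPT_CONCEPT[("hr", r["dept_code"])] for r in hr}
    staff_count = Counter(staff_dept.values())
    pub_count = Counter(staff_dept[p["author_staff_id"]] for p in repository)
    view = {}
    for row in survey:
        dept = DEPT_CONCEPT[("survey", row["subject_area"])]
        view[dept] = {
            "publications_per_staff": pub_count[dept] / staff_count[dept],
            "satisfaction": row["satisfaction"],
        }
    return view
```

The output of `department_view()` is the table an analyst would need before testing for a correlation; the silos themselves are left untouched.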

“Applying the Linked Data approach complements architectural approaches such as service-oriented architecture (SOA), inline operational analytics, and event driven architectures that allow various functions to interact as needed… within the specified bounds [of the ontology].”

“PwC recommends CIOs begin to rethink their information strategy with Linked Data in mind… not… a big-bang initiative”


References/Reading

Linking Data and Semantics at O’Reilly, Talis Nodalities Issue 6 http://www.talis.com/nodalities/pdf/nodalities_issue6.pdf

A CIO’s Strategy for “Messy BI”, PwC Tech Forecast http://www.pwc.com/us/en/technology-forecast/spring2009/index.jhtml

See also: Financial Industries, Talis Nodalities Issue 3 http://www.talis.com/nodalities/pdf/nodalities_issue3.pdf

Feel free to contact me: [email protected]
