Beating Information Mess (without SOA)
Adam Cooper, FSD STG, Jan 25th 2010
England and Wales 2.0
Jun 16, 2015
Introduction
Examples and analysis from outside education
Based on published accounts
NOT my/our original work
Without SOA?
A heresy?
SOA is “a given” but
  It may not be optimal for all situations
  It may not be suited to dealing with problems quickly enough
  Modelling behaviour is harder than modelling data
SOA stakes are high
  Need to “get it right” with service definition
SOA can be a struggle with some legacy architectures
… really it's not “without” but “alongside SOA”
Choosing the right tool for the job
So what are we talking about?
Semantic Web
Machine-readable resources on the network
  Things and concepts identified by HTTP URI
  And data about them accessed by it
  And linked together
Blends into REST
  To maintain the data
We are really talking about “Web Architecture”
  Not necessarily the way techies/architects think.
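The idea of things identified by HTTP URIs, with data about them linked together, can be sketched in a few lines of Python. This is only an illustration, not taken from the published accounts: the example.org URIs and the literal values are hypothetical, while the Dublin Core and FOAF property URIs are real public vocabulary terms.

```python
# A minimal sketch: statements as (subject, predicate, object) triples.
# The example.org URIs and literal values are hypothetical.
DCT_TITLE = "http://purl.org/dc/terms/title"
DCT_CREATOR = "http://purl.org/dc/terms/creator"
FOAF_NAME = "http://xmlns.com/foaf/0.1/name"

BOOK = "http://example.org/book/123"     # hypothetical identifier
AUTHOR = "http://example.org/person/42"  # hypothetical identifier

triples = {
    (BOOK, DCT_TITLE, "An Example Book"),
    (BOOK, DCT_CREATOR, AUTHOR),  # a link to another identified thing
    (AUTHOR, FOAF_NAME, "A. N. Author"),
}


def describe(subject, graph):
    """Return all (predicate, object) pairs stated about a subject."""
    return sorted((p, o) for s, p, o in graph if s == subject)


def follow(subject, predicate, graph):
    """Follow links from a subject and describe each linked thing."""
    return {
        o: describe(o, graph)
        for s, p, o in graph
        if s == subject and p == predicate
    }


linked = follow(BOOK, DCT_CREATOR, triples)
```

Because the author is identified by a URI rather than a text field, data about the author can live elsewhere and still be reached from the book.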
Semantic Web
Concept-oriented descriptions
Query by meaning rather than fields
Extends into machine reasoning, AI territory (out of scope for this presentation and NOT a pre-requisite for value)
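Querying by meaning rather than by fields can be illustrated with simple pattern matching over triples. The sketch below assumes hypothetical example.org resources; real deployments would normally use a query language such as SPARQL rather than this hand-rolled helper.

```python
# A minimal sketch of "query by meaning": the question is asked in terms of a
# shared concept (dcterms:creator), not a column in a particular table.
# All example.org URIs are hypothetical.
CREATOR = "http://purl.org/dc/terms/creator"
PERSON_42 = "http://example.org/person/42"

graph = [
    ("http://example.org/book/1", CREATOR, PERSON_42),
    ("http://example.org/video/7", CREATOR, PERSON_42),
    ("http://example.org/book/2", CREATOR, "http://example.org/person/43"),
]


def match(graph, s=None, p=None, o=None):
    """Return triples matching the pattern; None acts as a wildcard."""
    return [
        t for t in graph
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]


# "What did person 42 create?" regardless of product type or source system.
works = [s for s, _, _ in match(graph, p=CREATOR, o=PERSON_42)]
```

The same question covers books and videos because both use the same concept, which is the point of concept-oriented description.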
Broader Term: Semantic Technologies
Includes Natural Language Processing techniques such as Latent Semantic Analysis
These might produce Semantic Web resources
NOT in scope for this presentation.
Story 1: O’Reilly Media
http://oreilly.com/
“Any time someone had a new idea or a new product to launch that
didn’t quite fit into existing systems, we found some way to
shoehorn it in, with a quick Perl script or some clever custom
SQL. As we did this, more and more of our work became
preventing our systems from collapsing under the weight of
those one-off ETLs and scripts. The cost of simply keeping track
of which scripts were using what bit of transformed data and
where that data came from had became so high as to become
unsustainable. We'd accrued so much design debt that only the
most radical of approaches could save us from being crushed by
the weight of our inherited code.”
Quotes from Carothers & Greer (O’Reilly staff) taken from Talis Nodalities March/April 2009
We are not alone (in good company?)
The O’Reilly Approach
Two Strands:
  Managing the products, the published books
  Managing metadata and its meaning
Driven by the problem
  Semantic Web was not the driver but the solution
“Today we have a Linked Data, Semantic, RESTful, URI-based …
solution mostly by accident and through ruthless pragmatism.
Instead of embracing the ideas of the Semantic Web at the
outset, we arrived at the Semantic Web because it was the only
solution.”
Strand 1: Managing the Product
Continuous stream of new books, errata, manifestations…
Problematic to answer questions like:
Where is the latest definitive markup for book XXXX?
“…REST provides the tools to model and maintain ideas like ‘a canonical document’ and ‘synchronization over time.’”
Cf. Curriculum Management from initial design through validation to public offering
Curriculum Management: workflow, process, transformation, events
Diagram labels: Services; Resources; HTTP GET, POST, PUT, DELETE (maybe Atom/AtomPub); Structured Curriculum Data; URI
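The notion of a canonical document at a URI, maintained with the HTTP verbs, can be sketched as a small in-memory store. This is an assumption-laden illustration and not the O'Reilly implementation: the class name, the version counter and the example URI are all hypothetical.

```python
# A minimal sketch of REST-style resource handling, assuming an in-memory
# store. ResourceStore and the example URI are hypothetical names.
class ResourceStore:
    def __init__(self):
        self._resources = {}  # uri -> (version, body)

    def put(self, uri, body):
        """Create or replace the canonical representation; return new version."""
        version = self._resources.get(uri, (0, None))[0] + 1
        self._resources[uri] = (version, body)
        return version

    def get(self, uri):
        """Return (version, body), or None if the resource does not exist."""
        return self._resources.get(uri)

    def delete(self, uri):
        """Remove the resource; return True if it existed."""
        return self._resources.pop(uri, None) is not None


store = ResourceStore()
uri = "http://example.org/curriculum/module/101"  # hypothetical
store.put(uri, "draft for validation")
store.put(uri, "validated for public offering")
latest = store.get(uri)
```

Asking "where is the latest definitive version?" then reduces to a GET on one URI, with the version number giving a simple handle on synchronisation over time.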
Strand 2: Managing Metadata
Problem 1: Answering questions no-one imagined would be asked.
“Can our PDFs have the same branding and colors as the printed books?”
- Marketing Person
“Sure! How hard can it be?”
- Innocent Developer
Problem 2: Brittle, ad-hoc models
“Our definitive source of book and product information is the
Product Database (67,000+ lines of Perl, C++, SQL, and a
dozen other languages). The database and web application has
its own home-rolled “XML Format,” as I’m sure many other
companies have had. Based directly on the column names from
the SQL database, our Book XML was a quick and very dirty
way of getting our centralized relational data out into the world
as XML.”
The down-side of Web 2.0
Problem 3: Differing interpretations, subjective views
“… Publication Date for a book…The value was computed
independently by each of the ETL hydras... most people were
confident that one of five dates was the right date, but disagreed
on which of the five it was. Retail Availability Date, Actual In
Stock Date, Estimated In Stock Date, etc each had its backers.
What was really going on was that we discovered the subtle
different needs that each business unit had.”
Solution
Map the mess of concepts onto an idiosyncratic ontology: the “Product Database Legacy Ontology”
Move obvious concepts to public standard definitions (like Dublin Core)
Wait for real application pull before researching, defining, cleaning, and moving concepts into a modern, public ontology
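The three steps above can be sketched as a mapping from legacy column names to property URIs. Everything here is a hypothetical illustration rather than the actual O'Reilly ontology: the legacy namespace, the column names and the sample row are invented, and only the Dublin Core term URIs are real.

```python
# A minimal sketch: every legacy column gets a URI in an idiosyncratic
# "legacy" namespace, and only the obvious concepts are promoted to a
# public standard. Namespace, columns and values are hypothetical.
LEGACY = "http://example.org/pdb-legacy#"
DCT = "http://purl.org/dc/terms/"

# Obvious concepts moved to public definitions (Dublin Core terms).
PROMOTED = {
    "title": DCT + "title",
    "publisher": DCT + "publisher",
}


def property_uri(column):
    """Use the public term if one has been agreed, else the legacy term."""
    return PROMOTED.get(column, LEGACY + column)


def row_to_triples(subject_uri, row):
    """Turn one relational row into triples, skipping empty values."""
    return [
        (subject_uri, property_uri(col), value)
        for col, value in row.items()
        if value is not None
    ]


row = {
    "title": "An Example Book",
    "retail_availability_date": "2009-03-01",  # kept distinct, not merged
    "actual_in_stock_date": None,
}
triples = row_to_triples("http://example.org/book/123", row)
```

The competing date concepts stay as separate legacy properties until an application actually needs them cleaned up, which mirrors the "wait for real application pull" step.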
Ontology Example
Ack: Chris Chamberlain, TEC NZ Govt
“Since Gavin’s first frenzied port of product metadata to an RDF
model, we’ve been able to negotiate changing requirements,
establish data validation and control rules, and bring on new
applications with little time spent on data modeling. In other
words, meeting our immediate need of a centralized, validated
data store of high agility and performance has paid off several
times over in deploying new software systems for the rapidly
changing company.”
A CIO’s strategy for rethinking “messy BI” (Story 2)
© PwC, Spring 2009
“Messy BI”
Mainstream BI tools “weren’t intended to meet an increasingly
common need: to reuse the data in combination with other
internal and external information. Business users seek mashup
capabilities because they derive insights from such explorations
and analyses that internal, purpose-driven systems were never
designed to achieve. PwC calls this ‘messy BI’”
Is this Relevant?
Do our business users seek “mashup capabilities”?
Is the underlying BI need there?
  Internal info sources
  External info sources
Is the external data there?
Stereotypes
Conventional BI                  Semantic BI
Data warehousing                 Distributed
Specified reports                What-if scenarios
Dashboard metrics                Ad-hoc measures
Normalisation                    Contextual data
Single source of truth           Information mediation
DB column focus                  Concept focus
Taxonomies                       Ontologies bridge differences
What was understood last year    What is understood today
Source: PwC, 2009
A Business Ontology…
Accepts and recognises diversity (within limits)
  NOT hiding diversity
  NOT forcing false equivalence
=> progress in some areas can be made without full up-front agreement
What is the relationship between the different understandings of a person and their attributes in each silo?
How do you manage this?
How much pain for a single source of truth?
Diagram labels: Directory Service; Finance; HR; Student Records; Library; VLE
Access via mediation layer
Diagram labels: Directory Service; Finance; HR; Student Records; Library; VLE; Ontology; Mapping recognises context
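A mediation layer of this kind can be sketched as per-silo mappings from local field names onto shared ontology concepts. This is a toy illustration under stated assumptions: the ontology namespace, silo names, field names and records are all hypothetical.

```python
# A minimal sketch of access via a mediation layer. Each silo keeps its own
# field names; mappings relate them to shared concepts. All names and
# records are hypothetical.
ONT = "http://example.org/ontology#"

MAPPINGS = {
    "hr": {"surname": ONT + "familyName", "dept_code": ONT + "department"},
    "library": {"last_name": ONT + "familyName", "home_dept": ONT + "department"},
}

SILOS = {
    "hr": {"p1": {"surname": "Smith", "dept_code": "PHY"}},
    "library": {"p1": {"last_name": "Smith", "home_dept": "Physics"}},
}


def ask(concept, person_id):
    """Return {silo: value} for a concept, keeping the source context."""
    answers = {}
    for silo, mapping in MAPPINGS.items():
        record = SILOS[silo].get(person_id, {})
        for field, mapped_concept in mapping.items():
            if mapped_concept == concept and field in record:
                answers[silo] = record[field]
    return answers


departments = ask(ONT + "department", "p1")
```

The answer keeps both values together with their source rather than forcing one "true" department, so the differing understandings in each silo remain visible to whoever asks.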
Does departmental student satisfaction correlate with publications per staff member?
Access via mediation layer
Diagram labels: Institutional Repository; HR; Student Records; National Student Survey; Ontology; Mapping recognises context
“Applying the Linked Data approach complements architectural
approaches such as service-oriented architecture (SOA), inline
operational analytics, and event driven architectures that allow
various functions to interact as needed… within the specified
bounds [of the ontology].”
“PwC recommends CIOs begin to rethink their information strategy
with Linked Data in mind… not… a big-bang initiative”
References/Reading
Linking Data and Semantics at O’Reilly, Talis Nodalities Issue 6: http://www.talis.com/nodalities/pdf/nodalities_issue6.pdf
A CIO’s Strategy for “Messy BI”, PwC Technology Forecast, Spring 2009: http://www.pwc.com/us/en/technology-forecast/spring2009/index.jhtml
See also: Financial Industries, Talis Nodalities Issue 3: http://www.talis.com/nodalities/pdf/nodalities_issue3.pdf
Feel free to contact me: [email protected]