Quality Taxonomies

Post on 10-Feb-2016


Dr. Claude Vogel, Founder & CTO

KM World 2000

Ontology / Taxonomy

Root Ontology

Taxonomy Generation

Static Discovery

Dynamic Discovery

What is Quality?

“Best value for the money”

According to this definition, you are entitled to get high performance from a costly product; likewise, a low-cost product or service is expected to deliver less. For example, a loose demo delivery is both predictable and acceptable, since its quality is: low conformance / low cost.

What is Quality?

“Good Quality is Nominal Conformance”

Taxonomy Quality is defined as Taxonomy Conformance to:
– Valid requirements;
– Explicitly documented development standards; and,
– Implicit characteristics that are expected of all professionally developed taxonomies, such as the desire for good maintainability.

Standards

ISO 2788-1986
– International Organization for Standardization. Documentation -- Guidelines for the Establishment and Development of Monolingual Thesauri. 2nd ed. n.p.: ISO, 1986. (ISO 2788-1986(E)). (Available in the U.S. from American National Standards Institute)

ISO 5964-1985
– International Organization for Standardization. Documentation -- Guidelines for the Establishment and Development of Multilingual Thesauri. n.p.: ISO, 1985. (ISO 5964-1985(E)). (Available in the U.S. from American National Standards Institute)

ANSI/NISO Z39.19-1993
– National Information Standards Organization. Guidelines for the Construction, Format, and Management of Monolingual Thesauri. Bethesda, MD: NISO Press, 1994. 69p. (ANSI/NISO Z39.19-1993)

SEMIO Quality Plan v1 2000

ISO/IEC 13250 Topic Maps

RDF
– Please refer to RDF at http://www.w3.org/RDF and XML at http://www.w3.org/XML

Project Plan

1. Kick-off
2. Requirements Review
3. Lexicon Review
4. Taxonomy Review
5. Tags Review
6. Final Review

1. Kick-off

Objectives
– Purpose
– Scope
– Scale
– Users
– Conditions of receipt

Roles
– Supplier
– Customer
  • Admin
  • KE
  • Experts
  • Users

Planning

Training and Transfer

2. Requirements Review
– Sources
– Lexicon
– Ontology
– Install

Sources
– Dispersion (Multiplicity, Size, Homogeneity)
– Refresh
– Access

Features                  Internet, News, E-Mail   Reports, Patents   E-Trade, Logs
Informative content       -                        +                  +
Number of topics covered  +                        +                  -
Structured information    -                        +                  +
Size of records           -                        +                  -
Number of records         +                        -                  +

Typical Patterns
– Disparity
  • Adjust sources
  • Adjust crawl strategy
  • Isolate communities / taxonomies

Lexicon
– Vocabularies, etc.
– Substitutions: Acronyms, Synonyms, etc.
– Preferred Keywords: Brand Names, etc.
– Banned Keywords

Typical Patterns
– Lack of requirements
  • Use Librarian Resources
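
The lexicon requirements listed above (substitutions, preferred keywords, banned keywords) could be applied as a small normalization step. The sketch below is only an illustration: the dictionaries, their entries, and the function name `normalize_terms` are hypothetical and do not come from the presentation.

```python
# Hypothetical lexicon requirements; every entry is illustrative only.
SUBSTITUTIONS = {"irs": "internal revenue service", "co.": "company"}
PREFERRED = {"internal revenue service": "Internal Revenue Service"}
BANNED = {"click", "homepage"}


def normalize_terms(terms):
    """Apply substitutions, drop banned keywords, restore preferred forms."""
    result = []
    for term in terms:
        key = term.strip().lower()
        key = SUBSTITUTIONS.get(key, key)        # expand acronyms / synonyms
        if key in BANNED:                        # banned keywords are dropped
            continue
        result.append(PREFERRED.get(key, key))   # preferred spelling, if any
    return result


print(normalize_terms(["IRS", "Click", "loan"]))
```

Keeping the three requirement types in separate tables makes it easy to review each one with the customer's librarians, which is the remedy the slide suggests when requirements are missing.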

Ontology
– Thesaurus?
– Is the information domain analysis complete, consistent, and accurate?
– Is the partitioning of the problem complete?

Typical Patterns
– Directory versus Taxonomy
  • Isolate “directory” branches
– Thesaurus versus Taxonomy
  • Put an ontology on top of thesaurus
  • Check ASAP match of thesaurus generics with extracted lexicon
– Very high level design for top categories requirements
  • Plan to work bottom-up
  • See also Taxonomy (functions, combinations, etc.)

Install

Implementation / Integration:
– Are external and internal interfaces properly defined?
– Are all requirements traceable to the system level?
– Has prototyping been conducted for the user/customer?
– Is performance achievable within the constraints imposed by other system elements?
– Are requirements consistent with schedule, resources, and budget?

Typical Patterns
– Scale
– Security
– Missing Documents

3. Lexicon Review

Coverage
– Extracted words / Words
– (Extracted Index / Index)

Sources benchmarking
– Coverage
– Extraction quality
– Topic distribution

Structure
– Most Frequent Phrases
– Most Productive Generics

Substitutions

Exceptions

Typical Patterns
– Low level of frequency / quality for the most meaningful content
  • Increase size of value corpus
  • Filter and re-import lexicon
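
The coverage ratio "Extracted words / Words" from the lexicon review can be sketched as follows. The slides do not say how words are counted, so this example assumes distinct, case-insensitive words; the function name `lexical_coverage` is hypothetical.

```python
def lexical_coverage(extracted_words, corpus_words):
    """Share of distinct corpus words that the extraction step captured.

    Assumption: words are compared case-insensitively and counted once.
    """
    corpus = {w.lower() for w in corpus_words}
    if not corpus:
        return 0.0
    extracted = {w.lower() for w in extracted_words} & corpus
    return len(extracted) / len(corpus)


print(lexical_coverage(["loan", "tax"], ["Loan", "tax", "asset", "liability"]))
```

A low ratio on the most meaningful sources is the pattern the slide warns about; the same function could be run per source to compare them.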

4. Taxonomy Review

Taxonomy Operation
– Correctness
– Reliability
– Usability
– Integrity
– Efficiency

Taxonomy Revision
– Maintainability
– Flexibility
– Testability

Taxonomy Transition
– Portability
– Reusability
– Interoperability

Folk Taxonomies Design

Rank              Example
Unique Beginner   Tax
Life Form         Liability
Generic           Loan
Specific          Term loan
Varietal          Short-term loan

The Berlin and Kay model: Taxonomy = Nomenclature + Terminology

Correctness
– Accuracy
– Completeness
– Consistency

Accuracy
– Precision
– Recall
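
Precision and recall can be computed per category using the standard information-retrieval definitions. The slides do not specify how they are measured for a taxonomy, so this sketch assumes a set of documents assigned to a category and a reviewer-approved set of documents that belong there; both inputs and the function name are hypothetical.

```python
def precision_recall(assigned, relevant):
    """Return (precision, recall) for one category.

    assigned: documents the taxonomy placed in the category
    relevant: documents a reviewer says belong in the category
    """
    assigned, relevant = set(assigned), set(relevant)
    hits = len(assigned & relevant)
    precision = hits / len(assigned) if assigned else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


p, r = precision_recall({"d1", "d2", "d3", "d4"}, {"d1", "d2", "d5"})
print(p, r)
```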

Completeness

[Figure: Taxonomy, Maps, Lexicon, Collection]

Concentration Works Against Quality

[Figure: Lexicon, Document Collection, Maps, Taxonomy, Tagging]

– Tagging Coverage
– Ontology Coverage
– Hook Coverage
– Map Coverage
– Lexical Coverage
– Collection Coverage

Consistency: Typical Patterns
– Objectivization
– Hyperonymy
– Speciation
– Necessity

Objectivization

Employment
– Firing
– Hiring
– Salaries

Guidelines
– Avoid functional categories
– Don’t mix functions / objects
– Exhaust scripts
– Match idiomatic phrases

Genericity

Parts
– Air Conditioning
– Belts and Hoses
– Body
– Brake System
– Chassis
– Engine
– Exhaust System
– Fuel System
– Glass
– Ignition

Guidelines
– Avoid meronymy
– Don’t mix meronymy / hyperonymy
– Exhaust prototypes

Speciation

Person > Unwelcome person > Unpleasant person > Selfish person > Opportunist > Backscratcher (WordNet)

Guidelines
– Avoid “strings” of categories
– Avoid (non-idioms) properties for categories
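
A "string" of categories such as the WordNet chain above can be detected mechanically. The sketch below assumes the taxonomy is held as a nested dictionary (category name mapped to its subcategories) and reports runs in which each category has exactly one subcategory; the representation, the threshold `min_length`, and the function name are all assumptions made for illustration.

```python
def find_strings(tree, min_length=3):
    """Return runs of categories linked by single-child steps."""
    chains = []

    def walk(name, children, chain):
        chain = chain + [name]
        if len(children) == 1:
            # Exactly one subcategory: the run continues downward.
            child, grandchildren = next(iter(children.items()))
            walk(child, grandchildren, chain)
            return
        # The run ends at a leaf or at a branching category.
        if len(chain) >= min_length:
            chains.append(chain)
        for child, grandchildren in children.items():
            walk(child, grandchildren, [])

    for name, children in tree.items():
        walk(name, children, [])
    return chains


person = {"Person": {"Unwelcome person": {"Unpleasant person": {
    "Selfish person": {"Opportunist": {"Backscratcher": {}}}}}}}
print(find_strings(person))
```

On the example from the slide, the whole six-level chain is reported as one string, which is the shape the guideline asks designers to avoid.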

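Necessity

[Figure: Tax, with branches Individuals and Corporations, each subdivided into Assets and Liability]

[Figure: abstract tree with nodes B, C, D, E, F, G, H, I, K]

[Figure: Tax, with labels Individuals, Corporations, Assets, Liability, Individuals, Corporations]

– Avoid non-productive categories
– Avoid combinations of categories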
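
One possible reading of "combinations of categories" is the pattern in the Tax figure, where sibling categories repeat the same set of subcategories. The sketch below flags that pattern in a nested-dictionary taxonomy; this interpretation, the data layout, and the function name `find_combinations` are assumptions, not something the slides define.

```python
def find_combinations(tree):
    """Find sibling categories that repeat an identical set of subcategories.

    Returns a list of (parent, siblings, shared_subcategories) tuples.
    """
    findings = []

    def walk(name, children):
        seen = {}
        for child, grandchildren in children.items():
            labels = frozenset(grandchildren)
            if labels:
                seen.setdefault(labels, []).append(child)
            walk(child, grandchildren)
        for labels, siblings in seen.items():
            if len(siblings) > 1:
                findings.append((name, sorted(siblings), sorted(labels)))

    for name, children in tree.items():
        walk(name, children)
    return findings


tax = {"Tax": {
    "Individuals": {"Assets": {}, "Liability": {}},
    "Corporations": {"Assets": {}, "Liability": {}},
}}
print(find_combinations(tax))
```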

Nomenclature (Design Structure) Quality Index

[Figure: tree of taxa arranged on Level 0 through Level 4]

UB = unique beginner
lf = life-form
g = generic
s = specific
v = varietal

Width

Depth

Balance
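
The slides name width, depth, and balance without defining them. As a sketch under stated assumptions, width is taken here as the size of the widest level, depth as the deepest level (the root is level 0), and balance as the ratio of the shallowest leaf level to the deepest one; the nested-dictionary layout and the function name `shape_metrics` are hypothetical.

```python
def shape_metrics(tree):
    """Return (width, depth, balance) for a nested-dict taxonomy."""
    if not tree:
        return 0, 0, 1.0
    leaf_levels = []
    level_sizes = {}

    def walk(children, level):
        for name, grandchildren in children.items():
            level_sizes[level] = level_sizes.get(level, 0) + 1
            if grandchildren:
                walk(grandchildren, level + 1)
            else:
                leaf_levels.append(level)

    walk(tree, 0)
    width = max(level_sizes.values())   # widest level
    depth = max(leaf_levels)            # deepest level, root = 0
    # Assumed definition: 1.0 means every leaf sits at the same level.
    balance = min(leaf_levels) / depth if depth else 1.0
    return width, depth, balance


tax = {"Tax": {
    "Liability": {"Loan": {"Term loan": {"Short-term loan": {}}}},
    "Assets": {},
}}
print(shape_metrics(tax))
```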

Complexity Index

Cyclomatic complexity increases with the number of Cross References within the Taxonomy, giving an indication of complexity and difficulty of testing.

Taxonomy Complexity Index combines:
– autonomy
– closure
– similarity
– typicality
– commonality
– redundancy
– stability
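
The effect of cross references on cyclomatic complexity can be illustrated with McCabe's graph formula, V(G) = E - N + 2P, treating categories as nodes and both parent links and cross references as edges. Applying the formula to a taxonomy this way is an assumption made for this sketch, and the seven components of the Taxonomy Complexity Index are not computed because the slides do not define them.

```python
def cyclomatic_complexity(num_categories, num_parent_links,
                          num_cross_references, num_components=1):
    """McCabe's V(G) = E - N + 2P for a taxonomy viewed as a graph."""
    edges = num_parent_links + num_cross_references
    return edges - num_categories + 2 * num_components


# A pure tree with 10 categories has 9 parent links.
print(cyclomatic_complexity(10, 9, 0))
# Each cross reference adds one to the measure.
print(cyclomatic_complexity(10, 9, 3))
```

For a single-rooted tree the value is 1, so the measure effectively counts cross references plus one.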

Maturity Index

IEEE Standard 982.1-1988 suggests a maturity index (defined there for software) that can be applied to a taxonomy to provide an indication of its stability.

Maturity Index combines:
– number of modules in current ontology / taxonomy.
– number of modules in current ontology / taxonomy that have been changed.
– number of modules added to current ontology / taxonomy.
– number of modules deleted from the previous version of the ontology / taxonomy.
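
The four counts above are the inputs of the software maturity index as it is commonly stated for IEEE 982.1-1988: (total - (added + changed + deleted)) / total. The sketch below applies that formula to taxonomy modules; what counts as a "module" in a taxonomy (a branch, a category) is not defined in the slides and is left to the reader.

```python
def maturity_index(total, changed, added, deleted):
    """Maturity index: share of modules untouched by the current revision.

    total:   modules in the current ontology / taxonomy
    changed: modules in the current version that have been changed
    added:   modules added to the current version
    deleted: modules deleted from the previous version
    """
    if total <= 0:
        raise ValueError("total must be positive")
    return (total - (added + changed + deleted)) / total


print(maturity_index(total=200, changed=10, added=6, deleted=4))
```

A value approaching 1.0 suggests the taxonomy is stabilizing from one release to the next.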

5. Tags Review
– Document coverage
– Concepts coverage

<tagset>
  <document>
    <docurl>http://www.TaxSource.com</docurl>
    <tag>
      <tagname>Liability</tagname>
      <weight>1.289</weight>
    </tag>
    <tag>
      <tagname>Federal Funds</tagname>
      <weight>0.746</weight>
    </tag>
  </document>
</tagset>
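
A tag set in the format shown above can be read with Python's standard `xml.etree.ElementTree` module and used to estimate the two coverage figures. The coverage definitions below (share of documents with at least one tag, share of concepts used at least once) are assumptions, as are the function names; the slides only name the measures.

```python
import xml.etree.ElementTree as ET

TAGSET = """<tagset>
  <document>
    <docurl>http://www.TaxSource.com</docurl>
    <tag><tagname>Liability</tagname><weight>1.289</weight></tag>
    <tag><tagname>Federal Funds</tagname><weight>0.746</weight></tag>
  </document>
</tagset>"""


def read_tags(xml_text):
    """Map each document URL to a list of (tag name, weight) pairs."""
    root = ET.fromstring(xml_text)
    result = {}
    for doc in root.findall("document"):
        url = doc.findtext("docurl")
        result[url] = [
            (tag.findtext("tagname"), float(tag.findtext("weight")))
            for tag in doc.findall("tag")
        ]
    return result


def document_coverage(tags_by_doc, all_urls):
    """Share of documents in the collection that carry at least one tag."""
    tagged = {url for url, tags in tags_by_doc.items() if tags}
    return len(tagged & set(all_urls)) / len(all_urls) if all_urls else 0.0


def concept_coverage(tags_by_doc, concepts):
    """Share of taxonomy concepts used as a tag on at least one document."""
    used = {name for tags in tags_by_doc.values() for name, _ in tags}
    return len(used & set(concepts)) / len(concepts) if concepts else 0.0


tags = read_tags(TAGSET)
print(tags)
```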

6. Final Review
– Receipt
– Maintenance

Quality Taxonomies

Claude Vogel
cvogel@semio.com

KM World 2000
