© Keith G Jeffery & Anne Asserson
CERIF Course: Implementation 20021024 1
CERIF COURSE Session 5: Implementation
Keith G Jeffery, Director, IT, CLRC [email protected]
Anne Asserson, University of Bergen [email protected]
Structure
• Language handling
• Classification schemes and enumerated lists, dictionaries, thesauri
• Domain ontologies
• Versions for Relational DBMS
• Mapping CERIF to RDBMS
• OODBMS and IR systems
• Keys
• Surrogate keys in linking relations
Language handling Character / Language Variants
• Character sets
  – Not only ‘Latin-1’
  – Escape-code techniques can be used, but they only work in linear data streams
  – Better to use a rich code that can handle any character from any language (including mathematical and currency symbols): Unicode
  – But it requires more storage
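The storage trade-off can be illustrated with a minimal sketch. It uses only Python's built-in codecs; the sample strings and the function name storage_cost are illustrative assumptions, not part of CERIF.

```python
# Compare how many bytes the same text needs in different encodings.
# The sample strings below are illustrative only.
samples = ["Bergen", "Tromsø", "€ 40 million"]


def storage_cost(text):
    """Return the byte length of text per encoding (None if not encodable)."""
    costs = {}
    for codec in ("latin-1", "utf-8", "utf-16-le"):
        try:
            costs[codec] = len(text.encode(codec))
        except UnicodeEncodeError:
            # The character set cannot represent at least one character.
            costs[codec] = None
    return costs


for sample in samples:
    print(sample, storage_cost(sample))
```

Plain ASCII text costs the same in Latin-1 and UTF-8 but doubles in UTF-16, while a character such as the euro sign cannot be stored in Latin-1 at all.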
Language handling Character / Language Variants
• Language
• CERIF has many text fields
• Each field may exist in multiple languages
• For retrieval or update, the language must be known (for text-matching)
• So the logical record holds multiple sub-records, differentiated by language, for each text field
• Example: Project.Abstract will usually exist in (US) English and the original language, and maybe in the language of the country/region where it is stored
[Figure: Level 3 Person Entity. Entity-relationship diagram of Person and its linked entities (CV, Event, Contact, Service, Expertise Skills, Particular Equipment, Result Publication, Result Patent, Result Product, General Facility, Classification), together with the language-variant tables (each holding Language, Trans Type and the text itself) for Expertise Skills Name and Description, Service Name and Description, Equipment Name and Description, Event Name and Description, Research Interest, Patent Title and Patent Abstract.]
Language handling Example: Person_Research_Interests

Person-Research interest
Attribute     Type         Constraints    Values
Person Id     char(32)     m, pk(part)
Language      char(2)      m, pk(part)
Translation   char(1)      m, pk(part)    o(rig), h(uman), m(achine)
Keywords      char(1024)
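A minimal sketch of this table, assuming SQLite as a stand-in for the target RDBMS. The table and column names, the sample rows and the helper keywords_for are illustrative assumptions; only the attribute types and the three-part primary key come from the slide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# One row per (person, language, translation type), as in the slide.
conn.execute("""
    CREATE TABLE person_research_interest (
        person_id   CHAR(32) NOT NULL,
        language    CHAR(2)  NOT NULL,
        translation CHAR(1)  NOT NULL
            CHECK (translation IN ('o', 'h', 'm')),  -- orig, human, machine
        keywords    VARCHAR(1024),
        PRIMARY KEY (person_id, language, translation)
    )
""")

# Hypothetical sample data: an original text and a human translation.
rows = [
    ("P001", "no", "o", "fiskeoppdrett, havbruk"),
    ("P001", "en", "h", "fish farming, aquaculture"),
]
conn.executemany(
    "INSERT INTO person_research_interest VALUES (?, ?, ?, ?)", rows
)


def keywords_for(person_id, language):
    """Return the keyword strings stored for one person in one language."""
    cur = conn.execute(
        "SELECT keywords FROM person_research_interest "
        "WHERE person_id = ? AND language = ?",
        (person_id, language),
    )
    return [row[0] for row in cur.fetchall()]


print(keywords_for("P001", "en"))
```

Because the language is part of the key, a query can restrict text-matching to the language it was written in.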
Classification schemes …… Enumerated Lists, Dictionaries, Thesauri, Ontologies
• Purpose
  – Higher quality data: data validation
  – More accurate retrieval: both query keywords and stored words (for any attribute) are limited to the allowed terms
  – Classification: allowing grouping and ranking by value of attribute
Classification schemes …… Enumerated List
• Example: Country Code
• There is an ISO standard list of valid 2-character and 3-character country codes
• On input, the system can validate that a country code is from this list (commonly with a pull-down)
• If countries change, the list is updated in one place and the whole system is reconfigured
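Validation against an enumerated list can be sketched as follows. The set below is a small illustrative excerpt, not the full ISO list, and the names VALID_COUNTRY_CODES and validate_country are assumptions made for the example.

```python
# A tiny excerpt of two-letter country codes. A real system would load the
# full ISO list from a single lookup table.
VALID_COUNTRY_CODES = {"NO", "GB", "DE", "FR", "CH"}


def validate_country(code, valid_codes=VALID_COUNTRY_CODES):
    """Return the normalized code, or raise ValueError if it is not listed."""
    normalized = code.strip().upper()
    if normalized not in valid_codes:
        raise ValueError(f"unknown country code: {code!r}")
    return normalized


print(validate_country("no"))

# Updating the list in one place changes validation everywhere it is used.
VALID_COUNTRY_CODES.add("SE")
print(validate_country("se"))
```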
[Figure: Level 4 Person Entity. Entity-relationship diagram of Person and its linked entities (CV, Event, Contact, Service, Expertise Skills, Particular Equipment, Result Publication, Result Patent, Result Product, General Facility, Classification), together with the lookup tables: Honorific Title, Academic Title, Qualification, Language, Multimedia Type, Equipment Type, Publication Type, Patent Status, Patent Type, Facility Type, Product Type, Country and NUTS Region. Each lookup table holds a code and its full name.]
Classification schemes …… Lookup Table Examples: Person-related

Person-honorific title
Attribute            Type        Constraints    Values
Honorific Title      char(4)     m, pk(part)    Sir, Lady…
Title full           char(32)    m, pk(part)

Person-academic title
Attribute            Type        Constraints    Values
Academic Title       char(4)     m, pk(part)    Dr, Prof…
Title full           char(32)    m, pk(part)

Person-qualification
Attribute            Type        Constraints    Values
Qualification        char(4)     m, pk(part)
Qualification full   char(32)    m, pk(part)    ISCED (UNESCO International Standard Classification of Education, level of education)
Level 5 Person Characteristics (physical schema diagram)

Base tables
PERSON: PER_ID CHAR(32), PER_FAMILY_NAMES VARCHAR2(32), PER_FIRST_NAMES VARCHAR2(16), PER_OTHER_NAMES VARCHAR2(32), PER_SEX CHAR(1), PER_ACADEMIC_TITLE VARCHAR2(8), PER_QUALIFICATIONS VARCHAR2(24), PER_NATIONALITIES VARCHAR2(16), PER_PRIZE_AWARD VARCHAR2(64), PER_URI VARCHAR2(512)
EXPERTISE_SKILLS: EXP_SKILL_ID CHAR(32)
CV: CV_ID VARCHAR2(32), CV_URI VARCHAR2(512)
CONTACT: CON_ID CHAR(32), COUNTRY_CODE VARCHAR2(4), REGION_CODE CHAR(10), CON_ADDRLINE1 VARCHAR2(80), CON_ADDRLINE2 VARCHAR2(80), CON_ADDRLINE3 VARCHAR2(80), CON_ADDRLINE4 VARCHAR2(80), CON_ADDRLINE5 VARCHAR2(80), CON_CITY_TOWN VARCHAR2(32), CON_PROVINCE_STATE VARCHAR2(32), CON_POSTAL_CODE VARCHAR2(16), CON_TELEPHONE VARCHAR2(24), CON_FAX VARCHAR2(24), CON_EMAIL VARCHAR2(32), CON_URI VARCHAR2(512)

Lookup tables
HONOROFIC_TITLE: HONORIFIC_TITLE CHAR(4), HONORIFIC_TITLE_FULL VARCHAR2(32)
ACADEMIC_TITLE: ACADEMIC_TITLE CHAR(4), ACADEMIC_TITLE_FULL VARCHAR2(32)
QUALIFICATION: QUALIFICATION CHAR(4), QUALIFICATION_FULL VARCHAR2(32)
LANGUAGE: LANG_CODE CHAR(2), LANG_FULL_NAME VARCHAR2(32), LANG_ENGLISH_NAME VARCHAR2(32)
MULTIMEDIA_TYPE: MULTIMEDIA_TYPE VARCHAR2(16), MULTIMEDIA_TYPE_FULL VARCHAR2(64), MULTIMEDIA_FORMAT VARCHAR2(32)

Link tables
HONORIFIC_TITLE_PER: HONORIFIC_TITLE CHAR(4), PER_ID CHAR(32)
ACADEMIC_TITLE_PER: ACADEMIC_TITLE CHAR(4), PER_ID CHAR(32)
QUAL_PERSON: QUALIFICATION CHAR(4), PER_ID CHAR(32)
PERS_CONTACT: PER_ID CHAR(32), CON_ID CHAR(32), PER_CON_ROLE VARCHAR2(16), PER_CON_START DATE, PER_CON_END DATE
PERS_CV: CV_ID CHAR(32), PER_ID CHAR(32)
PERS_EXPERT_SKILL: PER_ID CHAR(32), EXP_SKILL_ID CHAR(32), PER_EXP_ROLE VARCHAR2(16), PER_EXP_CONDITIONS VARCHAR2(1024), PER_EXP_PRICE NUMBER(12,2), PER_EXP_CURRENCY CHAR(3), PER_EXP_AVAILABILITY VARCHAR(64), PER_EXP_START DATE, PER_EXP_END DATE
CV_MULTIMEDIA_TYPE: MULTIMEDIA_TYPE VARCHAR2(16), CV_ID CHAR(32)
PERS_LANG: PER_ID CHAR(32), LANG_CODE CHAR(2), SKILL_READING CHAR(1), SKILL_WRITING CHAR(1), SKILL_SPEAKING CHAR(1)

Joins shown in the diagram are on the shared key columns: PER_ID, CV_ID, CON_ID, EXP_SKILL_ID, LANG_CODE, MULTIMEDIA_TYPE, QUALIFICATION, ACADEMIC_TITLE, HONORIFIC_TITLE.
Classification schemes …… Other Lookup Tables in CERIF
• 1.1.1 Language
  – The ISO 639 standard, two-letter code should be used.
  – This standard is applied to the entity Language in the CERIF 2000 data model.
• 1.1.2 Country
  – The ISO 3166 standard, two-letter code should be used.
  – This standard is applied to entity Country in the CERIF 2000 data model.
• 1.1.3 Currency
  – The three-letter ISO 4217 code (SWIFT code) should be used.
  – This is applied to the currency values in the CERIF 2000 data model.
• 1.1.4 Address
  – The NUTS (territorial units EU) codes should be used for regions in the EU.
  – This is applied to entity NUTS-Region in the CERIF 2000 data model.
Classification schemes …… Other Lookup Tables in CERIF
• 1.1.5 Role of a person in an organization
  – The ISCO[1] should be used.
  – This is applied to the role attribute in the Person-OrgUnit entity of the CERIF 2000 data model.
• 1.1.6 Qualification of a person
  – The ISCED[2] should be used.
  – This is applied to the entity Qualification of the CERIF 2000 data model.
Classification schemes …… Other Lookup Tables in CERIF
• 1.1.7 Organization/Company size
  – The values from the harmonised information package for Fifth Framework Programme should be used (number of employees):
    S1 : 0
    S2 : 1-9
    S3a : 10-49
    S3b : 50-249
    S4 : 250-499
    S5 : 500-1999
    S6 : 2000+
  – Note: An SME (small and medium-sized enterprise) is defined as an entity that has fewer than 250 full-time-equivalent employees, has an annual turnover not exceeding EUR 40 million or an annual balance sheet total not exceeding EUR 27 million, and is not 25% or more owned by a non-SME.
  – These values are applied to attribute Org_Headcount in entity OrgUnit in the CERIF 2000 data model.
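The size bands above can be applied mechanically. The following is a minimal sketch; the names SIZE_BANDS and size_code are assumptions made for the example, and only the band boundaries come from the slide.

```python
# (code, lowest headcount, highest headcount); None means no upper bound.
SIZE_BANDS = [
    ("S1", 0, 0),
    ("S2", 1, 9),
    ("S3a", 10, 49),
    ("S3b", 50, 249),
    ("S4", 250, 499),
    ("S5", 500, 1999),
    ("S6", 2000, None),
]


def size_code(employees):
    """Map an employee headcount to its size code."""
    if employees < 0:
        raise ValueError("headcount cannot be negative")
    for code, low, high in SIZE_BANDS:
        if employees >= low and (high is None or employees <= high):
            return code
    raise ValueError(f"no band for headcount {employees}")


for headcount in (0, 7, 120, 2500):
    print(headcount, size_code(headcount))
```

Keeping the bands in one table means a change to the classification only has to be made in one place, in the same spirit as a lookup table.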
Classification schemes …… Other Lookup Tables in CERIF
• 1.1.8 Type and status of a patent
  – The IPC[3] should be used.
  – This is applied to entity Patent Type in the CERIF 2000 data model.
• 1.1.9 Type of publication
  – The UNIMARC Manual – Bibliographic Format 1994 and the updates from 1996 and 1998 should be used.
  – This is applied to entity Publication Type in the CERIF 2000 data model.
• 1.1.10 Type of event
  – The following list of values should be used: Conference, Cultural event, Exhibition, Political event, Sport event, Trade fair, Workshop
  – This is applied to the entity Event Type of the CERIF 2000 data model.
Classification schemes …… Other Lookup Tables in CERIF
• 1.1.11 Type of multimedia item
  – Use the UNIMARC Manual – Bibliographic Format 1994 (IFLA-UBCIM), and updates from 1996 and 1998 (fields 115, 125-128) (IFLA-UBCIM).
  – This is applied to entity Multimedia type in the CERIF 2000 data model.
• 1.1.12 Role of an organization in a project
  – The following definitions from the harmonised information package for Fifth Framework Programme should be used:
    CO: Co-ordinator (= scientific, administrative and financial co-ordinator)
    CF: Only financial co-ordinator (if different from co-ordinator)
    AC: Associate contractor
    CR: Contractor (other than the co-ordinator)
  – This is applied to attribute Proj_Org_Role of entity Proj_Org in the CERIF 2000 data model.
Classification schemes …… Other Lookup Tables in CERIF
• 1.1.13 Role of a person in a publication
  – Use the UNIMARC Manual – Bibliographic Format 1994 (IFLA-UBCIM), and annex C from the updates from 1996 and 1998.
  – This is applied to attribute Pers_Pub_Role of entity Pers_Pub in the CERIF 2000 data model.
• 1.1.14 Role of a person as an expert
  – Use the following list of values for linking a person to expertise/skills: Consultant, Evaluator, Referee, Reviewer
  – This is applied to attribute Pers_Exp_Role of entity Pers_Expert_Skill in the CERIF 2000 data model.
Classification schemes …… Other Lookup Tables in CERIF
• 1.1.15 Type of organization
  – Use the values from the harmonised information package of the Fifth Framework Programme:
    BES = enterprise sector including SMEs and individual consultants
    HES = higher education establishments
    RPR = private/commercial research centres including SMEs
    RPN = private non-profit research centres
    RPU = public research centres
    JRC = joint research centre
    PUS = non-research public sector
    PNP = non-research private non-profit (? sector)
    INO = international organizations
    OTH = others
  – This is applied to the attribute Org_Type_Full of entity OrgUnit_Type in the CERIF 2000 data model.
Classification schemes …… Other Lookup Tables in CERIF
• 1.1.16 Role of a person related to equipment
  – Use the following list of values: Contact person, Maintenance technician, Operator/technician
  – This is applied to attribute Pers_Equip_Role of entity Pers_Equip in the CERIF 2000 data model.
• 1.1.17 Role of an organization related to a product
  – The following list of values should be used: Ownership, Franchise, License, Purchase
  – This is applied to attribute Org_Prod_Role of OrgUnit_Product in the CERIF 2000 data model.
Classification schemes …… Other Lookup Tables in CERIF
• 1.1.18 Type of equipment
  – Use class 6 of UDC[4].
  – This is applied to entity Equipment Type of the CERIF 2000 data model.
Classification schemes …… Other Lookup Tables in CERIF
• [1] ISCO stands for "International Standard Classification of Occupations"
• [2] ISCED stands for the UNESCO "International Standard Classification of Education", level of education - http://www.econ.ucl.ac.be/IRES/BASE_DE_DONNEES/Nomenclatures/occupations/CITE-ISCED.html
• [3] IPC stands for "International Patent Classification" - http://www.wipo.org/eng/general/ipc
• [4] UDC stands for "Universal Decimal Classification"
Classification schemes …… Dictionaries
• Example: meaning of a word (term)
  – Used in ensuring correct use of a value in an attribute
  – For explanation of result output
• Example: multilingual
  – Used in multilingual query (query in language 1 and retrieve from records stored in languages 2…n)
  – Used in result output: translate (crudely) to a single language as required
Classification schemes …… Dictionaries and CERIF
• Not really used
• Much discussed
• Overtaken by Thesauri and Domain Ontologies
Classification schemes …… Thesauri
• Provide the structural relationships of words (terms)
  – Synonym (different word, same meaning)
  – Homonym (same word, different meaning)
  – Antonym (word with opposite meaning)
  – Super-term (a word whose meaning includes the word being used, e.g. person includes {male|female})
  – Sub-term (a word whose meaning is included in a Super-term)
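How such relationships support retrieval can be sketched with a toy thesaurus. The data structure, the entries and the function expand_term are illustrative assumptions; the person / male / female example is the one given on the slide.

```python
# A toy thesaurus: each term lists its synonyms and its sub-terms.
THESAURUS = {
    "person": {"synonyms": {"individual"}, "sub_terms": {"male", "female"}},
    "male": {"synonyms": set(), "sub_terms": set()},
    "female": {"synonyms": set(), "sub_terms": set()},
}


def expand_term(term, thesaurus=THESAURUS):
    """Return the term plus its synonyms and all sub-terms, recursively."""
    expanded = set()
    pending = [term]
    while pending:
        current = pending.pop()
        if current in expanded:
            continue  # already handled; also guards against cycles
        expanded.add(current)
        entry = thesaurus.get(current, {})
        pending.extend(entry.get("synonyms", ()))
        pending.extend(entry.get("sub_terms", ()))
    return expanded


print(sorted(expand_term("person")))
```

A query for the super-term can then also match records indexed under any of its sub-terms or synonyms.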
Classification schemes …… Thesauri in CERIF
• Ortelius
  – Multilingual thesaurus
  – Created for higher education
  – Planned for use in CERIF2000
  – (Legal) problems in using Ortelius and in maintaining / extending it
  – Now out of use in the CERIF context
Classification schemes …… Thesauri in CERIF
• Revision of CERIF91 Classification by Beat Sottas and his team
• http://www.aramis-research.ch/e/cerif.htm
• Well-balanced across subject areas
• Not too deep into any one subject area
Classification schemes …… Discussion
• There is a feeling that classification schemes and codes are NOT useful
• Some claim they make values of attributes and retrieval more precise
• Some claim they assist in sorting, grouping (i.e. classifying)
• Others claim that only full-text retrieval is useful and that classification codes are unnecessary
Classification schemes …… And so….
• The idea of Ontologies is appealing
• It allows dynamic representation of relationships between terms instead of rigid (usually hierarchic) classification schemes of allowed terms
Domain ontologies
Ontologies
• Ontology: philosophical study of existence and nature of reality
• In practice a resource of terms, their definitions and their logical inter-relationships
• E.g. for a publication to exist it must have a title and at least 1 author
• Publication [title AND >=1 author]
Domain ontologies
Ontologies
• Domain Ontology: Ontology covering a domain (subject area of interest)
• Example: Publication
• Publication [title]
• Publication [author]
• Publication [editor]
• Collection [title + >1 author + editor]
• If a Publication has a title, more than 1 author and an editor, it is a collection
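These rules can be read as a simple classifier. The sketch below is an assumption about how they might be coded, not an ontology language: the Record class and classify function are hypothetical, and the Publication rule used is the "title AND >=1 author" condition given earlier in this section.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Record:
    """A candidate bibliographic record (hypothetical structure)."""
    title: Optional[str] = None
    authors: List[str] = field(default_factory=list)
    editor: Optional[str] = None


def classify(record):
    """Apply the rules: Collection is the more specific case of Publication."""
    has_title = bool(record.title)
    if has_title and len(record.authors) > 1 and record.editor:
        return "Collection"
    if has_title and len(record.authors) >= 1:
        return "Publication"
    return None  # does not satisfy the conditions for a publication


print(classify(Record("CERIF 2000", ["A. Author"])))
print(classify(Record("Proceedings", ["A. Author", "B. Author"], "E. Editor")))
```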
Domain ontologies Ontologies
• Domain Ontologies in IT
• A representation in first order logic allowing
  – Facts to be expressed
  – Relationships to be expressed
  – Constraints to be expressed
  – New facts and relationships to be deduced or induced
Domain ontologies Ontologies
• Used for
  – Data validation on input
  – Clarification and improvement of a query
  – Resolving heterogeneity of terms to homogeneity
  – Expanding super-terms to sub-terms and vice-versa, conditionally
  – Deducing or inducing new facts and relationships from stored facts and relationships
Domain ontologies
Data Validation on Input
[Figure: Data validation on input. Components: Data Input Screen, Domain Ontology, CERIF Database. Flows: Input Dialogue (one item), Validation, Update (one item).]
Domain ontologies CERIF Ontology Work
• Work of Andrei Lopatenko http://derpi.tuwien.ac.at/~andrei/Metadata_Science.htm
• Example:
  <daml:Class rdf:ID="http://derpi.tuwien.ac.at/~andrei/ontology/programmes.daml#OrgUnit">
    <rdfs:label>OrgUnit</rdfs:label>
    <rdfs:comment/>
    <oiled:creationDate>16:02:07 14.11.2001</oiled:creationDate>
  </daml:Class>
Domain ontologies Ontology in a Nutshell
• Much of the following comes from sources such as
  – http://www.w3.org/2001/sw/WebOnt/
  – http://www.daml.org/
RDFS: RDF Schema Class
• RDF Schema allows you to define classes by direct declaration:
  <rdfs:Class rdf:ID="Product">
    <rdfs:label>Product</rdfs:label>
    <rdfs:comment>An item sold by Super Sports Inc.</rdfs:comment>
  </rdfs:Class>
The label of class Product is "Product"
RDFS: RDF Schema Property
You can make similar definitions of properties:
  <rdf:Property rdf:ID="productNumber">
    <rdfs:label>Product Number</rdfs:label>
    <rdfs:domain rdf:resource="#Product"/>
    <rdfs:range rdf:resource="http://www.w3.org/2000/01/rdf-schema#Literal"/>
  </rdf:Property>
Product Number takes a literal value
RDFS: RDF Schema Class Instances
You define instances of these classes by defining resources to be of the relevant RDF type, and then give them relevant properties:
  <Product rdf:ID="WaterBottle">
    <rdfs:label>Water Bottle</rdfs:label>
    <productNumber>38267</productNumber>
  </Product>
38267 is the Product Number of Product ‘WaterBottle’
Property Values
DAML+OIL allows property values to be restricted to the data types defined in XSDL or to user-defined data types. One does this by using a specialization of RDF properties: DatatypeProperty.
  <daml:DatatypeProperty rdf:ID="productNumber">
    <rdfs:label>Product Number</rdfs:label>
    <rdfs:domain rdf:resource="#Product"/>
    <rdfs:range rdf:resource="http://www.w3.org/2000/10/XMLSchema#nonNegativeInteger"/>
  </daml:DatatypeProperty>
Constraint: Product Number is a non-negative integer
DAML + OIL: Class
The most important facilities provided by DAML+OIL are those that give designers more expressiveness in classifying resources.
The class daml:Class is defined as a subclass of rdfs:Class, and it adds many new facilities.
RDF Class Instances
  <rdfs:Class rdf:ID="MaritalStatus"/>
  <MaritalStatus rdf:ID="Married"/>
  <MaritalStatus rdf:ID="Divorced"/>
  <MaritalStatus rdf:ID="Single"/>
  <MaritalStatus rdf:ID="Widowed"/>
The main problem is that the enumeration is not closed.
Using DAML beyond RDFS: Closed lists – ‘one of’
  <daml:Class rdf:ID="Availability">
    <daml:oneOf rdf:parseType="daml:collection">
      <daml:Thing rdf:ID="InStock">
        <rdfs:label>In stock</rdfs:label>
      </daml:Thing>
      <daml:Thing rdf:ID="BackOrdered">
        <rdfs:label>Back ordered</rdfs:label>
      </daml:Thing>
      <daml:Thing rdf:ID="SpecialOrder">
        <rdfs:label>Special order</rdfs:label>
      </daml:Thing>
    </daml:oneOf>
  </daml:Class>
3 kinds of availability in the collection
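A closed list like this can be read back and used for validation. The sketch below treats the document as plain XML using Python's standard xml.etree.ElementTree; it is not an RDF or DAML reasoner. The function allowed_values and the trimmed sample document are assumptions made for the example.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DAML = "http://www.w3.org/2001/10/daml+oil#"

# A trimmed version of the Availability class (labels omitted).
DOC = """<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:daml="http://www.w3.org/2001/10/daml+oil#">
  <daml:Class rdf:ID="Availability">
    <daml:oneOf rdf:parseType="daml:collection">
      <daml:Thing rdf:ID="InStock"/>
      <daml:Thing rdf:ID="BackOrdered"/>
      <daml:Thing rdf:ID="SpecialOrder"/>
    </daml:oneOf>
  </daml:Class>
</rdf:RDF>"""


def allowed_values(xml_text, class_id):
    """Return the IDs in a class's oneOf list, or None if no closed list."""
    root = ET.fromstring(xml_text)
    for cls in root.iter(f"{{{DAML}}}Class"):
        if cls.get(f"{{{RDF}}}ID") != class_id:
            continue
        one_of = cls.find(f"{{{DAML}}}oneOf")
        if one_of is None:
            return None  # class exists but declares no closed enumeration
        things = one_of.findall(f"{{{DAML}}}Thing")
        return {thing.get(f"{{{RDF}}}ID") for thing in things}
    return None  # class not found


values = allowed_values(DOC, "Availability")
print(sorted(values))
print("InStock" in values, "Discontinued" in values)
```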
Using DAML beyond RDFS: Set up Namespaces
  <?xml version="1.0" encoding="UTF-8"?>
  <rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
    xmlns:daml="http://www.w3.org/2001/10/daml+oil#"
    xmlns:dt="http://rdfinference.org/eg/supersports/dt"
    xmlns:ss="http://rdfinference.org/eg/supersports/metadata"
    xmlns:xsd="http://www.w3.org/2000/10/XMLSchema#"
    xml:base="http://rdfinference.org/eg/supersports/metadata"
  >
Using DAML beyond RDFS: Primary Entities Classes
  <daml:Ontology rdf:about="">
    <daml:versionInfo>1.0</daml:versionInfo>
    <rdfs:comment>An ontology of Super Sports Inc. store products</rdfs:comment>
    <daml:imports rdf:resource="http://www.w3.org/2001/10/daml+oil"/>
  </daml:Ontology>
  <daml:Class rdf:ID="Product">
    <rdfs:label>Product</rdfs:label>
    <rdfs:comment>An item sold by Super Sports Inc.</rdfs:comment>
  </daml:Class>
  <daml:Class rdf:ID="Department">
    <rdfs:label>Department</rdfs:label>
    <rdfs:comment>A Super Sports Inc. department</rdfs:comment>
  </daml:Class>
Set up classes Product and Department
Using DAML beyond RDFS: Secondary Entities: Subclasses of Product
Key construct: <rdfs:subClassOf rdf:resource="#Product"/>
  <daml:Class rdf:ID="Tool">
    <rdfs:label>Tool</rdfs:label>
    <rdfs:comment>Tools used in sports, ice axe for instance.</rdfs:comment>
    <rdfs:subClassOf rdf:resource="#Product"/>
  </daml:Class>
  <daml:Class rdf:ID="Shoe">
    <rdfs:label>Shoe</rdfs:label>
    <rdfs:subClassOf rdf:resource="#Product"/>
  </daml:Class>
Set up subclasses of Product: Tool, Shoe
Using DAML beyond RDFS: Enumerated List
  <daml:Class rdf:ID="Activity">
    <rdfs:label>Activity</rdfs:label>
    <rdfs:comment>A sport activity</rdfs:comment>
    <daml:oneOf rdf:parseType="daml:collection">
      <daml:Thing rdf:ID="Hiking">
        <rdfs:label>Hiking</rdfs:label>
      </daml:Thing>
      <daml:Thing rdf:ID="Travel">
        <rdfs:label>Travel</rdfs:label>
      </daml:Thing>
      <daml:Thing rdf:ID="Camping">
        <rdfs:label>Camping</rdfs:label>
      </daml:Thing>
    </daml:oneOf>
  </daml:Class>
Activity can take values Hiking, Travel, Camping
Using DAML beyond RDFS: Use of classification (list)
  <daml:DatatypeProperty rdf:ID="productNumber">
    <rdfs:label>Product Number</rdfs:label>
    <daml:samePropertyAs rdf:resource="http://rosettanet.org/FundamentalBusinessDataEntities#ProprietaryProductIdentifier"/>
    <rdfs:domain rdf:resource="#Product"/>
    <rdfs:range rdf:resource="http://www.w3.org/2000/10/XMLSchema#nonNegativeInteger"/>
    <rdf:type rdf:resource="http://www.w3.org/2001/10/daml+oil#UniqueProperty"/>
  </daml:DatatypeProperty>
Product Number is drawn from an identified list of unique product identifiers
Using DAML beyond RDFS: Entity-Entity Link
  <daml:ObjectProperty rdf:ID="usedFor">
    <rdfs:label>usedFor</rdfs:label>
    <rdfs:comment>The activity for which a product is used</rdfs:comment>
    <daml:domain rdf:resource="#Product"/>
    <daml:range rdf:resource="#Activity"/>
  </daml:ObjectProperty>
Product x used for Activity y
Using DAML beyond RDFS: An instance of Link
  <ss:BackPack rdf:ID="ReadyRuck">
    <rdfs:label>Ready Ruck back pack</rdfs:label>
    <rdfs:comment>The ideal pack for your most rugged adventures</rdfs:comment>
    <ss:productNumber>23456</ss:productNumber>
    <ss:packCapacity>45</ss:packCapacity>
    <ss:usedFor rdf:resource="#Hiking"/>
  </ss:BackPack>
Instance: ReadyRuck backpack used for Hiking
Summary
• DAML (extending RDF / RDFS / XML / XMLS) can represent, at schema level (and at instance level)
  – Classes (primary entities)
  – Sub-classes (secondary entities)
  – Relationships (binary relations)
  – Enumerated lists (lookup tables)
  – Value from classification scheme
• So congruent with CERIF
Summary
• Ontologies provide a means
  – To extend the relational domain
  – Further in first order logic
  – So allowing induction and deduction
• This is especially useful for
  – Data validation
  – Query expansion / refinement
  – Homogeneous access over heterogeneous data sources (schema reconciliation)
Domain ontologies
End of Ontology in a Nutshell
Ontologies and CERIF
• Work by Andrei Lopatenko and discussed within euroCRIS CERIF Task Group
• Based on use of
  – DAML + OIL
  – OilEd (editor for the language)
  – With associated Description Logic Classifier FaCT, which uses SHIQ Description Logics
Example Partial Ontology for CERIF: Classes

Classes                        Definition
Activity
Programme
EUProgramme                    Programme which is financed-by EU or by any EU-country
ISTProgramme                   One of EU Programmes
Location                       Any location can be a part-of another location
City
Continent
Europe
Country
EuropeanCountry                A country which is a part-of continent Europe
EUCountry                      A country which is a part-of EU
Union                          Geopolitical or economical union of countries
EU                             One of unions, but which is a part of Europe
Organization
FundingOrganization            An organization which finances some Activities or Projects
EUFundingOrganization          Any funding organization which is situated-in EUCountry
EuropeanFundingOrganization    Funding organization which is situated-in Europe
Project
FinancedProject                Project which is financed-by some FundingOrganization
EUFinancedProject              Project which is financed-by EUFundingOrganization
EuropeanFinancedProject        Project which is financed-by EuropeanFundingOrganization
ISTProject                     Project which is a part-of IST Programme
Example Partial Ontology for CERIF: Relations and Axioms

Slots or Relations    Properties
financed-by           A financed-by Y. Reverse relation: finances. When A is financed-by Y then Y finances A
finances
part-of               Transitive relation. When X is a part of Y, and Y is a part of Z, then X is a part of Z
situated-in           Geographical inclusion. Transitive relation

Axioms                                                               Descriptions
Project which is a part-of any of EUProgrammes is financed-by EU    When it is known that a project is a part of EUProgramme then we are sure that this project is financed by one of EUFundingOrganizations
Example Partial Ontology for CERIF: Assertion of new Statements

Statement: ISTProject is an EUProject (i.e. an EUFinancedProject)
Proof:
1. Fact: ISTProject is a part of ISTProgramme
2. Fact: ISTProgramme is an EUProgramme
3. Statement: ISTProject is a part of EUProgramme
4. 3 + Axiom -> ISTProject is financed by EU
5. 4 + definition of EUFinancedProject -> ISTProject is an EUFinancedProject

Statement: EUFinancedProject is an EuropeanFinancedProject
Proof:
1. EUFinancedProject is financed by a FundingOrganization which is situated in EU
2. EU is situated in Europe
3. 1 + 2 + transitivity of situated-in -> EUFinancedProject is financed by an organization which is situated in Europe
4. 3 + definition of EuropeanFinancedProject -> EUFinancedProject is an EuropeanFinancedProject
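The second proof relies on the transitivity of situated-in. A minimal sketch of that inference step follows; it is plain Python, not the FaCT classifier, and the instance names AgencyA and ProjectP as well as both function names are hypothetical.

```python
def transitive_closure(pairs):
    """Return all (x, z) implied by chaining (x, y) and (y, z) pairs."""
    closure = set(pairs)
    changed = True
    while changed:
        changed = False
        for (a, b) in list(closure):
            for (c, d) in list(closure):
                if b == c and (a, d) not in closure:
                    closure.add((a, d))
                    changed = True
    return closure


def european_financed(financed_by, situated_in):
    """Projects financed by an organization situated (transitively) in Europe."""
    located = transitive_closure(situated_in)
    return {
        project
        for (project, organization) in financed_by
        if (organization, "Europe") in located
    }


# Stored facts (hypothetical instances).
situated_in = {("AgencyA", "EU"), ("EU", "Europe")}
financed_by = {("ProjectP", "AgencyA")}

print(european_financed(financed_by, situated_in))
```

The fact that AgencyA is situated in Europe is never stored; it is deduced from the two stored facts, which is the kind of inference the slide describes.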
Versions for Relational DBMS
• It is probably easiest to implement CERIF in (extended) relational technology
• Versions exist in
– Oracle
– SQLServer
– Postgres
– MySQL
Versions for Relational DBMS
• Between the different RDBMSs there are some differences:
– Representation of certain data types
  • Accuracy, precision
– Max length of character and varchar fields
  • Especially if coded in Unicode
– Restrictions on primary key constraints and use of indexes
Mapping CERIF to RDBMS
• The easiest way to map from the CERIF data model to an RDBMS implementation is to use the supplied scripts on the euroCRIS CERIF Task Group website
• This provides a full CERIF implementation
• Even if you do not use all of the structure, this ensures nothing is missed out
Mapping CERIF to RDBMS
• You can then add extensions as necessary / required locally
– But first check the currently proposed changes / extensions on the www.eurocris.org CERIF Task Group website
• However, if you wish to build up an RDBMS implementation of CERIF yourself, proceed as follows:
Mapping CERIF to RDBMS: Primary Entities
• Create table for each primary entity
• Create link tables for the primary entities
• Create Language base tables for the primary entities
– And the required linking tables
• Create lookup tables for the primary entities
– And the required linking tables
Mapping CERIF to RDBMS: Primary Entities
• This provides a CRIS implemented as an RDBMS that handles very simply Project, Person, OrgUnit
• Their relationships
• Their language variants for certain text attributes
• Their controlled (enumerated) lists for attribute values
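The steps for the primary entities can be sketched with Python's built-in sqlite3 module. This is a deliberately reduced illustration, not the supplied CERIF scripts: every table and column name below (`project_status`, `project_abstract`, `lang_code` and so on) is a hypothetical stand-in, and only Project and Person are shown.

```python
import sqlite3

# Hypothetical, simplified names; the real CERIF scripts define their own.
SCHEMA = """
CREATE TABLE project_status (status_code TEXT PRIMARY KEY);   -- lookup table
CREATE TABLE language (lang_code TEXT PRIMARY KEY);           -- language base table
CREATE TABLE project (                                        -- primary entity
    project_id  TEXT PRIMARY KEY,
    status_code TEXT REFERENCES project_status(status_code)
);
CREATE TABLE person (person_id TEXT PRIMARY KEY);             -- primary entity
CREATE TABLE project_person (                                 -- link table
    link_id    INTEGER PRIMARY KEY,
    project_id TEXT NOT NULL REFERENCES project(project_id),
    person_id  TEXT NOT NULL REFERENCES person(person_id),
    role       TEXT
);
CREATE TABLE project_abstract (                               -- language variants
    project_id TEXT NOT NULL REFERENCES project(project_id),
    lang_code  TEXT NOT NULL REFERENCES language(lang_code),
    abstract   TEXT NOT NULL,
    PRIMARY KEY (project_id, lang_code)
);
"""


def build_demo_db():
    """Create the sketch schema in memory and load a few made-up rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks FKs only when asked
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO project_status VALUES ('running')")
    conn.executemany("INSERT INTO language VALUES (?)", [("en",), ("no",)])
    conn.execute("INSERT INTO project VALUES ('P1', 'running')")
    conn.executemany(
        "INSERT INTO project_abstract VALUES (?, ?, ?)",
        [("P1", "en", "An example abstract."), ("P1", "no", "Et eksempel.")],
    )
    return conn


def abstract_for(conn, project_id, lang_code):
    """Return the abstract of a project in one language, or None if absent."""
    row = conn.execute(
        "SELECT abstract FROM project_abstract"
        " WHERE project_id = ? AND lang_code = ?",
        (project_id, lang_code),
    ).fetchone()
    return row[0] if row else None
```

Calling `abstract_for(build_demo_db(), "P1", "no")` picks the language variant by its language code, which is the sub-record idea described for text fields earlier in the course.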
Mapping CERIF to RDBMS: Primary Entities
• Almost certainly the secondary base table CONTACT will be required
– And the required linking tables
• And possibly FUNDING PROGRAMME
– And the required linking tables
Mapping CERIF to RDBMS: Secondary Entities
• Repeat as for primary entities
– Entities themselves
– Linking relations
– Language base tables
  • And their link tables
– Lookup tables
  • And their link tables
• But beware: it is very easy to miss something and then find problems in using the system
OODBMS and IR Systems: Introduction
• It is possible to implement CERIF in OODBMS and IR systems
• And indeed in hypermedia systems
• Or even in a document system based on SGML or XML (i.e. a semistructured database system)
OODBMS and IR Systems: OODBMS
• OODBMS possess the concept of objects with
– Data (structure view)
– Methods (process view)
– Messages (event view)
– (Note an object is a type; instances reside in a class)
• Which means that any process has to be coded specifically for any object
– Inheritance can help reduce the coding
– But note the problem of multiple conditional inheritance
• Generally OODBMS have worse performance than RDBMS and poorer data representational capability
OODBMS and IR Systems: IR Systems
• Information Retrieval Systems have advantages for databases with many textual attributes
– Full inverted index
  • Very fast retrieval
  • Very slow update
• Little or no structural capability (relating entities)
• Little or no reporting capability (group, sum, average…)
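A toy version of a full inverted index shows why retrieval is fast and update is slow. This is a minimal sketch in Python, not how any particular IR product works; the function names and the sample abstracts are invented for the example.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map every lower-cased word to the set of record ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return index


def search(index, *words):
    """Return the ids of records that contain all of the given words."""
    postings = [index.get(w.lower(), set()) for w in words]
    return set.intersection(*postings) if postings else set()


# Hypothetical abstracts keyed by project id.
abstracts = {
    "P1": "research information systems in Europe",
    "P2": "information retrieval with inverted indexes",
}
index = build_inverted_index(abstracts)
hits = search(index, "information", "retrieval")
```

A lookup touches only the entries for the query words, whereas changing one abstract means removing and re-adding an entry for every word in it.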
OODBMS and IR Systems: Hypermedia Systems
• Provide good structure (relationship) handling through anchors and pointers
• Usually have good attribute type handling
• Require traversal / browsing rather than directed query; query is relatively slow
• Update slow (pointer handling)
• Usually poor reporting (group, sum, average)
OODBMS and IR Systems: Document Systems
• Recent rise in popularity with XML as ‘semistructured databases’
• In fact SGML document systems have existed since the 80s
• Query usually poor: query languages are not declarative predicates but procedural and navigational
– May be fast if the IR technique of full inverted indexes is used
• Update slow if the change affects several entity instances; fast if it affects one document
• Report capability variable (group, sum, average)
Keys
• ‘The key’ to successful implementation
• Key attribute value identifies uniquely a tuple
– Within base relation: primary key
– Within another relation: foreign key
Keys
[Figure: relations PROJECT and PERSON; PROJECT holds Project (PK) and Person (FK), PERSON holds Person (PK)]
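Primary and foreign keys can be tried out with Python's built-in sqlite3 module. The sketch below follows one reading of the diagram (PROJECT carries a foreign key to PERSON); the lower-case table and column names, the helper `try_insert` and the sample values are assumptions for illustration.

```python
import sqlite3


def make_keys_db():
    """Two relations: person (PK) and project (PK plus FK to person)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks FKs only when asked
    conn.executescript("""
    CREATE TABLE person  (person TEXT PRIMARY KEY);
    CREATE TABLE project (
        project TEXT PRIMARY KEY,
        person  TEXT REFERENCES person(person)
    );
    """)
    conn.execute("INSERT INTO person VALUES ('Smith')")
    conn.execute("INSERT INTO project VALUES ('P1', 'Smith')")
    return conn


def try_insert(conn, project, person):
    """Insert a project row; report whether the key constraints accepted it."""
    try:
        conn.execute("INSERT INTO project VALUES (?, ?)", (project, person))
        return "ok"
    except sqlite3.IntegrityError:
        return "rejected"
```

A second row with project 'P1' is rejected because the primary key value must identify one tuple; a row naming a person absent from the person table is rejected because the foreign key must match a primary key in the other relation.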
Surrogate keys in linking relations: The Problem
• Wish to link flexibly
– An instance in an entity to a related instance in another entity (relationship)
– An instance in an entity to another instance in the same entity (recursion)
• Examples
– Person <-> Project e.g. x is leader of y
– Person <-> Person e.g. x is boss of y
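Surrogate keys in linking relations: Relationship, Usual Relation
[Figure: relations PROJECT and PERSON with attributes Project, Person and Person, marked PK, PK and FK; the two relations are linked through the foreign key]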
Surrogate keys in linking relations: Relationship, Problem
• Supports only 1 (Project) to n (Persons)
• i.e. the persons on any 1 project, with all their attributes (dependencies)
• In many cases need to indicate that
– The same person works on several projects
– In different roles (e.g. leader, programmer)
– At different (or the same) time periods
Surrogate keys in linking relations: Relationship, Binary Relation
[Figure: relations PROJECT (Project) and PERSON (Person) connected through a binary linking relation with attributes Project and Person]
Surrogate keys in linking relations: The Problem with a Binary Relation
• How to identify uniquely each tuple
• Have effectively 2 primary keys
• And even in combination their concatenated value may not be unique
– Same relationship with different roles
– Same relationship at different date/time intervals
• Hence the need for a surrogate key
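Surrogate keys in linking relations: Relationship, Binary Relation
[Figure: relations PROJECT (Project) and PERSON (Person) connected through a binary linking relation with attributes Surrogate, Project, Person, Role, StartDate, EndDate; Surrogate is the surrogate key (primary, unique)]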
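The linking relation with a surrogate key can be sketched with Python's built-in sqlite3 module. The attribute names follow the diagram (Surrogate, Project, Person, Role, StartDate, EndDate); the table name `project_person`, the helper names and all sample values are assumptions for illustration.

```python
import sqlite3


def make_link_db():
    """PROJECT and PERSON joined by a linking relation with a surrogate key."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript("""
    CREATE TABLE project (project TEXT PRIMARY KEY);
    CREATE TABLE person  (person  TEXT PRIMARY KEY);
    CREATE TABLE project_person (
        surrogate  INTEGER PRIMARY KEY,  -- generated by the DBMS, unique per tuple
        project    TEXT NOT NULL REFERENCES project(project),
        person     TEXT NOT NULL REFERENCES person(person),
        role       TEXT,
        start_date TEXT,
        end_date   TEXT
    );
    """)
    conn.execute("INSERT INTO project VALUES ('P1')")
    conn.execute("INSERT INTO person VALUES ('Smith')")
    # The same Project/Person pair twice: only the surrogate tells the rows apart.
    conn.executemany(
        "INSERT INTO project_person"
        " (project, person, role, start_date, end_date) VALUES (?, ?, ?, ?, ?)",
        [
            ("P1", "Smith", "programmer", "2001-01-01", "2001-12-31"),
            ("P1", "Smith", "leader", "2002-01-01", "2002-12-31"),
        ],
    )
    return conn


def links(conn, project, person):
    """Return (surrogate, role) for every link between a project and a person."""
    return conn.execute(
        "SELECT surrogate, role FROM project_person"
        " WHERE project = ? AND person = ? ORDER BY surrogate",
        (project, person),
    ).fetchall()
```

Had the pair (project, person) been declared as the primary key, the second row would have been refused, which is exactly the case of the same relationship with a different role.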
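Surrogate keys in linking relations: Recursion, Usual Relation
[Figure: relation PERSON with a Person attribute marked PK & FK, referring back to PERSON itself. Annotation: "Actually works like this", with PERSON drawn a second time]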
Surrogate keys in linking relations: Recursion, Binary Relation
[Figure: relation PERSON (Person) linked to itself through a binary relation with two Person attributes]
How the tuples from Person are represented in the binary relation
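Surrogate keys in linking relations: Recursion, Binary Relation
[Figure: relation PERSON (Person) linked to itself through a binary relation with attributes Surrogate, Person, Person, Role, StartDate, EndDate; Surrogate is the surrogate key (primary, unique)]
In practice the linking relation usually has more attributes than Person / Person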
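The recursive case can be sketched in the same way, again with Python's built-in sqlite3 module. The table name `person_person`, the column names `person1` and `person2`, the helper `boss_of` and the sample rows are hypothetical; the point is only that role and date attributes on the link allow a temporal question such as "who was boss of y on a given date".

```python
import sqlite3


def make_person_db():
    """PERSON linked to itself through a relation with a surrogate key."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript("""
    CREATE TABLE person (person TEXT PRIMARY KEY);
    CREATE TABLE person_person (
        surrogate  INTEGER PRIMARY KEY,
        person1    TEXT NOT NULL REFERENCES person(person),
        person2    TEXT NOT NULL REFERENCES person(person),
        role       TEXT NOT NULL,   -- how person1 relates to person2
        start_date TEXT NOT NULL,   -- ISO dates compare correctly as text
        end_date   TEXT NOT NULL
    );
    """)
    conn.executemany("INSERT INTO person VALUES (?)", [("Smith",), ("Jones",)])
    conn.executemany(
        "INSERT INTO person_person"
        " (person1, person2, role, start_date, end_date) VALUES (?, ?, ?, ?, ?)",
        [
            ("Smith", "Jones", "boss", "2000-01-01", "2000-12-31"),
            ("Jones", "Smith", "boss", "2001-01-01", "2001-12-31"),
        ],
    )
    return conn


def boss_of(conn, person, on_date):
    """Return who was boss of `person` on `on_date`, or None if nobody was."""
    row = conn.execute(
        "SELECT person1 FROM person_person"
        " WHERE person2 = ? AND role = 'boss'"
        " AND ? BETWEEN start_date AND end_date",
        (person, on_date),
    ).fetchone()
    return row[0] if row else None
```

In this made-up data the two people swap roles between years, something a single PK & FK attribute on PERSON could not record.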
Surrogate keys in linking relations: Binary Relation
• Flexible
• Allows n : m
• With added attributes e.g. role, date/time
• Thus permitting
– Conditional relationships
– Temporal relationships
– i.e. rich semantics
Conclusion
• Implementation of the CERIF data model as an RDBMS is fairly straightforward
– Entity = table
– Attribute = attribute
– Type = type (may need some coded extension)
– Constraint = constraint (coded, mainly not provided in RDBMS)
– Relationship = binary relation table
– Language and character set variants = tables linked to associated entity
Conclusion
• CERIF can also be implemented in other systems / architectures
• CERIF provides assistance in data integrity
– Enumerated lists (dictionaries)
– Classification terms (thesauri)
– Semantics (domain ontologies)
Conclusion
• CERIF provides consistently
– Full CRIS data model: implement a CRIS
– Data exchange models: link CRISs
– Metadata model: access to CRISs
• CERIF works