Introduction to the Semantic Web
The Semantic Web provides a common framework that allows data to be shared and reused across application, enterprise, and community boundaries. Semantic Web architecture and applications are the next generation in information architecture. The word semantic stands for "the meaning of"; the Semantic Web is therefore a Web with meaning. The idea is to have data on the web defined and linked in such a way that it can be used by machines for various applications, enabling computers to find knowledge that is distributed throughout the web. The features of the Semantic Web are:
Information flexibility
The ability of pieces of information to relate to each other
First Generation - Keywords
Keyword technologies were originally used in IBM's free-text retrieval systems in the late 1960s. These tools are based on a simple scan of a text document to find a keyword or the root stem of a keyword. This approach can find keywords in a document, and can list and rank the documents containing them. But these tools have no ability to extract the meaning of a word or root stem, and no ability to understand the meaning of a sentence.
Advanced Search
Most keyword systems now include some form of Boolean logic (AND and OR functions) to narrow searches. This is often called advanced search. But using Boolean logic to exclude documents from a search is not advanced; it is an arbitrary means of shrinking the source database so that fewer documents are retrieved. This kind of advanced search significantly increases false negatives by missing many relevant source documents.
Examples:
The most common examples of keyword tools are website search tools and the Find function (Ctrl+F) in Microsoft Office applications.
Second Generation - Statistical Forecasting
Statistical forecasting first finds keywords and then calculates the frequency and distance of those keywords. Statistical forecasting tools now include many techniques for predictive forecasting, most often using inference theory. The frequency and distribution of words has some general value in understanding content, but these tools cannot understand the meaning of words or sentences, or provide context. They are still limited by keyword constraints, and can only infer simplistic meaning from the frequency and distribution of words.
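As a toy illustration of this approach, the following sketch (plain Python with an invented three-sentence text; illustrative only, not from the material above) computes a keyword's frequency and the minimum distance between two keywords:

# Toy sketch of second-generation statistical processing: find keywords,
# then compute their frequency and the distance between their occurrences.
# Real tools add inference theory and predictive models on top of this.
text = "dog bites man. man bites dog. the dog ran."
words = [w.strip(".").lower() for w in text.split()]

def frequency(term):
    return words.count(term)

def min_distance(term_a, term_b):
    positions_a = [i for i, w in enumerate(words) if w == term_a]
    positions_b = [i for i, w in enumerate(words) if w == term_b]
    return min(abs(a - b) for a in positions_a for b in positions_b)

print(frequency("dog"))              # 3
print(min_distance("dog", "bites"))  # 1: the two terms appear next to each other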
Applications:
Statistical forecasting tools are appropriate for performing simple document searches where the desired output is a list of documents containing specific words, which must then be read, classified, and summarized manually by end users. These tools are not capable of understanding the meaning, context, or relationships of documents.
Problems:
The most common problems with statistical forecasting tools are:
a) keyword limitations of false positives and false negatives; b) misunderstanding the meaning of words and sentences ("man bites dog" is treated the same as "dog bites man"); c) lack of context.
Examples:
The most common statistical forecasting tool is Google, along with many other tools that use inference theory and similar analytical and predictive algorithms.
Third Generation - Natural Language Processing
Natural language processors focus on the structure of language. They recognize that certain words in each sentence (nouns and verbs) play a different role (subject-verb-object) than others (adjectives, adverbs, articles). This understanding of grammar increases the understanding of keywords and their relationships ("man bites dog" is different from "dog bites man"). But these tools cannot extract the meaning of the words or their logical relationships beyond basic grammar, and they cannot perform any information summary, analysis, or integration functions.
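The difference that grammar roles make can be shown with a deliberately naive Python sketch (the parse_svo helper below is hypothetical and only handles three-word sentences):

# Naive sketch: assigning subject-verb-object roles distinguishes sentences
# that keyword tools treat as identical.
def parse_svo(sentence):
    subject, verb, obj = sentence.split()   # assumes a three-word sentence
    return {"subject": subject, "verb": verb, "object": obj}

print(parse_svo("man bites dog"))   # {'subject': 'man', 'verb': 'bites', 'object': 'dog'}
print(parse_svo("dog bites man"))   # different roles, therefore a different meaning
# Keywords alone cannot tell the two sentences apart:
print(sorted("man bites dog".split()) == sorted("dog bites man".split()))   # True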
Applications:
Natural language tools are appropriate for linguistic research and word-for-word translation applications where the desired output is a linguistic definition or a translation. They are not capable of understanding the meaning or context of sentences in documents, or of integrating information within a database.
Problems:
The most common problems with linguistic tools are: a) keyword limitations of false positives and false negatives; b) misunderstanding of context (does "I like Java" refer to an island in Indonesia, a computer programming language, or coffee?). Without understanding the broader context, a linguistic tool has only a dictionary definition of Java and does not know which Java is relevant or what other data relate to a specific Java concept.
Examples:
The most common natural language tools are translator programs, which use dictionary look-up tables and language-specific grammar rules to convert source languages into target languages.
Fourth Generation - Semantic Web Architecture and Applications
Semantic Web architecture and applications are a dramatic departure from earlier database and application generations. Semantic processing includes the earlier statistical and natural language techniques, and enhances them with semantic processing tools. First, Semantic Web architecture is the automated conversion and storage of unstructured text sources in a Semantic Web database. Second, Semantic Web applications automatically extract and process the concepts and context in the database with a range of highly flexible tools.
a. Architecture; not only Application
The Semantic Web is a complete database architecture, not only an application program. Semantic Web architecture combines a two-step process. First, a Semantic Web database is created from unstructured text documents. Then, Semantic Web applications run on the Semantic Web database, not on the original source documents.
The Semantic Web architecture is created by first converting
text files to XML and then analyzing these with a semantic
processor. This process understands the meaning of the words and
grammar of the sentence, and also the semantic relationships of the
context. These meanings and relationships are then stored in a
Semantic web database. The Semantic Web is similar to the schematic
logic of an electronic device or the DNA of a living organism. It
contains all of the logical content AND context of the original
source. And, it links each word and concept back to the original
document.
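A minimal sketch of this two-step process, assuming the open-source rdflib library (pip install rdflib); the extract_concepts function is a deliberately trivial stand-in for a real semantic processor, and the ex:/doc: vocabularies are hypothetical:

# Step 1: convert unstructured documents into a Semantic Web database,
# linking each extracted concept back to its source document.
from rdflib import Graph, Literal, Namespace
from rdflib.namespace import RDFS

EX = Namespace("http://example.org/")        # hypothetical concept vocabulary
DOC = Namespace("http://example.org/doc/")   # hypothetical document URIs

def extract_concepts(text):
    # Placeholder for the semantic processor: treats capitalized words as concepts.
    return {w.strip(".,") for w in text.split() if w[:1].isupper()}

def build_semantic_db(documents):
    g = Graph()
    g.bind("ex", EX)
    for doc_id, text in documents.items():
        for concept in extract_concepts(text):
            g.add((EX[concept], RDFS.label, Literal(concept)))
            g.add((EX[concept], EX.mentionedIn, DOC[doc_id]))   # link back to the source
    return g

# Step 2: applications run on the semantic database, not on the original documents.
docs = {"d1": "Ora wrote the page about Java.",
        "d2": "Java is an island in Indonesia."}
db = build_semantic_db(docs)
for concept, _, doc in db.triples((None, EX.mentionedIn, None)):
    print(concept, "appears in", doc)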
Semantic Web applications directly access the logical
relationships in the Semantic Web database. Semantic web
applications can efficiently and accurately search, retrieve,
summarize, analyze and report discrete concepts or entire documents
from huge databases.
A search for Java links directly to the three Semantic Web
logical clusters for Java: (island in Indonesia, a computer
programming language, and coffee). The processor can now query the
user for which Java, and then expand the search to all other
concepts and documents related to the specific Java.
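A rough sketch of that disambiguation step, again assuming rdflib; the query is written in SPARQL (the current W3C query language, used here instead of the languages discussed later), and the three Java resources and the ex: vocabulary are purely illustrative:

# Three distinct concepts that all carry the label "Java".
from rdflib import Graph, Literal, Namespace
from rdflib.namespace import RDF, RDFS

EX = Namespace("http://example.org/")
g = Graph()
g.bind("ex", EX)
for uri, cls in [(EX.JavaIsland, EX.Island),
                 (EX.JavaLanguage, EX.ProgrammingLanguage),
                 (EX.JavaCoffee, EX.Beverage)]:
    g.add((uri, RDF.type, cls))
    g.add((uri, RDFS.label, Literal("Java")))

# After the user (or an agent) chooses a cluster, the search is restricted to it.
query = """
    PREFIX ex: <http://example.org/>
    PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
    SELECT ?concept WHERE {
        ?concept rdfs:label "Java" .
        ?concept a ex:ProgrammingLanguage .
    }"""
for row in g.query(query):
    print(row.concept)   # -> http://example.org/JavaLanguage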
b. Structured and Unstructured Data
Semantic Web architecture and applications handle both
structured and unstructured data. Structured data is stored in
relational databases with static classification systems, and also
in discrete documents. These databases and documents can be
processed and converted to Semantic Web databases, and then
processed with unstructured data.
Much of the data we read, produce, and share is now unstructured: emails, reports, presentations, media content, and web pages. These documents are stored in many different formats: plain text, email files, Microsoft Word, spreadsheet and presentation files, Lotus Notes, Adobe PDF, and HTML. It is difficult, expensive, slow, and inaccurate to attempt to classify and store these in a structured database. All of these sources can be automatically converted to a common Semantic Web database and integrated into one common information source.
c. Dynamic and Automatic; not Static and Manual
Semantic Web database architecture is dynamic and automated.
Each new document which is analyzed, extracted and stored in the
Semantic Web expands the logical relationships in all earlier
documents. These expanding logical relationships increase the
understanding of content and context in each document, and the
entire database. The Semantic Web conversion process is automated: no human action is required to maintain a taxonomy, metadata tagging, or classification. The semantic database is constantly updated and becomes more accurate.
Semantic Web architecture is different from relational database systems. Relational databases are manual and static because they are based on a manual process for maintaining a taxonomy, metadata tagging, and document classification in static file structures. Documents are manually captured, read, tagged, classified, and stored in a relational database only once, and are not updated. More importantly, the increase in new documents and information in a relational database does not make the database more intelligent about the concepts, relationships, or documents.
d. From Machine Readable to Machine Understandable
Semantic Web architecture and applications support both human
and machine intelligence systems. Humans can use Semantic Web
applications on a manual basis, and improve the efficiency of
search, summary, analysis, and reporting tasks. Machines can also use Semantic Web applications to perform tasks that humans cannot do, because of the cost, speed, accuracy, complexity, and scale of the tasks.
e. Synthetic vs Artificial Intelligence:
Semantic Web technology is NOT Artificial Intelligence. AI was a
mythical marketing goal to create thinking machines. The Semantic
Web supports a much more limited and realistic goal. This is
Synthetic Intelligence. The concepts and relationships stored in
the Semantic Web database are synthesized, or brought together and
integrated, to automatically create a new summary, analysis,
report, email, alert; or launch another machine application. The
goal of Synthetic Intelligence information systems is bringing
together all information sources and user knowledge, and
synthesizing these in global networks.
Semantic Web Building Blocks
(1) URI
A URI is simply a Web identifier, like the strings starting with "http" or "ftp" that are often seen on the World Wide Web. Anyone can create a URI. Every data object and every data schema/model in the Semantic Web must have a unique URI. A Uniform Resource Locator (URL) is a URI that, in addition to identifying a resource, provides a means of acting upon or obtaining a representation of that resource by describing its primary access mechanism or network location.
(2) RDF
RDF stands for Resource Description Framework:
RDF is a framework for describing resources on the web
RDF is designed to be read and understood by computers
RDF is not designed for being displayed to people
RDF is written in XML
RDF is a part of the W3C's Semantic Web Activity
RDF is a W3C Recommendation
RDF extends the linking structure of the Web to use URIs to name
the relationship between things as well as the two ends of the
link. The underlying structure of any RDF document is a collection
of triples. This collection of triples is usually called the RDF
graph. Each triple states a relationship (also called an edge or property) between two nodes (also called resources) in the graph. This abstract data model is independent of any concrete serialization syntax; therefore query languages usually do not provide ways to query serialization-specific aspects, such as the order of serialization.
This simple model allows structured and semi-structured data to be mixed, exposed, and shared across different applications. RDF provides a general, flexible method to
decompose any knowledge into small pieces, called triples, with
some rules about the semantics (meaning) of those pieces. By using
XML, RDF information can easily be exchanged between different
types of computers using different types of operating systems and
application languages.
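A small sketch of the triple model, assuming rdflib and a hypothetical ex: vocabulary; the final line serializes the graph as RDF/XML so that it can be exchanged between different systems:

# Each statement is a triple: (subject node, property/edge, object node).
from rdflib import Graph, Literal, Namespace

EX = Namespace("http://example.org/")
g = Graph()
g.bind("ex", EX)
g.add((EX.page, EX.author, EX.Ora))
g.add((EX.Ora, EX.name, Literal("Ora")))
g.add((EX.Ora, EX.memberOf, EX.W3C))

# Because RDF can be written in XML, the whole graph can be exchanged
# between different computers, operating systems and applications.
print(g.serialize(format="xml"))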
Support for XML schema data types.
XML data types can be used to represent data values in RDF. XML
Schema also provides an extensibility framework suitable for
defining new data types for use in RDF. Data types should therefore
be supported in an RDF query language.
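For example, under the same rdflib assumption, a date and an integer can be stored as typed RDF literals using XML Schema datatypes:

# Typed literals keep their XML Schema datatype, so a query engine can
# compare them as dates and numbers rather than as plain strings.
from rdflib import Graph, Literal, Namespace
from rdflib.namespace import XSD

EX = Namespace("http://example.org/")
g = Graph()
g.add((EX.page, EX.created, Literal("2004-02-10", datatype=XSD.date)))
g.add((EX.page, EX.pageCount, Literal(42, datatype=XSD.integer)))
print(g.serialize(format="turtle"))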
Support for making statements about resources. In general, it is not assumed that complete information about any resource is available to the RDF query. A query language should be aware of this and should tolerate incomplete or contradictory information.
(3) RDFS
RDF Schema extends RDF and is a vocabulary for describing properties and classes of RDF-based resources, with semantics for generalized hierarchies of such properties and classes. Classes in RDF Schema are much like classes in object-oriented programming languages. This allows resources to be defined as instances of classes, and as subclasses of other classes. RDFS is used to define relations between resources and to organize them into a hierarchy.
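A minimal RDF Schema sketch (rdflib assumed, ex: vocabulary hypothetical) showing classes, a subclass hierarchy, a property description, and a resource defined as an instance of a class:

# Classes and a subclass hierarchy in RDF Schema.
from rdflib import Graph, Namespace
from rdflib.namespace import RDF, RDFS

EX = Namespace("http://example.org/")
g = Graph()
g.bind("ex", EX)
g.add((EX.Document, RDF.type, RDFS.Class))
g.add((EX.WebPage, RDF.type, RDFS.Class))
g.add((EX.WebPage, RDFS.subClassOf, EX.Document))   # generalized hierarchy
g.add((EX.author, RDFS.domain, EX.Document))        # property described with RDFS
g.add((EX.homepage, RDF.type, EX.WebPage))          # a resource as an instance of a class
print(g.serialize(format="turtle"))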
(4) OWL
OWL stands for Web Ontology Language. OWL is a language for processing web information and is built on top of RDF. OWL was designed to be interpreted by computers, not to be read by people. OWL is written in XML and has three sublanguages (OWL Lite, OWL DL, and OWL Full).
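A small OWL sketch under the same assumptions (rdflib, hypothetical ex: vocabulary); owl:inverseOf is used simply as an example of a construct that OWL adds on top of RDF and RDFS:

# OWL classes and properties layered on top of RDF.
from rdflib import Graph, Namespace
from rdflib.namespace import OWL, RDF, RDFS

EX = Namespace("http://example.org/")
g = Graph()
g.bind("ex", EX)
g.bind("owl", OWL)
g.add((EX.Person, RDF.type, OWL.Class))
g.add((EX.Author, RDF.type, OWL.Class))
g.add((EX.Author, RDFS.subClassOf, EX.Person))
g.add((EX.wrote, RDF.type, OWL.ObjectProperty))
g.add((EX.writtenBy, RDF.type, OWL.ObjectProperty))
g.add((EX.wrote, OWL.inverseOf, EX.writtenBy))   # an OWL-specific construct
print(g.serialize(format="xml"))                 # OWL is written in XML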
RDF Query Languages
Query Language Properties
Expressiveness
Expressiveness indicates how powerful queries can be formulated
in a given language. Typically, a language should at least provide
the means offered by relational algebra, i.e. be relationally
complete. Usually, expressiveness is restricted to maintain other
properties such as safety and to allow an efficient (and
optimizable) execution of queries.
Closure
The closure property requires that the results of an operation are again elements of the data model. This means that if a query language operates on the graph data model, the query results would again have to be graphs.
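The closure property can be illustrated with SPARQL under the rdflib assumption (data and vocabulary hypothetical): a CONSTRUCT query returns a set of triples and therefore stays inside the graph data model, while a SELECT query returns a table of variable bindings and does not:

# Closure: a CONSTRUCT result is again a graph; a SELECT result is a table.
from rdflib import Graph, Namespace

EX = Namespace("http://example.org/")
g = Graph()
g.add((EX.page, EX.author, EX.Ora))

construct = g.query("""
    PREFIX ex: <http://example.org/>
    CONSTRUCT { ?doc ex:writtenBy ?who } WHERE { ?doc ex:author ?who }""")
closed_result = Graph()
for triple in construct:        # the result is a set of triples, i.e. a graph
    closed_result.add(triple)

select = g.query("""
    PREFIX ex: <http://example.org/>
    SELECT ?doc ?who WHERE { ?doc ex:author ?who }""")
for row in select:              # the result is a table of bindings, not a graph
    print(row.doc, row.who)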
Adequacy
A query language is called adequate if it uses all concepts of the underlying data model. This property therefore complements the closure property: for closure, a query result must not be outside the data model; for adequacy, the entire data model needs to be exploited.
Orthogonality
The orthogonality of a query language requires that all operations may be used independently of the usage context.
Safety
A query language is considered safe if every query that is syntactically correct returns a finite set of results (on a finite data set). Typical concepts that cause query languages to be unsafe are recursion, negation, and built-in functions.
Some RDF query languages are:
RQL
RQL is a typed language following a functional approach, which
supports generalized path expressions featuring variables on both
nodes and edges of the RDF graph. RQL relies on a formal graph
model that captures the RDF modeling primitives and permits the
interpretation of superimposed resource descriptions by means of
one or more schemas.
SeRQL
SeRQL (Sesame RDF Query Language) is a querying and transformation language loosely based on several existing languages, most notably RQL and RDQL. Its primary design goals are the unification of best practices from existing query languages and the delivery of a lightweight yet expressive query language for RDF that addresses practical concerns.
RDQL
The syntax of RDQL follows a SQL-like select pattern, where the from clause is omitted. For example, select ?p where (?p, <rdfs:label>, "foo") collects all resources with the label foo in the free variable p. The select clause at the beginning of the query projects the variables. Namespace abbreviations can be defined in a query via a separate using clause. RDF Schema information is not interpreted.
Since the output is a table of variables and possible bindings, RDQL does not fulfill the closure and orthogonality properties. RDQL is safe and offers preliminary support for datatypes.
How is the RDF model different from the XML model?
RDF defines a data model based on triples: Object, Property, and Value, for example triple(author, page, Ora), meaning that the author of the page is Ora.
The RDF representation of this triple is a small graph: a node for the page and a node for Ora, connected by an author arc. The same statement can be written in XML in several different ways, for example by nesting Ora inside a page element, or by placing Ora inside an element that carries an href="page" attribute. These are all perfectly good XML documents, and to a person reading them they mean the same thing. To a machine parsing them, however, they produce different XML trees. Each XML encoding corresponds to a tree of elements and attributes, which can be drawn more concisely by writing the name of each element inside its node. Many different XML trees can therefore encode the same single RDF statement.
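The point can be demonstrated with a short sketch, assuming rdflib: the two RDF/XML documents below use different XML structures (a nested property element versus a property attribute), yet they parse to identical RDF graphs:

# Two different XML trees, one RDF graph.
from rdflib import Graph
from rdflib.compare import isomorphic

nested = """
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:ex="http://example.org/">
  <rdf:Description rdf:about="http://example.org/page">
    <ex:author>Ora</ex:author>
  </rdf:Description>
</rdf:RDF>"""

attribute = """
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:ex="http://example.org/">
  <rdf:Description rdf:about="http://example.org/page" ex:author="Ora"/>
</rdf:RDF>"""

g1 = Graph().parse(data=nested, format="xml")
g2 = Graph().parse(data=attribute, format="xml")
print(isomorphic(g1, g2))   # True: both contain the single triple (page, author, "Ora")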
Ontologies
Ontologies will play a major role in supporting information
exchange processes in various areas. Ontologies were developed in
Artificial Intelligence to facilitate knowledge sharing and reuse.
Ontologies are also becoming widespread in fields such as intelligent
information integration, cooperative information systems,
information retrieval, electronic commerce, and knowledge
management. The reason ontologies are becoming so popular is in
large part due to what they promise: a shared and common
understanding of some domain that can be communicated between
people and application systems. Because ontologies aim at consensual domain knowledge, their development is often a cooperative process involving different people, possibly at different locations.
Components of an Ontology
A computational ontology consists of a number of different
components, such as Classes, Individuals and Relations.
Concept
Concepts, also called Classes, Types, or Universals, are a core component of most ontologies. A Concept represents a group of different individuals that share common characteristics, which may be more or less specific. Concepts also share relationships with each other; these describe the way individuals of one Concept relate to the individuals of another.
Individual
Individuals, also known as instances or particulars, are the base unit of an ontology; they are the things that the ontology describes or potentially could describe. Individuals may model concrete objects such as people, machines, or proteins; they may also model more abstract objects such as this article, a person's job, or a function.
Relation
Relations in an ontology describe the way in which individuals relate to each other. Relations can normally be expressed directly between individuals or between Concepts.
Ontology Applications:
Natural Language Applications
Knowledge Management
Enterprise Application Integration
E-Commerce
Databases and Information Retrieval
Knowledge Management
Knowledge Management is concerned with acquiring, maintaining,
and accessing knowledge of an organization. It aims to exploit an
organisation's intellectual assets for greater productivity, new
value, and increased competitiveness. Knowledge management systems
have severe weaknesses:
Searching for information: existing keyword-based search retrieves irrelevant information that uses a certain word in a different context, and it may miss information where different words are used for the desired content.
Extracting information: human browsing and reading is currently required to extract relevant information from information sources, as automatic agents lack the common-sense knowledge required to extract such information from textual representations, and they fail to integrate information spread over different sources.
Maintaining information: maintaining weakly structured text sources is a difficult and time-consuming activity when such sources become large. Keeping such collections consistent, correct, and up-to-date requires a mechanized representation of semantics and constraints that helps to detect anomalies.
Automatic document generation: adaptive web sites that enable dynamic reconfiguration according to user profiles or other relevant aspects would be very useful. The generation of semi-structured information presentations from semi-structured data requires a machine-accessible representation of the semantics of these information sources.
(Figure: from the current Web to the Semantic Web. The current Web, built on URI, HTML, and HTTP, is largely static, serves roughly 500 million users and more than 3 billion pages, and has serious problems in finding, extracting, representing, interpreting, and maintaining information. The Semantic Web adds RDF, RDFS, and OWL on top of URI, HTML, and HTTP to address these problems.)