Web Spa

WebSpa

Monica Macoveiciuc and Constantin Stan

Faculty of Computer Science, Alexandru Ioan Cuza University, Iasi

Abstract. WebSpa is a tool that allows the quick, intuitive (and even fun) interrogation of arbitrary SPARQL endpoints. WebSpa runs in the web browser and does not require the installation of any additional software. The tool manages a large variety of pre-defined SPARQL endpoints and allows the addition of new ones. A user account makes it possible to save both the query and its results on the local computer, as well as to edit the queries later. The application is written in both Java and Flex. It uses the Jena and ARQ application programming interfaces to perform the queries, and the results are processed and displayed using Flex.

Introduction

The Web is a universal medium for information and data exchange. Exploiting the huge amount of knowledge distributed on the Web is a significant challenge. Humans can understand the information, but it takes great effort to find and combine data from such a large number of sources; on the other hand, computers can easily browse through millions of pages in no time, but they are not capable of understanding the content. The Semantic Web is an extension of the World Wide Web, “in which information is given well-defined meaning, better enabling computers and people to work in cooperation” [1].

RDF (Resource Description Framework), together with SPARQL, provides a powerful mechanism for describing and interchanging metadata on the Web. RDF is the W3C standard for encoding knowledge. It is a structure for describing and interchanging metadata on the Web in numerous forms and for numerous purposes. SPARQL is both a query language and a remote access protocol. A SPARQL endpoint enables users (human or machine) to query a knowledge base via the SPARQL language. Results are typically returned in one or more machine-processable formats. Therefore, a SPARQL endpoint is mostly conceived as a machine-friendly interface towards a knowledge base.

WebSpa is a web-based application that manages SPARQL endpoints and allows users to perform interrogations over them. It provides pre-defined SPARQL endpoints and supports the addition of new ones. Anyone can use the application, but a user account is required in order to perform certain operations, such as adding endpoints or saving and later editing a query. The results are displayed in the browser and they can also be stored locally, in XML format.

The application is written in both Java and Flex. It uses the Jena and ARQ application programming interfaces in order to perform the queries, and the results are processed and displayed using Flex.

This paper describes the WebSpa application, as well as the way it is built. The first chapter contains a brief explanation of the concepts of SPARQL and SPARQL endpoint, and of the technologies used for building the application: Java (Jena, ARQ) and Flex. The next chapter presents the interface, functionality and storage capabilities of the application. The front-end and back-end are then described. Conclusions are drawn and perspectives are suggested in the final chapter.

Technologies

1 SPARQL

SPARQL [2] (pronounced “sparkle”, a recursive acronym for SPARQL Protocol and RDF Query Language) is an RDF query language. It is a recent W3C Recommendation which, according to Sir Tim Berners-Lee, “will make a huge difference”. RDF is foundational to the Semantic Web. Until SPARQL’s launch, RDF had a data model, a formal semantics, and a concrete serialization (in XML), but it did not have a standard query language.

SPARQL filled this gap and now offers the Semantic Web and Web 2.0 a common data manipulation language in the form of expressive queries against the RDF data model. Using WSDL 2.0, the SPARQL Protocol for RDF describes a very simple web service with one operation, query, which is available with both HTTP and SOAP bindings. This operation is the way to send SPARQL queries to other sites and to get back the results. The HTTP bindings are REST-friendly, and a simple SPARQL protocol client takes only a small amount of code to implement.
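To give an idea of how little code the HTTP binding requires, the following Java sketch builds the GET request URL for the query operation. It relies only on the fact that the protocol carries the query text in a URL-encoded "query" parameter; the class name SparqlGetUrl and the endpoint address http://example.org/sparql are hypothetical and serve only as an illustration.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical helper, not part of WebSpa: builds the GET request URL
// for the "query" operation of a SPARQL endpoint.
final class SparqlGetUrl {

    static String build(String endpoint, String query) {
        try {
            // The query text travels URL-encoded in the "query" parameter.
            String encoded = URLEncoder.encode(query, "UTF-8");
            // Append with '&' if the endpoint URL already has a query string.
            String separator = endpoint.contains("?") ? "&" : "?";
            return endpoint + separator + "query=" + encoded;
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is a charset every Java platform must support.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // The endpoint URL is an assumption used only for illustration.
        System.out.println(build("http://example.org/sparql", "ASK { ?s ?p ?o }"));
    }
}
```

Sending the request and reading the response can then be done with any HTTP client; the response format depends on what the endpoint supports.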

SPARQL consists of three separate specifications. The first is the query language specification (which makes up the core). The second is the query results XML format (which describes an XML format for serializing the results of SPARQL SELECT and ASK queries). The third is the data access protocol (which uses WSDL 2.0 to define simple SOAP and HTTP protocols for remotely querying RDF databases, or any data repository that can be mapped to the RDF model). Altogether, SPARQL provides a query language, a means of conveying a query to a query processor service, and an XML format in which the results are returned.

Some issues are not yet addressed by SPARQL. The most notable is that it cannot modify an RDF dataset (it is read-only). RDF is built on the triple, a 3-tuple consisting of subject, predicate and object. SPARQL is similarly built on the triple pattern, which also consists of a subject, a predicate and an object. SPARQL matches patterns in an RDF graph using triple patterns, which are like triples except that they may contain variables in place of concrete values (the variables act as “wildcards” that match RDF terms in the dataset).
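The wildcard behaviour of variables can be illustrated with a toy matcher. The Java sketch below is only an illustration under simplifying assumptions (triples are plain string arrays, and any term starting with "?" is a variable); it is not how Jena or ARQ evaluate queries, and the class name and sample data are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative only: a toy matcher, not how Jena or ARQ evaluate queries.
final class TriplePatternDemo {

    // Matches one pattern (terms starting with '?' are variables) against
    // each triple and returns one variable binding map per matching triple.
    static List<Map<String, String>> match(String[] pattern, List<String[]> triples) {
        List<Map<String, String>> solutions = new ArrayList<>();
        for (String[] triple : triples) {
            Map<String, String> binding = new LinkedHashMap<>();
            boolean ok = true;
            for (int i = 0; i < 3 && ok; i++) {
                String term = pattern[i];
                if (term.startsWith("?")) {
                    // A variable matches any term, but must bind consistently.
                    String previous = binding.putIfAbsent(term, triple[i]);
                    ok = previous == null || previous.equals(triple[i]);
                } else {
                    // A concrete term must match exactly.
                    ok = term.equals(triple[i]);
                }
            }
            if (ok) {
                solutions.add(binding);
            }
        }
        return solutions;
    }

    public static void main(String[] args) {
        List<String[]> data = new ArrayList<>();
        data.add(new String[] {"ex:alice", "foaf:knows", "ex:bob"});
        data.add(new String[] {"ex:alice", "foaf:name", "\"Alice\""});
        data.add(new String[] {"ex:bob", "foaf:knows", "ex:carol"});
        // Pattern: ?who foaf:knows ?friend
        for (Map<String, String> row : match(new String[] {"?who", "foaf:knows", "?friend"}, data)) {
            System.out.println(row);
        }
    }
}
```

Each returned map plays the role of one row of a SELECT result: the pattern ?who foaf:knows ?friend matches the two foaf:knows triples and ignores the foaf:name triple.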

The SELECT query can be used to extract data from an RDF graph, returning it as a tabular result set. More complex graph patterns can combine required and OPTIONAL parts. UNION queries are a way of selecting alternatives from the dataset. It is possible to apply ordering to the results, to jump forward through the results using OFFSET, and to LIMIT the amount of data returned. The SPARQL Query Results XML Format specification includes several relevant examples. Given its simplicity and regular structure, manipulating this format with XSLT or XQuery is fairly straightforward.
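The regular structure of the results format also makes it easy to read from Java with the standard DOM parser. The sketch below is a minimal illustration, assuming a SELECT result document in the namespace defined by the SPARQL Query Results XML Format; the class name SparqlResultsReader and the sample data are hypothetical, and each bound value is reduced to its text without distinguishing URIs from literals.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Illustrative reader for SELECT results; not part of WebSpa.
final class SparqlResultsReader {

    // Namespace defined by the SPARQL Query Results XML Format.
    static final String NS = "http://www.w3.org/2005/sparql-results#";

    // Returns one map (variable name -> value as text) per <result> element.
    static List<Map<String, String>> read(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder().parse(
                    new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            List<Map<String, String>> rows = new ArrayList<>();
            NodeList results = doc.getElementsByTagNameNS(NS, "result");
            for (int i = 0; i < results.getLength(); i++) {
                Element result = (Element) results.item(i);
                Map<String, String> row = new LinkedHashMap<>();
                NodeList bindings = result.getElementsByTagNameNS(NS, "binding");
                for (int j = 0; j < bindings.getLength(); j++) {
                    Element binding = (Element) bindings.item(j);
                    // The text of the <uri>, <literal> or <bnode> child is the value.
                    row.put(binding.getAttribute("name"), binding.getTextContent().trim());
                }
                rows.add(row);
            }
            return rows;
        } catch (Exception e) {
            throw new IllegalArgumentException("Not a readable results document", e);
        }
    }

    public static void main(String[] args) {
        String xml =
              "<sparql xmlns=\"http://www.w3.org/2005/sparql-results#\">"
            + "<head><variable name=\"s\"/><variable name=\"name\"/></head>"
            + "<results>"
            + "<result>"
            + "<binding name=\"s\"><uri>http://example.org/alice</uri></binding>"
            + "<binding name=\"name\"><literal>Alice</literal></binding>"
            + "</result>"
            + "</results>"
            + "</sparql>";
        System.out.println(read(xml));
    }
}
```

A variable left unbound by an OPTIONAL part simply has no binding element in that result, so the corresponding key is absent from the row.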

The syntax shortcuts make writing queries much simpler. They are especially useful with repetitive graph patterns and long URIs. SPARQL presents itself as the long-awaited missing piece of the Semantic Web and Web 2.0.

A SPARQL endpoint is a conformant SPARQL protocol service as defined in the SPROT (SPARQL Protocol for RDF) specification. A SPARQL endpoint enables users (human or other) to query a knowledge base via the SPARQL language. Results are typically returned in one or more machine-processable formats. Therefore, a SPARQL endpoint is mostly conceived as a machine-friendly interface towards a knowledge base. Both the formulation of the queries and the human-readable presentation of the results should typically be implemented by the calling software, and not be done manually by human users.

At the time of writing, there is no agreed description format for a SPARQL endpoint. Endpoint descriptions can be used to announce endpoint capabilities and contents, to support discovery through service directories, and to supply browsing and federation hints.

2 Jena and ARQ

Jena [3] is an open source Semantic Web framework for Java. It provides an API to extract data from and write to RDF graphs, which are presented as an abstract “model”. This model can be queried through SPARQL and updated through SPARUL [4].

Jena uses the concept of graph for dealing with the data: the nodes correspond to URIs, while each edge corresponds to a triple.

The graphs are represented through the Model interface, which has different implementations: a memory-based one, one that uses a relational database, etc. The memory-based model is the simplest and easiest to use.

A triple is represented through an interface called Statement. A statement corresponds to an edge in the graph and consists of three parts:

– the subject - the resource from which the arc leaves - implements the Resource interface;

– the predicate - the property (the label of the arc) - implements the Property interface;

– the object - the resource or literal to which the arc points - implements the Resource or the Literal interface.

The components of the statement have a common base - the RDFNode interface.

The object component is more complex. A statement can be used as the object component of the triple, since RDF allows nested statements. Objects implementing the Container, Alt, Bag, or Seq interface can also be used as objects.

An RDF Model is represented as a set of statements. The components of a statement can be accessed through the getSubject, getPredicate and getObject methods of the Statement interface. The API provides methods for the most common operations:

– addProperty - adds a new statement (triple) to the model;

– listSubjects - lists the subject component of each triple from the model;

– listObjects - lists the object component of each triple from the model;

– write - writes the model in RDF/XML format to the output stream given as parameter;

– read - reads the statements in RDF/XML format into a model.

ARQ is a query engine for Jena that supports the SPARQL language. In addition to implementing SPARQL, ARQ’s query engine can also parse queries expressed in RDQL [5] or in its own internal query language.

3 Flex

The Flex framework provides the declarative language, application services, components, and data connectivity developers need to rapidly build rich Internet applications (RIAs) for the browser or desktop. Flex 3 is a powerful framework that provides enterprise-level components for the Flash Player platform in a markup language format recognizable to anyone with HTML or XML development experience. The Flex Framework provides components for visual layout, visual effects, data grids, server communication, charts, and much more.

MXML is the language developers use to define the layout, appearance, and behaviors of a Flex application. ActionScript 3, an object-oriented language based on industry-standard ECMAScript, is the language that defines the client-side application logic. The MXML and ActionScript sources are compiled together into a single SWF file that makes up the Flex application. Because the compiler is available both as a standalone utility in the Flex 3 SDK and as part of Adobe Flex Builder 3 software, developers can choose to develop in the Eclipse-based Flex Builder IDE or in an IDE of their choice.

Flex includes a pre-built class library and application services that help developers assemble and build RIAs. These services include data binding, drag-and-drop management, the display system that manages the interface layout, the style system that manages the look and feel of interface components, and the effects and animation system that manages motion and transitions. The component library provides the user interface controls that developers need, from simple buttons, checkboxes, and radio buttons to complex data grids, combo boxes, and rich text editors. The provided containers can be used to design complex, adaptive layouts, and the bundled skins can be used (or modified) to achieve the desired look and feel.

The Adobe AIR runtime extends web applications to the desktop, creating new opportunities for more engaging, higher performing online/offline applications. The Flex framework provides native support for the new AIR APIs, and Flex Builder 3 provides all the tools necessary to build, debug, package, and sign applications built on Adobe AIR.

Most Flex applications that are designed using just the Flex framework are structured as follows. On one side, there are properly decoupled and reusable view components that know nothing about the rest of the application; they dispatch events and use data binding from parent views and to child views. On the other side, a main application container (the Application root tag) acts both as a controller and as a model, sometimes delegating tasks to some “utils” classes that handle things such as RPC communications. In other words, it is a big master object, which can quickly turn into a hideous spaghetti-code monster. If the application is quite simple and no long-term maintenance is required, this might not be such a bad choice.

Using events and data binding, Flex can achieve very good decoupling of view components. But the main view container will either have to handle all the logic by itself, or explicitly delegate it to a controller class with which it will then be very tightly coupled. The main problem is that user interaction events dispatched by the views cannot directly reach a separate application controller, unless it is a view. To have a separate controller handle these events and take actions, the events have to climb the display list up to the main application container (the root application tag), which may then pass them to the controller.

WebSpa

4 Functionality

WebSpa is a tool with which one can interrogate arbitrary SPARQL endpoints in a more intuitive way. It provides pre-defined SPARQL endpoints and supports the addition of new ones. Anyone can use the application, but a user account is required in order to perform certain operations, such as adding endpoints or saving and later editing a query. The results are displayed in the browser and they can also be stored locally, in XML format.

The interface is easy to use and intuitive.

The top bar contains user-related information: the name of the currently logged-in user and a form for registering, logging in or logging out. The top right menu can be used for both login and registration, by ticking or un-ticking the “I’m new” checkbox. The submit button changes accordingly, into either “create” or “login”.

The menu contains three groups of actions:

– Query

– Result

– More

The first group contains actions related to the queries. A logged-in user can choose to create, open or save a query. The same actions are present in the second menu group, but they relate to the results of the query. The last menu item provides information about the application.

The screen is divided in two parts: the top window is bound to the query, while the bottom window is used for managing the results.

The first one contains a menu which helps a less experienced user to write queries. The “select endpoint” dropdown contains predefined SPARQL endpoints. A user who has created an account can add and remove their own endpoints. These actions can be performed through the group of options next to the menu: “Refresh”, “Save”, “Delete”. The next dropdown contains predefined RDF Schemas that are automatically inserted in a query when the user selects one.

The “template” menu contains four options, corresponding to the four types of SPARQL queries supported: SELECT, ASK, DESCRIBE and CONSTRUCT. When a user selects one of these items, a pattern of the query is written in the query window. The user can then personalize this pattern. For example, choosing the “SELECT” option inserts the following content:

SELECT DISTINCT ?s ?p ?o
WHERE
{
    ?s ?p ?o .
}

The “Statement” menu contains code editing features (comment/uncomment or indent/remove indent), selection-applicable statements and example (fill-in) statements.

The last item of the bar is a group of buttons that provide quick access to the most important actions: creating a new query, opening an existing one, and saving a query.

A bar with options divides the screen in two windows. It contains items that help users manage the queries. The “Run Query” button runs a query, and the results are displayed in the window below. The query can be given a name and saved in the database for later use. A user can choose to load their saved queries, which can then be edited or saved as new queries.

5 Front-End

The front-end, built with Adobe Flex Builder, uses the Adobe Flex Framework 3.2.0 as its backbone. Not imposing a stronger MVC constraint on the framework side has both advantages and disadvantages, as discussed in the Flex section above.

Like most projects built with Adobe Flex Builder, WebSpa is a Web application (meaning that it runs in Flash Player, the Adobe runtime for web browsers). The whole project targets version 10.0.0 of Flash Player. This is a technical requirement, because this version of the Player introduces an API for working with files: saving and opening local files on the client’s machine. Previous versions of Flash Player made this possible only via server-side scripts (the server was used as a proxy). From now on we refer to Adobe Flex Builder simply as Flex.

The project has the WebSpa.mxml file as its “entry point”. This is where things start to happen. This MXML file describes the layout and content of the header and footer of the application. This class also manages users (creates, logs in or logs out users) and offers an application menu. Within the project we created a Config class which acts as a configuration data holder. This class holds data such as the server paths, project-internal data and other settings. Changing the location of the project’s server-side application is much easier this way: after a one-line edit and a recompile, the application is “good to go”. All properties within this class are static.

All styles within the application (if altered) are managed from a single CSS file (Main.css). For the other pages within the project we used MXML components, which makes future extension easy if desired. There are two such components, About.mxml and ClassicWebSpa.mxml. The About page is self-explanatory. ClassicWebSpa is our SPARQL interrogation tool, which interacts with the Java server side to provide results for the queries written by users.

Inside this page the user can refresh / save / delete endpoints or save / load queries on the server side and access them via their account. These options are available only if the user is logged in. The user can also create new / open / save queries or results locally. Two different file formats are offered for saving (*.wsq, which stands for WebSpaQuery, and *.wsr, which stands for WebSpaResult), so that the user can manage and locate files related to our application more easily.

Basically, our application helps the user to execute a SPARQL query against an endpoint and retrieve the result of that query for later manipulation. The application menu offers much the same functionality as the ClassicWebSpa page, plus the link to the About page.

For all communication between front-end and back-end we use a wrapper class that automates the server calls (it takes the server-side script, the GET parameters, the POST parameters and the callback to run on success). This makes things much more readable and easier to manage in the rest of the classes. For local file management we use two static classes, which handle saving and opening queries and results.

In the future the project can be extended easily, because of the decoupled way in which its parts work together.

6 Back-End

The back-end is written in Java, using the Jena framework and the ARQ support for SPARQL that it provides. The code is organized in three packages:

webspa.persistance

webspa.servlet

webspa.sparql

The package webspa.servlet contains the central point of the application, the class WebSpaServlet. It is a servlet that communicates with the Flex front-end of the application and redirects each call to the appropriate method. The servlet is also responsible for starting the SQL transaction, as well as for committing the changes. The servlet contains a HibernateBean object, whose methods are called for each action. The requests coming from the front-end are analyzed in order to determine which method to call, based on the value of the “fn” parameter. Once the method is identified, it is passed some general parameters: the request, the response and the EntityManager object (used for managing the communication with the database tables). Some of the methods require additional parameters, such as boolean values or certain parameters specified in the request. Each of the HibernateBean’s methods returns the response as a String, which is then passed to Flex for further processing, using the HttpServletResponse’s PrintWriter object:

if (method.equalsIgnoreCase("create-user")) {
    response.getWriter().write(
            hBean.registerUser(request, response, entityManager));
}
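A chain of such comparisons can also be expressed as a lookup table from function names to handlers. The Java sketch below is a simplified, hypothetical illustration that uses only the standard library: the Dispatcher class, the parameter map and the handler body are assumptions, not WebSpa's code, and the real servlet passes the request, the response and the EntityManager rather than a map of strings.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.function.Function;

// Hypothetical dispatcher: maps the value of the "fn" parameter to a handler.
final class Dispatcher {

    private final Map<String, Function<Map<String, String>, String>> handlers = new HashMap<>();

    // Names are stored in lower case so that lookups ignore case,
    // mirroring the equalsIgnoreCase comparison.
    void register(String name, Function<Map<String, String>, String> handler) {
        handlers.put(name.toLowerCase(Locale.ROOT), handler);
    }

    // Finds the handler named by "fn" and returns its response text.
    String dispatch(Map<String, String> params) {
        String fn = params.get("fn");
        if (fn == null) {
            return "error: missing fn parameter";
        }
        Function<Map<String, String>, String> handler = handlers.get(fn.toLowerCase(Locale.ROOT));
        if (handler == null) {
            return "error: unknown function " + fn;
        }
        return handler.apply(params);
    }

    public static void main(String[] args) {
        Dispatcher dispatcher = new Dispatcher();
        // The handler body is a stand-in for a call such as hBean.registerUser(...).
        dispatcher.register("create-user", p -> "created " + p.get("username"));

        Map<String, String> params = new HashMap<>();
        params.put("fn", "create-user");
        params.put("username", "ana");
        System.out.println(dispatcher.dispatch(params));
    }
}
```

With this arrangement, supporting a new action means registering one more handler instead of adding another branch.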

The method that executes the SPARQL query is accessed differently. A SparQL object is created and its executeSpaQuery method is called with the following parameters: the query’s text, the endpoint’s URL and the PrintWriter object. The method itself is responsible for writing the response to this object.

Package webspa.persistance contains the classes used for the persistence layer of the application, which is implemented with Hibernate and the Java Persistence API (JPA). The data is stored in a MySQL database. There are five classes mapped to the database tables:

class User - table USERS - manages the users of the application.

class EndPoint - table ENDPOINT - stores endpoint information, such as the URL, the type (default or added by user), the status etc.

class UserEndPoint - table USERENDPOINT - contains a one-to-many mapping between a user and their additional endpoints.

class Model - table MODEL - stores RDF Schemas.

class Query - table QUERY - stores saved queries.

Each class has an attached .hbm.xml file that specifies the mapping between class variables and table columns. For example, the User class has the following variables:

@Id
@GeneratedValue(strategy = GenerationType.AUTO)
private Integer uid;

private String username;
private String password;
private String lastIp;

@Column(name="created")
private Timestamp creationDate;

@Column(name="lastlogin")
private Timestamp lastLoginDate;

The tokens preceded by “@” are JPA annotations. The corresponding .hbm.xml file contains the mapping:

<hibernate-mapping>
    <class dynamic-insert="false" dynamic-update="false"
           mutable="true" optimistic-lock="version"
           polymorphism="implicit" select-before-update="false"
           name="webspa.persistance.User"
           table="users">
        <id column="UID" name="uid">
            <generator class="increment"/>
        </id>
        <property column="username" name="username"/>
        <property column="password" name="password"/>
        <property column="lastip" name="lastIp"/>
        <property column="created" name="creationDate"/>
        <property column="lastlogin" name="lastLoginDate"/>
    </class>
</hibernate-mapping>

A class named HibernateBean is responsible for the communication with the database. HibernateBean contains methods for retrieving, saving and updating endpoints, queries and/or results. Each of the methods follows a similar pattern: validation of the parameters, creation and execution of the query, and formatting and sending of the result. The methods implemented are:

registerUser

loginUser

logoutUser

createEndPoint

hideShowEndPoint

saveQuery

updateQuery

checkUserById

checkField

checkEndPointByUserUrl

getLoggedUserId

getEndPoints

getModels

getQueries

In addition to the class that communicates with the database, the back-end includes the package webspa.sparql, which contains the class SparQL. Using the Jena and ARQ APIs, this class runs queries over SPARQL endpoints. The process of sending a query and getting a result is simple. First, a QueryExecution object is obtained using ARQ’s QueryExecutionFactory class and its sparqlService method. This method requires two arguments, the endpoint URL and the query, both provided by the user through the interface.

QueryExecution queryExecution =
        QueryExecutionFactory.sparqlService(endpoint, query);

A pattern is used in order to determine the type of the query: SELECT, ASK, DESCRIBE or CONSTRUCT. Based on the type, a different method is called:

execSelect

execAsk

execDescribe

execConstruct
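One possible way to detect the query type with a pattern is sketched below. The regular expression is an assumption, not the one used in WebSpa: it skips an optional prologue of BASE and PREFIX declarations and captures the first query form keyword. It does not handle comments inside the query, and the class name QueryForm is hypothetical.

```java
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: guesses the form of a SPARQL query from its text.
final class QueryForm {

    // Optional prologue (BASE / PREFIX declarations), then the form keyword.
    private static final Pattern FORM = Pattern.compile(
            "^\\s*(?:(?:BASE\\s*<[^>]*>|PREFIX\\s+[^\\s:]*:\\s*<[^>]*>)\\s*)*"
            + "(SELECT|ASK|DESCRIBE|CONSTRUCT)\\b",
            Pattern.CASE_INSENSITIVE);

    // Returns "SELECT", "ASK", "DESCRIBE" or "CONSTRUCT".
    static String of(String query) {
        Matcher matcher = FORM.matcher(query);
        if (!matcher.find()) {
            throw new IllegalArgumentException("Unrecognized query form");
        }
        return matcher.group(1).toUpperCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        String query = "PREFIX foaf: <http://xmlns.com/foaf/0.1/>\n"
                + "SELECT ?name WHERE { ?person foaf:name ?name }";
        // The result decides which of the exec methods listed above to call.
        System.out.println(of(query));
    }
}
```

Skipping the prologue explicitly, rather than searching for the first keyword anywhere in the text, avoids false matches on prefix URIs that happen to contain a word such as "select".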

The ResultSetFormatter’s static method asXMLString is used for returning the results of the SELECT and ASK queries as XML.

ResultSet results = queryExecution.execSelect();
printWriter.write(ResultSetFormatter.asXMLString(results));

The other two query types return their results as a Model object. These objects write directly to the PrintWriter that communicates with Flex, via their “write” method.

Model model = queryExecution.execDescribe();
model.write(printWriter);

Conclusion and Perspectives

WebSpa is a web-based application that manages SPARQL endpoints and allows users to perform interrogations over them. It provides pre-defined SPARQL endpoints and supports the addition of new ones. Anyone can use the application, but a user account is required in order to perform certain operations, such as adding endpoints or saving and later editing a query. The results are displayed in the browser and they can also be stored locally, in XML format.

The application is written in both Java and Flex. It uses the Jena and ARQ application programming interfaces in order to perform the queries, and the results are processed and displayed using Flex.

There are two versions of WebSpa: the classic one, which has been presented in this paper, and the graphical one, which is still under development. The latter allows the user to handle the RDF graph by visualizing its nodes and arcs and directly manipulating them.

Limitations of the Classic WebSpa relate mainly to the way content is displayed; e.g., it might be somewhat difficult for users to manually browse to the XML file in order to view the results.

References

[1] Tim Berners-Lee, James Hendler and Ora Lassila. The Semantic Web. Scientific American, May 2001.

[2] W3C. SPARQL Query Language for RDF. http://www.w3.org/TR/rdf-sparql-query/

[3] Jena Project Page. http://jena.sourceforge.net/

[4] W3C. SPARQL Update. http://www.w3.org/Submission/SPARQL-Update/

[5] W3C. RDQL - A Query Language for RDF. http://www.w3.org/Submission/RDQL/

[6] Adobe Flex 3. http://www.adobe.com/ro/products/flex/

[7] ARQ - A SPARQL Processor for Jena. http://jena.sourceforge.net/ARQ/

[8] Monica Macoveiciuc, Constantin Stan. Flex Framework Presentation.
