This Report is submitted to the University of Strathclyde in partial fulfilment of the Regulations for the Degree of MSc in Information Technology Systems
Decision system for stock market investors
Michael Witty
200590826
Supervised by: Dr Ian Ruthven.
Department of Computer & Information Sciences
September 2007
Except where otherwise expressly indicated the work reported in
this document is my
own. It has been performed during, and has not been submitted
for assessment in
connection with any other award whatsoever.
Signed Date
Abstract
The aim of this research is to analyse how investors collect and use typical market indicators, and to investigate the ways in which current technology can be used to enable more informed, time-critical decisions.
Creating systems and techniques which accurately predict stock market movements, in order to ensure financial gain and eliminate risk, has long been the Holy Grail of mathematicians, economists, investment banks and programmers alike. The findings of such research, however, are inconclusive as to whether the stock market can be predicted accurately enough to make significant profits.
The efficient market theory concludes that the market already reflects the value of an investment, since all relevant information is already in the public domain. This conclusion is increasingly being challenged as more complex computer systems are directed towards the field and the vast repositories of data available on the internet grow.
This research focuses not on predicting the market but on providing better tools to help investors make decisions based on current market conditions. Recent standards and technologies such as XML and Web 2.0 have provided solutions to some of the common problems of data retrieval, representation, organisation and manipulation; this research looks at the use of such technologies.
Acknowledgements
I would like to thank Dr Ian Ruthven for his help and guidance
during this project.
Table of Contents

1. Introduction
   1.1. Problem Statement
   1.2. Overview
   1.3. Scope
2. Literature Review
   2.1. The Stock Market
      2.1.1 Fundamentals
      2.1.2 Predictability
      2.1.3 The Efficient Market Hypothesis
      2.1.4 The Rise in On-line Trading
   2.2. Information Visualisation
      2.2.1 Existing Graphical Representations
      2.2.2 Issues
      2.2.3 Cognitive Maps
   2.3. Functional View of Market Trading
      2.3.1 Investor Goals
      2.3.2 MSN Research Wizard
      2.3.3 Functional Requirements
   2.4. Data Retrieval
      2.4.1 Web Content Mining
      2.4.2 Extraction Techniques
      2.4.3 Examples
      2.4.4 Dapper.Net
   2.5. Data Storage
      2.5.1 XML
      2.5.2 XPath
      2.5.3 Storing XML
      2.5.4 XML Databases
   2.6. Transforming XML
      2.6.1 XSL
      2.6.2 SVG
      2.6.3 XForms
3. Specification
   3.1. Problem Statement
   3.2. Stakeholder Analysis
   3.3. User Goals
   3.4. Use Cases
4. System Design
   4.1. Methodology
   4.2. Required Technologies
   4.3. Decisions
   4.4. Modules
      4.4.1 Dapper Module
      4.4.2 Database Connection Module
   4.5. Proposed Architecture
5. Implementation
   5.1. Issues and Design Changes
   5.2. Module Implementation
      5.2.1 Utilities Package
      5.2.2 Dapp Manager Package
      5.2.3 DBQuery Package
      5.2.4 Style Sheet & Icon Design
   5.3. Final System Architecture
   5.4. Interfaces
6. Testing
   6.1. Component Testing
   6.2. Usability Testing
   6.3. Speed & Accuracy Test
   6.4. Results
7. Conclusion
8. Bibliography
1. Introduction
1.1. Problem Statement
For investors, making financial gains requires quick and informed decision-making based on many different sources of time-sensitive data. News articles, company profiles, financial indicators and general economic conditions are constantly changing, and all can affect the performance and financial well-being of a company and therefore its stock price.
The process of gathering and analysing this information is time consuming. For investors, the cost of their time and the charges incurred through buying and selling shares immediately reduce potential gains.
Another consideration is the time-dependent nature of trading: not only must investors collect and analyse the relevant data, they must also make a decision based on it while it is still relevant. Although many sites or portals exist specifically aiming to provide a single source for investors, these sites themselves become vast and are still predominantly text based.

The task for investors is then not only to discover the relevant information but also to determine some meaning from it.
1.2. Overview
This paper aims to research and develop a graphical display method presenting a holistic view of a market index, specifically the FTSE 100. The intention is that investors can narrow their research efforts by filtering which companies are worth investigating further, ultimately helping them decide whether to buy, sell or hold a particular stock. A brief overview is given of the stock market, together with a functional analysis of the trading process. An analysis of existing systems and technologies is then used to develop a graphical tool giving traders a quick insight into the market.
1.3. Scope
Some assumptions are made about the users of the final system. Most financial sites offer tutorials and helpers to aid first-time investors with the intricacies of the stock market; the assumption has been made that the users and testers of this system will, where necessary, have the relevant knowledge and experience, and that writing such educational material into the final site is beyond the scope of this project. The final system is also intended as a proof of concept, and as such will demonstrate some possibilities in terms of retrieving data; not all possible scenarios will be implemented, the view being that the system is versatile enough to retrieve data from many diverse locations provided the user supplies appropriate configuration.
2. Literature Review
First, an understanding of the stock market and of the available technologies is required, in order to assess high-level functional requirements and identify technologies which can answer them.
2.1. The Stock Market
Stock is the term for the outstanding capital of a company or corporation. This stock is divided into shares, which are traded on an exchange in a similar way to an auction. The difference is that in a stock market sellers and buyers do not trade on a highest-or-lowest-offer-wins basis; instead they are matched based on the price they are willing to trade at.
2.1.1 Fundamentals
Many exchanges exist globally; among the major ones are the New York Stock Exchange (NYSE) in America, the Tokyo Stock Exchange in Japan and, in the UK, the London Stock Exchange (LSE).
Each exchange consists of lists, or indices, of companies grouped by market capitalisation (the estimated total value of a company). If a company is listed on a particular index, investors can gauge how large the company is in terms of its financial value. This project is concerned with the FTSE 100, which lists the top UK companies traded on the London Stock Exchange.
The price at which shares are bought and sold is governed by many factors; the price of a stock can be thought of as a reflection of what the market is willing to pay. Expressed another way, the market price is a reflection of the perceived value of a company, and this value changes over time to reflect the company's financial performance and well-being. As Elinger observes, the market is searching for the right price [1].

[1] Elinger, A., The Art of Investment.
2.1.2 Predictability
'The predictability of financial markets has engaged the attention of market professionals and academic economists and statisticians for many years.' [2]
Being able to predict how a market or individual share is going to behave in the future would be of great advantage to any investor, giving them a guaranteed profit on any investments they make. As such, this is exactly what many investors try to do.
Several methods and techniques exist, from fundamental analysis to technical charting. The effectiveness of such techniques is continually debated, as indeed is whether it is possible to predict market movements with any degree of accuracy at all. Some studies and theories challenge the reasoning behind such pursuits; one notable example is the efficient market hypothesis.
2.1.3 The Efficient Market Hypothesis
Malkiel [3] first proposed the Efficient Market Hypothesis in 1973. The findings of his research suggest that the market cannot be predicted using any of the formal techniques, such as fundamental and technical analysis. These methods rely on quantitative data about companies and trade information such as prices and volume, which are freely accessible in the public domain.
It is proposed that, since this information is already freely available to all investors, the market already reflects its implications. The debate continues between promoters of the EMH and the more traditional technical analysts; as yet no solid conclusions have been reached either way, and with so much attention from various research sources the debate is likely to continue. Recent advances in techniques, computing power and the larger data sets available via the Internet have fuelled this debate further [4].
It has therefore become clear that an alternative approach is required: to provide investors with all the information and data they require, in a way that allows a quick overview and analysis of market activity to support investing decisions. Mills [5] proposes that investors need to gather and analyse this information as soon as it becomes available so that timely decisions can be made.
[2] Mills, T., Predicting the Unpredictable.
[3] Malkiel, B., A Random Walk Down Wall Street.
[4] Mills, T., Predicting the Unpredictable.
2.1.4 The Rise in On-line Trading
A number of factors have made on-line trading more popular in the past few years: increased availability of data, growth in net usage, new technology, faster connections and favourable market conditions have all made investing in stock more attractive [6].
This vast increase in on-line trading has given rise to many web sites offering trading tools for investors and market data portals. In addition, recent standards and technologies such as XML and Web 2.0 have enabled richer web-based applications, including the use of graphics.
2.2. Information Visualisation
Information visualisation is concerned with the representation of data in a graphical format which successfully imparts information to the viewer. This idea was famously captured by the proverb 'a picture is worth a thousand words'.

Tufte [7] takes this concept further by introducing the idea of data density. Text-based representations are limited by the viewer's ability to read and understand the text itself; basic text can be thought of as one-dimensional in its ability to communicate information, the single dimension being the value the characters represent. Graphics, on the other hand, can represent more than one dimension through the use of colour, size, shape and context, which means that more data can be represented over a given area.
Harris [8] describes how the use of colour alone can help authors in the following ways:

- Differentiate elements
- Encode areas of equal value
- Alert the viewer when a predetermined condition occurs
- Identify particular values
- Indicate similar items
- Signify changes in direction, trends and conditions
- Improve retention of information
- Use gradations to indicate transitions from one set of conditions to another

[5] Mills, T., Predicting the Unpredictable.
[6] Warneryd, K., Stock Market Psychology.
[7] Tufte, E., Envisioning Information.
[8] Harris, R., Information Graphics.
It can be seen that many of these attributes lend themselves nicely to the stock market scenario, particularly in the identification of trends and changes of direction in numerical indicators.
2.2.1 Existing Graphical Representations
The idea of representing data using graphics is not new, even in the stock market scenario; various charts and display methods already exist.

Simple time series: Probably the chart most synonymous with stock markets is the time series graph, which simply plots one variable against a set time period; from this an investor can see how the price has performed historically.

Figure 1: Example of traditional charting on Self Trade
Candlesticks: First devised by a Japanese rice trader, the candlestick (or bar chart) diagram shows the price change over a certain period in relation to the highest and lowest prices. Candlesticks are still used today on many sites such as Digital Look and Self Trade, and they are a good example of how graphics can be used to store data in a smaller area. The example below shows that, by using a box and two lines, the diagram can successfully communicate four pieces of information (the open, close, high and low prices) to a user at once. When combined with a time series chart, even more information can be imparted.

Figure 2: Candlestick example [9]
Heat-maps: The concept of the heat-map is to take a particular indicator's rate of change (most commonly the price change over a period) and communicate that change graphically through the colour of the graphic.

Digital Look [10] provides one example of a heat-map currently available:

[9] http://www.babypips.com/school/what_is_a_candlestick.html
[10] http://www.digitallook.com/cgi-bin/dlmedia/investing/visual_tools/heat_maps
Figure 3: Digital Look Heat-Map
MSN [11] also provides a similar heat-map display; again this shows the price change for a certain period.

[11] http://msn.moneyam.com/heatmaps/
Figure 4: MSN Heat-Map
It can be seen that most of these graphical tools attempt to map only one variable, and in almost all cases it is the change in price over a certain time period.
2.2.2 Issues
Spence [12] observes: 'The mere re-arrangement of how the data is displayed can lead to a surprising degree of additional insight.'

It is clear that graphics can help; conversely, however, Tufte [13] observes that the incorrect use of graphics can have a negative effect. Some common errors include the use of irrelevant decoration, information overload and poor use of colour. These factors must therefore be considered when designing such interfaces. As a guide, the following requirements need to be addressed:
- Selection of data: relevance to a task
- Representation: how to represent abstract things
- Presentation: spatial limitations
- Scale and dimensionality: how many dimensions and variables can be displayed
- Re-arrangement, interaction and exploration
- Internalisation: the mind's representation of an internal image
- Externalisation: the display of what the user actually sees, i.e. the computer display
- Mental models: human memory models
- Invention, experience and skill

[12] Spence, R., Information Visualisation.
[13] Tufte, E., Visual Explanations.
2.2.3 Cognitive Maps
The next consideration in terms of information visualisation is how the user interacts with the graphic. A cognitive map is the navigational guide to an interface that the user constructs in memory; a simple real-world analogy is the London Underground map.
Most passengers on the Underground have one goal in mind: how to get from point A to point B, and the required connections between the two. The Underground map therefore uses colour to represent the different connecting routes and does not attempt to display other real-world data, such as accurate scale, because the user is not interested in this information.
Another analogy is to think of cognitive maps as the bridge between the real world, the computer display and the user's memory [14].

The process of creating these maps can be illustrated by the following sequence:

Browse > CONTENT > model > INTERNAL MODEL > interpret > INTERPRETATION > formulate browsing strategy > BROWSING STRATEGY

To aid this process, context maps can be used to help users create such models. These maps aim to give the viewer a basis on which to build their own cognitive map.
2.3. Functional View of Market Trading
To gain an understanding of how investors make decisions and of the ways in which data is analysed, a functional analysis of trading activities is undertaken.

[14] Mental Models, Navigation.
2.3.1 Investor Goals
Investors all share a common goal: to achieve a return on their initial investment. At a very basic level the aim is always to buy when a stock is undervalued, before the market moves to reflect this, and conversely to sell when an investment is overvalued. Put simply: buy low and sell high.

The methods used to achieve this vary from person to person, and individual goals and strategies differ between personalities and age groups. Investors can, however, be grouped into two general categories, as either active or passive traders, also known as short- and long-term traders.
Active traders aim to profit from the short-term natural fluctuations in price, or volatility. The frequency of these trades varies; the most extreme example is the day trader, who makes very large trades over short periods to take advantage of daily fluctuations in price.

Passive traders, in comparison, aim to take advantage of the market's long-term tendency to increase. They therefore trade very infrequently, buying shares periodically to add to their portfolio as opposed to selling. Most traders generally fall into the second category [15].
2.3.2 MSN Research Wizard
The MSN Research Wizard [16] gives a good indication of what is involved when deciding to buy or sell shares. The page is a kind of expert system which uses MSN data to guide an investor through the process of assessing an individual company. The wizard looks mainly at fundamental data to gauge how good an investment is.
The wizard is split into five main sections. The first step looks at the company's fundamentals: a set of indicators used to assess a company's financial well-being. Fundamentals can be used to determine how profitable a company has been to date, as well as giving an idea of the general state of its finances. The kinds of question this step aims to answer include:
- How much does the company sell and earn? (sales & income)
- How fast is the company growing? (sales growth & income growth compared to industry)
- How profitable is the company? (profit compared to industry over 1 and 5 years)
- How healthy are the company's finances? (debt/equity ratio compared to industry)

[15] Warneryd, K., Stock Market Psychology.
[16] http://uk.moneycentral.msn.com/investor/research/wizards/srw.asp?Symbol=GB%3Abp%2E
Some investors use a company's past price performance as an indication of future performance. Many will argue that past prices have no bearing on future prices; likewise, some will argue that a company that has performed well to date should perform well in the future. This page therefore gives an overview of the stock's performance, measured as the price change over the past 1, 3 and 12 months.
Following on from the fundamentals, the next section looks at the likely future price of the investment. Using the company's price-to-earnings ratio along with analyst expectations, an estimate is given of how the company is likely to perform over the coming two years.
A company's share price can also be affected by a number of social factors, such as news stories relating not only to the company itself but to general economic conditions. An extreme example of this is demonstrated by the Northern Rock bank crisis [17], which saw the share price lose 30% of its value overnight. This dramatic drop in price was initiated after it emerged that the company had sought a loan from the Bank of England as a result of difficult financial conditions. Despite the fact that the fundamental business was sound, the panic that ensued as customers withdrew their savings sent the market price into freefall.
Recognising the importance of financial news, MSN has added a catalysts section to the wizard, which details any company-specific news stories that could impair or improve confidence in the company.

Finally, another task predominant in the decision process is considered: comparison. Looking at a single company profile can only impart information in a single context. To derive meaning from this data a comparison is required; in this case MSN allows comparative analysis with up to two other company profiles.
[17] http://news.bbc.co.uk/1/hi/business/7007076.stm

2.3.3 Functional Requirements
From our initial investigation it is clear that, in terms of making wise investments, knowledge is key. As Lasser [18] observes of Warren Buffett, one of America's most successful investors:

'He will seek out every last bit of information he can get, whether it's a company's return on equity or the fact that the CEO is a miser who takes after Ebenezer Scrooge himself.'
Using the MSN wizard as a guide, the functional tasks can be broken down as follows:

- Determine the profitability of a company
- Determine the return on investment
- Determine the risk of the investment
- Determine the value of the company

An insight into exactly how the data is analysed can also be gained. It can be seen that most numerical indicators are analysed in the following ways:

- Value in relation to highs and lows
- Value in comparison with a base value, such as the market or sector
- Difference between two values: spreads, rates of change
- Trends and direction
- Identification of changes in trend and turning points
The main functional requirements can be grouped into two categories.

The first is the retrieval and storage of data from the World Wide Web for analysis.

Secondly, to make decisions the data must be analysed. This will involve some or all of the tasks described above, which investors already perform on the various sources available. A graphical interface is proposed which will allow users to explore and display the retrieved data in different ways to gain a better understanding of its meaning.

Each top-level requirement is investigated in turn to generate lower-level requirements.
[18] Lasser, J.K., Pick Stocks Like Warren Buffett.
2.4. Data Retrieval
There is a wealth of information available to the investor via the modern Internet, and many companies have emerged aiming to provide content to investors for analysis through sites such as MSN Money [19], Digital Look [20] and Self Trade [21]. As we have seen, however, relevant information can come from a wide range of sources, and accessing all these resources manually involves searching and browsing for content. Even with a comprehensive bookmark list of sites, this activity is time consuming and laborious. There is a requirement, therefore, to programmatically extract and consolidate this information.

[19] http://money.uk.msn.com/
[20] http://www.digitallook.com
[21] http://www.selftrade.co.uk/
2.4.1 Web Content Mining
Web content mining is concerned with discovering information from the many sources available on the web [22]. Using data mining techniques, content can be analysed and extracted for use in other applications.

One problem with using a data repository as vast as the Internet is the dynamic nature of the content. In order to retrieve data in any circumstance, an application needs to know where to look and needs a reference for what it is looking for. In the context of the World Wide Web we are dealing with pages of content written in a range of formats; ASP, JSP and HTML pages may change in structure at any time and may not follow the strict rules associated with mark-up languages.

A further complication is that HTML generally doesn't contain any type information, so content will almost always be represented as a generic string type. This poses issues when extracting information for use in a strongly typed language such as Java.

Fortunately, despite these issues, there are techniques and programs which solve these problems.

[22] Loton, T., Web Content Mining with Java.
2.4.2 Extraction Techniques
A basic technique for retrieving web-based content is screen scraping, which involves extracting data from its final output format, usually the visual display of the program being scraped. In the context of the web this means taking content from the browser directly, which can be achieved by a number of methods such as regular expressions or dedicated APIs. The technique has limitations, however: because the data is extracted from a format designed with human readability in mind, additional processing is required to remove styling elements, and the data itself will not necessarily be structured in a way suitable for use by other programs, so contextual information has to be added later.
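As a rough illustration of the regular-expression approach (the URL and page mark-up here are invented for the example), a simple scraper in Java might look like this:

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.URL;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ScrapeExample {
    public static void main(String[] args) throws IOException {
        // Fetch the raw HTML of the page (URL is illustrative only)
        URL url = new URL("http://example.com/quote?symbol=TSCO");
        BufferedReader in = new BufferedReader(
            new InputStreamReader(url.openStream()));
        StringBuilder html = new StringBuilder();
        String line;
        while ((line = in.readLine()) != null) {
            html.append(line);
        }
        in.close();

        // Assume the price is rendered as <span class="last">605.00</span>
        Pattern p = Pattern.compile("<span class=\"last\">([0-9.,]+)</span>");
        Matcher m = p.matcher(html);
        if (m.find()) {
            System.out.println("Last price: " + m.group(1));
        }
    }
}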
Tree builders are aimed specifically at web page extraction and take advantage of the mark-up language's structure. A tree builder attempts to create a tree representation of a web page in memory by matching start and end tags in the target document, then builds a representation of the structure in order to provide a navigable context. The designer of the particular extraction program dictates the way in which the tree is built and how extensively it caters for specific tag libraries. Once a tree representation has been created, data can be extracted based on its location in the document. This method is useful for retrieving data from many pages which share an identical layout for different content, such as stock prices, but can only work with supported formats.
The W3C introduced the Document Object Model [23], or DOM, to address these issues; in their own words:

'The Document Object Model is a platform- and language-neutral interface that will allow programs and scripts to dynamically access and update the content.'

The introduction of this standard meant that API and program writers had a common interface to work from. Parsers can therefore take the tree builder concept to the next level by building a DOM representation of the page in order to extract its content.

[23] http://www.w3.org/DOM
2.4.3 Examples
Implementing a fully-fledged extraction program is time consuming and not the main focus of this project; there are many freely available programs for this task, two notable online examples being Yahoo Pipes [24] and Dapper [25].

[24] http://pipes.yahoo.com/pipes/
[25] http://www.dapper.net/

Yahoo Pipes is a Web 2.0 application available exclusively on-line; it relies on structured data in the form of XML, RSS feeds and JSON as target content types. The site consists of a graphical interface in which users add modules and connect them together to create a customised output from existing web pages.
The modules themselves perform various tasks, affording users control over the data retrieved from selected URLs. The output is then displayed as a standard HTML page which can be viewed by anyone who logs into the site.
Example: creating a simple RSS feed aggregator.

The 'Fetch Feed' module is used to retrieve news stories from the BBC's business feed; this is simply connected to the output module.

Figure 5: Simple feed to retrieve RSS from the BBC

Multiple feeds can be combined in the Fetch Feed module, and a filter module is added to allow users to search the feeds for specific terms. A search module is added to provide user input on the main page.
Figure 6: Simple aggregator to combine two feeds
A search term box is added in the above example to filter only news items of interest from the three selected news feeds.

Figure 7: Output page for the aggregator with search term box
Other modules can be used to create more complex pipes: XML data can be extracted directly and manipulated, filtered or combined with other web sources to create useful pages. However, the application is limited to use with live data, the output is restricted to the standard output page and, in addition, few sources of useful data are freely available in XML format.
2.4.4 Dapper.Net
Dapper (a contraction of 'Data Mapper') is another online application that allows users to extract content from anywhere on the net and output it in various formats including XML, JSON and RSS feeds. Dapper also provides a Java API allowing developers to connect their programs to Dapper and retrieve the extracted content.

Dapps are small retrieval applications created using the main site. Each Dapp is created to parse a specific web page. Initially this is achieved via a virtual browser within the site, whose interface allows web content to be selected for retrieval. In the example below the last trade price element is selected. Some basic manipulation can be applied to each selected element to remove preceding or trailing strings; in this case the 'p' (pence suffix) is removed.

Figure 8: Dapper UI showing selected content

Any number of elements can be added. Once the content has been selected the user can add field names and group the output; these are reflected in the resulting XML output.
Figure 9: Preview showing output
Dapper is flexible enough to allow the content being retrieved to be modified at a later date. The addition of the Java API, allowing external programs to interface with Dapper, makes it an ideal solution to the retrieval problem.
2.5. Data Storage
The second top-level requirement of the proposed design is the storage of the data retrieved by Dapper. The output format from Dapper is selected when the Dapp is created, and the user has several options including RSS feeds, JSON and standard HTML. Since we are using the data in another application, it makes sense to retrieve it as XML.
2.5.1 XML
XML is a standard for data exchange which has become popular in desktop applications, for configuration files, as well as on the web, to store and exchange data. XML can be thought of as data about data: not only does it contain the actual data but also contextual and structural information.

XML has many advantages, the first being its high portability between applications and across platforms; the fact that it has been a W3C standard since 1998 means a lot of applications and application interfaces are available. For the example Dapp that we created in the previous section, the XML output would look as follows (the actual output has been simplified to show only the elements of interest):
<dapp>
  <name>MSNPriceData</name>
  <source>http://uk.moneycentral.msn.com/investor/quotes/quotes.asp?symbol=GB:BP.L</source>
  <executionTime>1.233</executionTime>
  <accessed>2007-07-29 15:59:25</accessed>
  <PriceData>
    <last>605.00</last>
  </PriceData>
</dapp>
Although there is only one actual piece of data, the last price, the Dapp provides plenty of other information within the XML document, such as the source of the data, when it was accessed and the name of the Dapp that accessed it.

The structure of XML is strict in that every start tag must have a corresponding end tag, and each document must have a single root element. In this example it can be seen that the <dapp> tag is the root and all the other tags are nested within it. This characteristic allows logical grouping of elements in hierarchies.
2.5.2 XPath
XPath is a query language that enables the inspection of XML files. The language is a W3C standard and works on a hierarchical basis similar to a file system: an XPath expression navigates through the document structure to a particular node or set of nodes, depending on how far down the tree the path goes. This adds an interesting capability to XML documents in that they can be treated as a very simple database, provided an XPath interface is available.

For the above example we consider the following XPath expression:

//PriceData/last

The double slash at the start matches elements at any depth in the document; the rest of the expression navigates to the <PriceData> element and then to its child <last>. The result would then be 605.00, the content of our <last> element.
2.5.3 Storing XML
Using XML on its own cannot provide a solution which fully replaces a relational database. Although in theory the data could be extracted and continually added to one large XML file, the problems of organisation, persistence, availability, security, and efficient search and update would remain.

There is a need, therefore, to use an RDBMS to store XML data, and a number of possible solutions are available. One solution involves storing each XML document directly as a file within the database; however, this disregards the logical structure of the XML when performing queries on the resulting table.
Another solution is to create further reference tables to store some of the more important structural information about a document, which can then be queried. This approach does not cope well with changes to document structure, since the underlying tables need to be updated to reflect such changes.

To gain the full advantage of XML, therefore, the document would need to be decomposed before insertion into the database and then recompiled when extracted. XML schemas could also be used to ensure the structure is maintained. Although the database can now provide the same level of logical information as the original document, there are performance ramifications.
2.5.4 XML Databases
XML databases aim to give the best of both worlds. A native XML database allows the storage of individual documents in collections, which can be queried and updated using XPath and XUpdate, another standard for performing updates on XML. Collections are more versatile than a traditional RDBMS in that they can store a set of generic XML documents regardless of whether they share the same structure. Collections can also be stored within collections to provide further levels of grouping and to allow queries on multiple sources.

Apache Xindice is a Java implementation of a native XML database conforming to the XML:DB (xmldb.org) specifications. Xindice runs as a web application in a suitable container such as Tomcat; the way in which the database is accessed and added to is up to the designer of the application. Since Xindice is Java based there is a substantial API to support most of its functions, although it is also possible to control it via a command line interface.
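As an indicative sketch, a Java client might connect to Xindice through the standard XML:DB API and run an XPath query over a collection; the collection name 'prices' and the port (matching the Tomcat instance described later) are assumptions for illustration:

import org.xmldb.api.DatabaseManager;
import org.xmldb.api.base.Collection;
import org.xmldb.api.base.Database;
import org.xmldb.api.base.ResourceIterator;
import org.xmldb.api.base.ResourceSet;
import org.xmldb.api.modules.XPathQueryService;

public class XindiceQuery {
    public static void main(String[] args) throws Exception {
        // Register the Xindice driver with the XML:DB DatabaseManager
        Database db = (Database) Class.forName(
            "org.apache.xindice.client.xmldb.DatabaseImpl").newInstance();
        DatabaseManager.registerDatabase(db);

        // Open a collection below the top-level 'db' collection
        Collection col = DatabaseManager.getCollection(
            "xmldb:xindice://localhost:8282/db/prices");

        // Run an XPath query across every document in the collection
        XPathQueryService service = (XPathQueryService)
            col.getService("XPathQueryService", "1.0");
        ResourceSet results = service.query("//PriceData/last");

        ResourceIterator it = results.getIterator();
        while (it.hasMoreResources()) {
            System.out.println(it.nextResource().getContent());
        }
        col.close();
    }
}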
Because it is packaged as a web app, collections can be viewed via a web browser:

Figure 10: Xindice debug tool showing a collection of XML files

Xindice neatly answers our second requirement, to store the retrieved data, since this is already in XML format courtesy of Dapper. It also means we don't need to worry about catering for changes in the structure of incoming data, and a handy interface is provided to check up on the collections.
2.6. Transforming XML
The final requirement is to represent the retrieved data in a graphical format; again, W3C and XML standards provide the answer. Two standards can address the problem: XSL and SVG.
2.6.1 XSL
XSL stands for Extensible Stylesheet Language. XSL is to XML what CSS is to HTML: W3C continues its mission to separate data from presentation by introducing XSL Transformations, or XSLT for short. XSL allows designers to dynamically transform XML data into other formats such as HTML and SVG.

Using our example output file from before, we add an extra line to reference the style sheet:
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="price.xsl"?>
<dapp>
  <name>MSNPriceData</name>
  <source>http://uk.moneycentral.msn.com/investor/quotes/quotes.asp?symbol=GB:BP.L</source>
  <executionTime>1.233</executionTime>
  <accessed>2007-07-29 15:59:25</accessed>
  <PriceData>
    <last>605.00</last>
  </PriceData>
</dapp>
In this case we simply want to display this data in an HTML file along with some other information; the resulting style sheet would look as follows:

<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="/">
    <html>
      <head><title>Latest Price</title></head>
      <body>
        <p>Latest Stock Price :
          <xsl:value-of select="//PriceData/last"/></p>
      </body>
    </html>
  </xsl:template>
</xsl:stylesheet>
The style sheet uses our XPath expression to reference the content to display; the resulting HTML file looks as follows:

Figure 11: Result of the XSL transform in Firefox
The advantage of this is the separation of data from presentation: we could use the same style sheet over and over again to display the price of different stocks.

XSL requires a parser to transform XML data. Most browsers support this as standard, so XML can be styled on the client side to provide the desired result; in the case of Firefox, Expat is used. It is also possible to style the data on the server side, using a third-party parser such as Apache Xalan, before passing the resulting transformed document to the client, in which case the client simply receives the HTML representation.
2.6.2 SVG
SVG stands for Scalable Vector Graphics [26], another W3C standard, which extends XML to a graphical format. SVG aims to address some of the current issues with web-based images, such as file size and varying screen resolutions. Vector graphics are images generated from a series of vectors drawn between defined co-ordinates; the data required to draw the image is stored as XML mark-up using SVG tags. One advantage of this format is the ability to scale the image without loss of quality or pixelation. One drawback is the need for a plug-in to be installed in the client browser: although Firefox and Opera support SVG as standard, IE still requires the Adobe plug-in.

A further advantage is that SVG is part of the W3C recommendations, so it can be coupled with XSL to generate graphics from XML, making it ideal for representing numerical data graphically.
Looking again at our previous example, we use the same XML/XSL combination to draw a simple box that represents the price of a security. The XSL to achieve this would be as follows:

<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="/">
    <xsl:variable name="price" select="//PriceData/last"/>
    <svg xmlns="http://www.w3.org/2000/svg">
      <rect x="10" y="10"
            width="{$price div 10}" height="{$price div 10}"
            style="fill:blue"/>
    </svg>
  </xsl:template>
</xsl:stylesheet>

[26] http://www.w3.org/TR/SVG11/
The resulting SVG output, albeit not very interesting, is shown below; the dimensions of the box in this case are determined by the stock price divided by 10:

Figure 12: Simple box representation of a stock price
The XSL is slightly different in this case because we need to use our data as a value within the SVG mark-up; to do this, an XSL variable can be used to temporarily store the data so that it can be used in the transform.

XML, XSL and SVG together form a neat set of standards which answer our data presentation problem; once a suitable XSL template has been created it can be reused wherever required.
2.6.3 XForms
XForms (short for XML Forms) is one of the latest standards from W3C, pitched in their own words as the latest generation of web forms, intended to replace the outdated HTML form [27]. XForms aims to make the task of creating web forms easier, with many of the standard tasks involved in such an exercise incorporated into the specification; retrieving and saving data from local files, validation of user inputs and dynamic content are just a few examples. One of the main advantages of XForms, however, is the ability to access and update XML content and provide logical bindings between data, even across separate XML files. XForms also aims to provide a better user experience, with some AJAX-like functionality built in.

XForms are written in XML using XForms tags; they access content in other XML files using the concept of bindings, along with XPath to navigate the documents. A further benefit is the ability to make asynchronous submissions from the form without any laborious JavaScript.
[27] http://www.w3.org/TR/xforms/
The following example illustrates a simple XForm. The XML file from the previous example is used to write an XForm giving a user access to the data:

<html xmlns="http://www.w3.org/1999/xhtml"
      xmlns:xf="http://www.w3.org/2002/xforms">
  <head>
    <title>My First Xform</title>
    <xf:model>
      <xf:instance src="data.xml"/>
      <xf:submission id="save" method="put" action="data.xml"/>
    </xf:model>
  </head>
  <body>
    <xf:group>
      <xf:label>My XML Data</xf:label>
      <xf:input ref="//PriceData/last">
        <xf:label>Data:</xf:label>
      </xf:input>
      <xf:submit submission="save">
        <xf:label>Save</xf:label>
      </xf:submit>
    </xf:group>
  </body>
</html>

The above form will appear as follows in an XForms-enabled browser:

Figure 13: Our price data from before appears, but can now be edited
With XForms the designer defines a model representation of the data; this can be programmed directly into the form or referenced from an external file, as in the above example. Here the submission element tells the forms processor to save the file to the local file system as data.xml.

A further advantage of XForms is that it can access any other standard XML mark-up, such as XSL. Coupling these standards together enables the form not only to access XML data but also to manipulate XSL files and therefore change the resulting SVG output.
3. Specification
Given that our initial requirements have been identified and the relevant technologies researched, the problem can be assessed in more detail. To generate further requirements, the problem statement is updated and the stakeholder analysis revisited.
3.1. Problem Statement
The initial problem statement was the retrieval and representation of data from the Internet. We now have to consider how this will be solved using the technologies identified; specifically, the inclusion of Dapper adds additional functions and requirements from the point of view of administering and running the retrieval.
3.2. Stakeholder Analysis
The initial stakeholder analysis identifies the primary,
secondary and tertiary
stakeholders:
Primary Stakeholders: Administrators

User Profile: The administrators could also be private investors. At present the assumption is made that some external management of the site is required, whether by the investor using the site or by a third party.

Role: Ensure that errors caused by external factors, such as server downtime or changes to site structure, are dealt with; input will be required to respond to such problems and to update Dapps as necessary.

Goals: Browse web sources for relevant information. Identify information which is of interest. Manage and update Dapps. Build up and maintain collections of data sources. Schedule tasks for Dapps to perform. Manage collected data.
Secondary Stakeholders: Investors and End Users

User Profile: Investors and front-end users who will access the data retrieved via the graphical interface.

Role: The data that is retrieved, and the format it is eventually stored in, will affect the people who use that data. Investors want information as soon as it is available, and spending time searching for this information is costly both in terms of investors' time and in their ability to make informed decisions.

Goals: Gain an overview of all relevant information relating to current or potential future investments. Select the output format for the data. Filter and search data. Extend administrator goals.
Tertiary Stakeholders: Content Owners

User Profile: Web masters and web content owners.

Role: Maintaining web pages and content.

Goals: Attract users to their sites and, in some cases, generate revenue through advertising or subscription.
3.3. User Goals
High-level goals identified from the problem statement and stakeholder analysis are used to define the top-level use cases:

- Manage and update Dapps
- Build up and maintain collections of data sources
- Manage collected data
- Select different views of the data
- Filter and search data
3.4. Use Cases
Figure 14: Use Case Diagram
4. System Design
4.1. Methodology
The design methodology used is a top-down, modular approach to development. Starting with high-level use cases, the interfaces and main functionality are determined; from here, functional requirements are elicited as separate modules based on their intended tasks. The previous sections outlined the various specifications available to answer our three top-level requirements: data retrieval, storage and transformation. These standards follow a strict Model-View-Controller paradigm, so it makes sense to extend this to the whole application.
4.2. Required Technologies
Before development, some base technologies are required to support the system. Xindice runs as a web application in a suitable container; in this case Apache Tomcat is chosen. Once Xindice has been downloaded and unpacked it is deployed to Tomcat and tested using the appropriate URL, in this case:

http://localhost:8282/xindice/?/db

The top-level collection in Xindice is called db; the question mark indicates the debug page, which is automatically loaded when Xindice is accessed using the base URL. This is the only user interface provided as standard for Xindice: XML files can be viewed via this tool but not added or manipulated.
XForms and SVG cannot be viewed in all browsers by default; extensions are required for most, and the level of functionality supported differs between implementations. Firefox is therefore chosen, since it provides good support for SVG, and the Mozilla XForms extension implements most of the XForms 1.0 functionality despite still being at a development stage.

To aid development the Eclipse IDE is used, since it supports most of the standards involved, with the exception of XForms and SVG. Firefox provides an error console which is useful for debugging XML content, and as such it can provide useful feedback on SVG, XML, XSL and XForms errors.
4.3. Decisions
Although much of the necessary functionality can be implemented on the client side using browser extensions, a back end is still required to interface with Dapper and Xindice.

Java servlets were chosen to address this requirement, partly due to the Java API support for both Xindice and Dapper, but also because Java has plenty of XML and DOM APIs for XML data handling. XForms can post data as XML files directly to the server; to handle these files the server-side application must therefore be able to access and manipulate XML.
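A minimal sketch of such a servlet, assuming DOM4J is used to parse the posted XML (the class name and queried element are illustrative):

import java.io.IOException;
import javax.servlet.ServletException;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;
import org.dom4j.Document;
import org.dom4j.DocumentException;
import org.dom4j.io.SAXReader;

public class XFormReceiver extends HttpServlet {
    protected void doPost(HttpServletRequest req, HttpServletResponse resp)
            throws ServletException, IOException {
        try {
            // Parse the XML document submitted in the request body
            Document doc = new SAXReader().read(req.getInputStream());

            // Pull a value out of the submission using XPath
            String last = doc.valueOf("//PriceData/last");

            resp.setContentType("text/plain");
            resp.getWriter().println("Received last price: " + last);
        } catch (DocumentException e) {
            throw new ServletException(e);
        }
    }
}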
4.4. Modules
To simplify the design process, the application is split into smaller modules, each addressing a specific function. Keeping with the grouping used so far, the main functions are data retrieval, storage and presentation.
4.4.1 Dapper Module
Much of the data retrieval requirement has been addressed by Dapper; however, Dapper.net provides a means to create Dapps but not to control their execution. The Dapps themselves can only execute for one URL at a time, whereas our requirement is to extract data from different sources and also from multiple pages within the same resource, which involves specifying parameters directly within the URL.

Our first requirement is therefore a means to execute a Dapp and specify a URL for it to work on. From our use cases we also have the requirement to maintain collections of resources for the Dapp to retrieve from. Finally, there are two further requirements: to select the Dapp to be used and to specify the storage location for the data.

The Dapper API is unfortunately very basic and looks incomplete. As such, the interface options for Java are limited, and much of the above functionality needed to be implemented.
Following our Model-View-Controller ideal, the functionality is divided. First, implementing the data aspect of the module, an XML file is created to store the model view of each Dapp. Unfortunately the Dapper API does not provide an obvious means to elicit certain parameters from the site, so the model will be a means to represent each Dapp. From the initial requirements, the following information needs to be stored:

- Dapp name
- Storage location
- Collection of resource locations (URLs)

Using XML as a storage medium in this way makes sense because XML support is already required for the other aspects of the design, so extra effort is saved on implementing another means of storing the configuration data.
The view aspect is taken care of by XForms, again to take advantage of the XML standards and functionality on the client side. With XForms, users can manipulate the XML files, addressing our requirements to add and update lists of resources.

The above provides a nice interface for some XML but won't actually do anything on its own, so the controller aspect is required. Since the storage medium for the retrieved data is Xindice, which needs to run on Tomcat, a servlet container, it is logical to use Java servlets for our back-end functionality.
4.4.2 Database Connection Module
To access Xindice, methods are required first to connect to a collection and then to perform queries on the data; again a servlet module is used for the controller aspect. The data is stored in collections within the database. The top-level collection, db, contains database-specific files such as meta information and should not be used to store content, so further collections need to be created. Our first set of requirements is therefore to provide the ability for users to create collections within Xindice.
Once data is retrieved by Dapper it needs to be inserted into a specific collection; although the data retrieval task is handled by the Dapper module, the retrieved data is passed to the database connection for insertion. Xindice allows the programmer to specify a unique id for the document being inserted into the database; however, since we will be querying the XML content directly using XPath, a system for identifying documents by id is not required. In addition, Xindice has a mechanism in place to automatically assign unique ids to files as they are added, which saves some development work.
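An indicative sketch of inserting a document through the XML:DB API, relying on Xindice's automatic id assignment (the collection name and document content are illustrative):

import org.xmldb.api.DatabaseManager;
import org.xmldb.api.base.Collection;
import org.xmldb.api.base.Database;
import org.xmldb.api.modules.XMLResource;

public class InsertExample {
    public static void main(String[] args) throws Exception {
        Database db = (Database) Class.forName(
            "org.apache.xindice.client.xmldb.DatabaseImpl").newInstance();
        DatabaseManager.registerDatabase(db);
        Collection col = DatabaseManager.getCollection(
            "xmldb:xindice://localhost:8282/db/prices");

        // Passing null as the id lets Xindice assign a unique one
        XMLResource doc = (XMLResource)
            col.createResource(null, XMLResource.RESOURCE_TYPE);
        doc.setContent(
            "<dapp><PriceData><last>605.00</last></PriceData></dapp>");
        col.storeResource(doc);
        col.close();
    }
}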
Finally, a requirement exists to query the collections. The Xindice API provides a query engine which accepts an XPath string as input; the issue is therefore to provide an interface that is user friendly yet can be translated into an XPath query.

To provide the XSL functions, a server-side function is proposed to work with the other servlets; a third-party parser such as Apache Xalan is required to achieve this. Although the browser can take care of processing XSL, some extension functions may be required to provide more robust support for numeric processing, and many extension libraries exist which can be used for this purpose.
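As a brief sketch, a server-side transform using the standard Java TrAX API (for which Xalan is a common implementation) might look as follows; the file names are illustrative:

import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class ServerSideTransform {
    public static void main(String[] args) throws Exception {
        // Apply an XSL style sheet to an XML document on the server,
        // so the client receives finished HTML (or SVG) rather than raw XML
        Transformer t = TransformerFactory.newInstance()
            .newTransformer(new StreamSource("price.xsl"));
        t.transform(new StreamSource("price.xml"),
                    new StreamResult(System.out));
    }
}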
From these initial design considerations a conceptual architecture was drawn up, showing the relationships between the various components.
4.5. Proposed Architecture
Figure 15: Proposed System Architecture
5. Implementation
The implementation approach was again top-down: first creating the user interfaces to address the user requirements, then developing the Java code to accommodate the intended functions. The final implementation differed slightly from the original design concept, as problems and improvements were discovered through the implementation phase of the project.
5.1. Issues and Design Changes
A couple of changes were made to the initial concept. Firstly, it was initially envisioned that the final system would behave much like a real-world web application, with login details and user-specific preferences. It was decided, however, that this kind of functionality did not add any major benefit to the project, nor did it help achieve the initial goals.

The second change was to the retrieval aspect. It can be seen from the conceptual architecture that the intention was to allow XSL transformations to be made on the data before it was inserted into the database, so that unnecessary data could be removed and additional information added to improve document retrieval. It was later decided to drop this function, since the benefits would be minimal.
5.2. Module implementation
As with the overall architecture, some changes were made during the development process to accommodate new information as it became available. The implementation of each module is discussed in detail below.
5.2.1 Utilities Package
The utilities package was added to provide some basic functions to each of the other modules rather than repeating code. Two core functions that both the database query and Dapp manager classes require are the ability to access Xindice and the ability to manipulate XML documents using DOM4J.

The DatabaseConnector class provides basic database functions such as connection, collection discovery, and insertion and retrieval of documents. Queries are also executed via the DatabaseConnector by passing an XPath string expression to the executeQuery() method. Although no DOM standard implementation is favoured by any of the APIs, DOM4J was chosen because of the range of available functions. Within all the modules, XML files are manipulated or passed as DOM4J implementations of the Document interface.
There were few issues with the Database Connector because much of the functionality is available via the Xindice API and little additional code had to be written.
The XML Helper class was implemented to carry out the XML document processing that became a common requirement between classes. The class handles saving, reading, and converting XML between formats.
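A minimal sketch of such helpers using DOM4J is shown below; the method names are illustrative rather than the project's actual XML Helper API.

import java.io.File;
import java.io.FileWriter;

import org.dom4j.Document;
import org.dom4j.io.OutputFormat;
import org.dom4j.io.SAXReader;
import org.dom4j.io.XMLWriter;

public class XMLHelperSketch {

    // Parse an XML file into a DOM4J Document.
    public static Document read(File file) throws Exception {
        return new SAXReader().read(file);
    }

    // Save a DOM4J Document back to disk, pretty-printed.
    public static void save(Document doc, File file) throws Exception {
        XMLWriter writer = new XMLWriter(new FileWriter(file),
                OutputFormat.createPrettyPrint());
        writer.write(doc);
        writer.close();
    }

    // Convert a Document to its text form, e.g. for insertion into Xindice.
    public static String toText(Document doc) {
        return doc.asXML();
    }
}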
During the development process it became clear that the generic typing of the retrieved data by Dapper was going to cause problems with XSL. Some of the SVG transforms required numerical data without any formatting information included; for example, 1000 is retrieved as 1,000. To ensure the retrieved data is suitable for use with XSL, regular expressions and additional data validation had to be added, making the retrieval process more complicated. The solution was to add a user-defined content type field to the admin page so that users could specify what kind of data they were expecting to retrieve. The selection determines which regular expressions are applied to the input strings to ensure the data will work with XSL.
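A sketch of this kind of normalisation is given below; the content type name and regular expression are assumptions chosen to illustrate the approach rather than the exact expressions used.

public class ContentNormaliser {

    // Clean a retrieved string according to its user-specified content type.
    public static String normalise(String type, String raw) {
        String value = raw.trim();
        if ("number".equals(type)) {
            // Keep digits, sign and decimal point; drop commas, currency
            // symbols and any other formatting characters.
            return value.replaceAll("[^0-9.\\-]", "");
        }
        return value; // plain text passes through untouched
    }

    public static void main(String[] args) {
        System.out.println(normalise("number", "1,000"));   // prints 1000
        System.out.println(normalise("number", "£410.25")); // prints 410.25
    }
}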
Another feature of Xindice is that the XML files contained within a collection can be updated using XUpdate, so it would have been more elegant to store the relevant configuration files in a separate user collection within Xindice. The issue with doing this is that frequent database queries would be needed because of the dependency between the XForms and the XML files. It was therefore decided to keep the files static on the server and manipulate the documents using the XML Helper class, to which methods were added for adding and removing node sets from a document.
5.2.2 Dapp Manager Package
Despite the functionality provided by Dapper, the execution and management of the Dapps became more involved than expected. A DappImplementation class was required to store and manage the data sent via XForms, and an additional URLlist class was implemented to manage the list of variables used for retrieval. Finally, a servlet is used to access these objects.
The XML data submitted by the XForm is used to instantiate the DappImplementation and URLlist objects. A base URL and a list of variables are specified by the user and stored in the Dapp configuration XML file. Once submitted, the URLlist class is responsible for generating URLs and keeping track of the current progress. Each URL is created by replacing a predefined marker in the base URL with a variable, as follows:
http://uk.moneycentral.msn.com/investor/quotes/quotes.asp?Symbol=GB:{var}
becomes
http://uk.moneycentral.msn.com/investor/quotes/quotes.asp?Symbol=GB:TSCO
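A sketch of this substitution is shown below; the class name and symbol list are illustrative.

import java.util.Arrays;
import java.util.List;

public class URLListSketch {
    public static void main(String[] args) {
        String baseUrl =
            "http://uk.moneycentral.msn.com/investor/quotes/quotes.asp?Symbol=GB:{var}";
        // Hypothetical list of ticker symbols supplied by the user.
        List<String> symbols = Arrays.asList("TSCO", "VOD", "BP");
        for (String symbol : symbols) {
            // String.replace substitutes the literal {var} marker, not a regex.
            System.out.println(baseUrl.replace("{var}", symbol));
        }
    }
}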
The URL can now be passed to the Dapp for retrieval. Once the Dapp has executed, the resulting XML file is validated to check that the retrieval was successful and that the data is valid against our regular expression list. A copy of the output data elements is extracted during validation and added to the configuration file. This was not part of the original functionality but was added later to support the use of regular expressions for typing data; the added benefit is that it can be used as a list of query parameters for the XForms interface.
The intention was also to use the URLlist class as a progress reporter for the administration interface. At present a particularly long list of variables takes a while to execute, so it would be desirable to provide the user with feedback on progress, which could be achieved by using a servlet and JavaScript to query the list periodically. However, this feature was omitted due to time constraints.
5.2.3 DBQuery Package
The Database Query package contains the classes required not only for the database queries but also for the XML transforms. The two were packaged together because both are used from the same XForm interface.
The function of sorting and querying the data could be achieved in a number of ways. One solution would have been to send the result of the database query directly to the client with a reference to the relevant style sheet and allow the user's browser to perform the transform. This solution, however, means that every time a user changes the way a document is styled, another query has to be submitted to the database and the output parsed and styled again. Furthermore, because the client's browser would only see the result of the transform, the interface would be unable to inspect the raw output, which is a useful source of contextual information. The decision was therefore made to keep the data retrieval and styling tasks separate on the server.
When a query is submitted, the DBQuery servlet builds an XPath expression from the user input and queries a specified collection. The resulting XML output is not sent to the client but stored on the server. Once complete, a submission is triggered automatically to the Data Styler servlet, which transforms the output file into an SVG document and again stores the result on the server. The interface reads the SVG directly from this file. The advantage of this set-up is that changes to the style sheet can be applied to the database output without submitting another query to the database. A further advantage is that the database output is now directly accessible to the XForm, which provides additional functionality such as listing the available query parameters.
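The sketch below illustrates the shape of this two-step flow; the parameter names, XPath construction and file names are assumptions, and the real servlet delegates the query to the Database Connector described in section 5.2.1.

import java.io.FileWriter;

import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

public class DBQuerySketch extends HttpServlet {
    protected void doPost(HttpServletRequest req, HttpServletResponse resp)
            throws java.io.IOException {
        // Build an XPath expression from the user's form input,
        // e.g. //quote[last > 400]
        String xpath = "//quote[" + req.getParameter("field") + " "
                + req.getParameter("operator") + " "
                + req.getParameter("value") + "]";

        // Query the collection and keep the result on the server rather
        // than returning it to the client...
        String resultXml = executeQuery(xpath);
        FileWriter out = new FileWriter("queryResult.xml");
        out.write(resultXml);
        out.close();
        // ...so the Data Styler servlet can re-style this file as SVG
        // without the client re-submitting the database query.
    }

    private String executeQuery(String xpath) {
        return "<results/>"; // stand-in for the Database Connector call
    }
}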
5.2.4 Style Sheet & Icon Design
The style sheet design proved to be the most difficult part of the implementation. The task was to provide a set of predefined graphical representations that users could select and manipulate via the main interface. As we have seen, XSL provides a mechanism for using XML data to drive graphics: any SVG parameter can be controlled, such as dimension, colour, opacity or shape.
The concept of an icon is used to represent an individual data entity; in this example we are looking at individual shares as extracted by Dapper. An icon provides a graphical representation of one or more pieces of data. The number of parameters differs between icons, the simpler ones displaying only one piece of information as a change in one of the graphical aspects of the design.
The problem with this concept is providing a context for comparison. Because the data can span an arbitrary range, we need some baseline against which to compare each item. To get round this problem each variable is presented as a percentage of the group's maximum. For example, if we want to display the last price, the style sheet first needs to know the maximum price in the data set.
This is achieved through the use of the EXSLT math module, a set of extension functions that can be used in addition to XSL. In this case the math:max function is used to determine the maximum value of an element in a node set. Once this value has been calculated, the individual elements can be compared against it to work out where they are placed on the scale. Since each value can now be expressed as a percentage, the style sheet can calculate a corresponding percentage of a graphical value. In figure 16 the opacity of each box represents the element's last price in relation to the maximum price of the data set being viewed.
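A sketch of this technique is shown below. The shares/share/price element names and the output layout are assumed for illustration; in the actual style sheet they correspond to the elements extracted by Dapper.

<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:math="http://exslt.org/math"
    exclude-result-prefixes="math">

  <xsl:template match="/shares">
    <svg xmlns="http://www.w3.org/2000/svg" width="800" height="100">
      <!-- Maximum price across the whole node set, via EXSLT math:max. -->
      <xsl:variable name="max" select="math:max(share/price)"/>
      <xsl:for-each select="share">
        <!-- Opacity is this share's price as a fraction of the maximum. -->
        <rect x="{(position() - 1) * 45}" y="10" width="40" height="40"
              fill="red" opacity="{price div $max}"/>
      </xsl:for-each>
    </svg>
  </xsl:template>
</xsl:stylesheet>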
Figure 16: Simple box icon designs, similar to the heat-map concept
This example is fairly simple and essentially the same as a heat map; to provide more interesting graphics, more complex icons needed to be designed using the same principle.
To allow representation using different icons without a great deal of server-side processing, the functionality of XForms is taken advantage of again. As we have seen, XForms can access any XML-based content and can therefore access and modify XSL. The basic parameters of each icon are stored in the style sheet as global parameters, and by implementing a simple input control with a reference to these parameters the shapes can be manipulated.
In figure 16 three global parameters are available: zoom, resolution and text size. The zoom function simply scales the graphics up by increasing the relevant dimensions; here SVG proves its usefulness, as the graphics remain crisp no matter how far a user zooms in. The text size parameter is self-explanatory, although the same function can be achieved via most browsers.
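For illustration, global parameters of this kind might be declared and used in the style sheet as follows; the default values are assumed.

<!-- Global parameters exposed to the XForm for manipulation. -->
<xsl:param name="zoom" select="1"/>
<xsl:param name="textSize" select="10"/>

<!-- The zoom factor scales the relevant dimensions of each icon. -->
<rect width="{40 * $zoom}" height="{40 * $zoom}" fill="red"/>
<text font-size="{$textSize}">TSCO</text>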
Finally, the resolution parameter is added as an exaggeration function. In figure 17 a slightly more complex graphic is illustrated; this time the icon displays the difference between two parameters as a sloping line. On initial testing of this model it became clear that for some data sets the differences in slope between stocks were negligible, making distinction between icons difficult. To address this issue an exaggeration parameter was added to allow the user to multiply the slope by a chosen factor, making small differences more visible.
Figure 17: Rate of change icon showing the difference between two variables
The icons themselves are based on separate templates within the style sheet. The selection of the icon to be used is achieved via the XForm interface and a binding to the template reference, allowing the user to select any template from the list.
To improve usability the style sheet interface needed to be dynamic, in the sense that the number of user-specified variables differs between templates and the variables have different meanings in the context of the current template. A separate style sheet configuration file is provided to the XForm and bound to the style sheet; the result is that the XForm knows which controls to display and when. Looking at figures 16 and 17 it can be seen that the number of inputs available and the labelling of these inputs differ between icons.
The positioning of the icons on the screen was another problem that took considerable time to resolve. The dynamic nature and scalability of the icons meant that hard-coding the positions on the page was not an option; instead, each position needed to be calculated from a starting point, the size of the icons, and the screen width, as sketched below.
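The following style sheet fragment sketches the calculation; the variable names are illustrative. Icons are laid out left to right and wrap onto a new row once a row is full.

<!-- Number of icons per row, from the screen width and icon size. -->
<xsl:variable name="cols" select="floor($screenWidth div $iconSize)"/>

<!-- Each icon's position follows from its index in the node set. -->
<rect width="{$iconSize}" height="{$iconSize}"
      x="{$startX + ((position() - 1) mod $cols) * $iconSize}"
      y="{$startY + floor((position() - 1) div $cols) * $iconSize}"/>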
5.3. Final System Architecture
Figure 18: Final system architecture showing changes
5.4. Interfaces
The final interfaces were partly governed by the functional requirements but also constrained by the capabilities of XForms. As mentioned earlier, the administration XForm had a few additions to support the specification of basic type information for the retrieved content. XForms can be styled using CSS in much the same way as HTML, although the process is not quite as straightforward and again relies on browser support. As such only basic styling was used, mainly for positioning elements on the user interface.
Another of XForms' advantages is the ability to display content and controls dynamically without making requests to a server. This function is used on the admin interface to provide page-style navigation through the various Dapps the user has created; the left and right arrow icons navigate between the Dapps, updating the relevant fields. This ability is also demonstrated by the add and remove controls, which allow the user to add new Dapps or variables and likewise remove them. The changes made by the user still need to be saved. If the dapp.xml file were stored on the local file system this would be easy using XForms' built-in put submission; since we are running from a server, we instead submit the XML and use a servlet to save the changes. Although this is not a perfect solution, XForms makes the submission asynchronously so the user is not affected too much.
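A sketch of such a servlet is shown below; the class name and file location are illustrative, and the save method is the DOM4J helper sketched in section 5.2.1.

import java.io.File;

import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

import org.dom4j.Document;
import org.dom4j.io.SAXReader;

public class SaveConfigSketch extends HttpServlet {
    protected void doPost(HttpServletRequest req, HttpServletResponse resp)
            throws java.io.IOException {
        try {
            // Parse the instance data submitted by the XForm.
            Document doc = new SAXReader().read(req.getInputStream());
            // Persist it over the server-side copy of dapp.xml.
            XMLHelperSketch.save(doc, new File("dapp.xml"));
            resp.setStatus(HttpServletResponse.SC_NO_CONTENT);
        } catch (Exception e) {
            resp.sendError(HttpServletResponse.SC_BAD_REQUEST, e.getMessage());
        }
    }
}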
Figure 19: Admin interface showing Dapp configuration data
There are some outstanding issues with the XPath navigation. The original intention was that the XForm should be able to identify or generate the required XPath to a specific element by inspecting the database output and the Dapp configuration file. However, the XForm cannot handle groupings of data in this way, and some additional path information needs to be added by the user. For example, we ideally want the user to be able to enter any parameter that is available as either a search parameter or a styling parameter. This information is taken from the Dapp and output XML documents stored on the server, but these only store the lowest-level element names, which cannot be passed as a useful XPath parameter since the full path is required; in this case we first need to access the parent Fundamentals element. This is perhaps an oversight in the design, but it can be rectified by adding more contextual information to the dapp.xml file.
Figure 20: An alternative presentation approach where the width of the ring signifies a value
It can be seen in the above illustration that two sets of submission controls are provided to the user: one for the database query and one for the data styler. The SVG result is loaded automatically from the server into a separate iframe; when changes are made to the style sheet, the XForm waits until the submission is complete and then refreshes the frame to update the graphic.
6. Testing
In order to test how effectively the design fulfils the requirements, the testing is divided into three categories: component testing and usability testing to determine how well the functional requirements are met, and speed and accuracy testing to test the initial hypothesis that a graphical interface is of advantage to an investor.
6.1. Component Testing
On the software level, unit tests were carried out on each component to ensure it achieves the desired functionality. Each functional requirement was tested in turn to ensure the final design satisfies the original specification.
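As an example, a unit test of the content normalisation sketched in section 5.2.1 might look as follows in JUnit; the class under test is the illustrative ContentNormaliser, not the project's actual code.

import static org.junit.Assert.assertEquals;

import org.junit.Test;

public class ContentNormaliserTest {

    // The "number" content type should strip grouping commas so the
    // value can be treated as numeric by XSL.
    @Test
    public void stripsGroupingCommasFromNumbers() {
        assertEquals("1000", ContentNormaliser.normalise("number", "1,000"));
    }
}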
6.2. Usability Testing
After sufficient testing of the base components was completed, the user interface had to be tested to determine how effective the design is in terms of usability and whether the solution provides a proof of concept.
To test the usability of the system an observational approach was taken, based on Nielsen's five quality attributes28:
Learnability: How easy is it for users to accomplish basic tasks the first time they encounter the design?
Test subjects were not given any background on the program and were asked to try to interact with it. They were also asked to describe what they were thinking and any assumptions they had about the interface. The observer did not respond to any direct questions at this point, in order to gauge how effective the interface was at communicating functionality.
Efficiency: Once users have learned the design, how quickly can they perform tasks?
After the initial tests users were given the opportunity to ask questions to gain a better understanding of the interface; they were then asked to repeat specific tasks in order to assess how easy it was to perform specific functions.
Memorability: When users return to the design after a period of
not using it, how
easily can they re-establish proficiency?
Test subjects were at this point asked to return to the program
after a period of time in
order to assess how easy it was to remember the affordances of
the interface.
Errors: How many errors do users make, how severe are they, and how easily can users recover from them?
An observational approach was again taken to note any mistakes
the user made and
their impact on the system.
Satisfaction: How pleasant is it to use the design?
Finally, test subjects were asked to rate on a scale of 1 to 10 how pleasant they felt the interface was to use.
6.3. Speed & Accuracy Test
To test how well the system answers the initial problem, an assessment was made of how well users can gain insight into the data represented by the system. Two factors were investigated: speed and accuracy.
Experimental Set-up
A set of 100 shares representing the FTSE 100 was selected, the data set being a snapshot of the market on a specific date. For each date, test subjects were asked to identify a value in the set, first on the graphical interface and then on a plain-text representation of the same data.
The ordering of the symbols was changed between tests to ensure subjects did not memorize the position of a particular stock. Subjects were timed to see how long it took to identify a particular value and then assessed on how accurate they were.
For the first two tests the relative size graphical icon was
used. This representation
changes the size of the icon relative to a specified value.
Figure 21: Relative size box graphic
28 http://www.useit.com/
In the first test users were asked to identify which icon they
thought represented the
highest and lowest value in a collection.
For the second set of data, test subjects were asked to identify trends based on the daily price movement. For this test the Two Variable Box representation was used.
Figure 22: Two variable box graphic showing the relative difference
Task: correctly identify the steepest upward and downward trends in a collection.
Figure 22 shows the basic two-variable icon; the slope of the line indicates the difference between the specified variables.
To test the effectiveness of this design, users were asked to look at the graphic and identify which stock they thought was falling fastest and which they thought was rising fastest.
Decision times were recorded in all cases for comparison with the timings obtained using a text-only representation.
Figure 23: Two variables relative to a third
Task: correctly identify the values where the indicator is nearest to its highest and lowest extremes.
Figure 23 shows one of the more complex icon designs. Similar in concept to the candlestick, it aims to show the direction and rate of change of the daily price in relation to its year-to-date high.
As with the previous icon, the slope of the line indicates the rate of change and the colour reinforces its direction. The position of the line in relation to its containing box signifies how close the current price is to the highest value it has reached over the past year.
To test how effectively this information is communicated, users were again timed and asked to identify the stocks they thought were closest to and furthest from their year-to-date highs.
Figure 24 shows the text-only interface, which was implemented as a style sheet template in order to keep the surrounding interface the same and change as few test variables as possible. The above tasks were all repeated on this interface, again changing the sort order of the data to prevent test subjects from memorizing data locations.
Figure 24: Text-only representation of a variable
6.4. Results
Usability Test.
Learnability: After observing a set of five subjects it became clear that more contextual information was required for the controls. One test subject commented that it was not immediately obvious what function some of the controls performed. Another issue was the openness of some of the controls; for example, the zoom control can be set to any value the user wants, and it is not immediately obvious how large that will make the icons.
Efficiency: On an initial attempt with no instruction some users had difficulty working out what the controls did; however, after a quick demonstration most could manipulate the data confidently.
Memorability: After a day the users were asked to return to the interface and try out some basic tasks to see how easily they could be repeated. Most users achieved this successfully; the main difficulty seemed to be with the initial usage of the interface.
Errors: The most common errors users made were to compare parameters that were not suitable for any logical comparison, and to select scales that caused excessive distortion of the graphics. The first issue is hard to rectify: since the user can define any data source, an assumption is made that they will pick resources suitable for comparison. The second issue can be rectified by adding stricter limits to the interface.
Satisfaction: The overall satisfaction rating was 6 out of 10 from our five test subjects. There is evidently room for improvement in the interface; however, some of the test subjects had no prior knowledge of stock market trading, so the overall purpose and context of the application was new to them.
Speed-Accuracy Test.
The results of the speed and accuracy test were more promising: in 80% of the test cases the user's decision-making was faster than with the text-only interface. The accuracy figures were, however, less conclusive, with both textual and graphical accuracy rates at 60%. We would expect the accuracy rates for the graphical representations to be similar or lower, because they are not as definite as numerical figures.
The test group could have been larger, and more testing in this area is needed before definite conclusions can be drawn on the effectiveness of the interface; however, the initial results tend to support the hypothesis that a graphical system is better for gaining quick insights into large sets of data.
7. Conclusion
The application provides a basic, albeit simplified, answer to the initial requirements, but could easily be extended to give a wider range of functions. In its current state it demonstrates that, by using the available web standards, a flexible system can be developed which allows data to be retrieved, transformed and represented on the web. Our test results indicate that the system can effectively impart large amounts of information quickly to the viewer; however, further work is required to improve the user interface, mainly in the area of contextual information.
To expand the system, a user can easily add any content they like provided Dapper.net can extract it successfully, although there are limitations to the data that can be viewed and the graphical icons that can be displayed. Going forward, it would be beneficial to provide a further interface that allows users to create icons based on the retrieved data, producing personalized graphical representations.