Automated Benchmarking Of UK Museum Web Sites With An Introduction to UKOLN and UK Web Focus Brian Kelly UK Web Focus UKOLN University of Bath Bath, BA2 7AY UKOLN is supported by: Email [email protected] URL http://www.ukoln.ac.uk/
Jan 15, 2016
Contents
• About UKOLN
• UKOLN’s WebWatch Work For UK HEIs
• Benchmarking UK Museum Web Sites
• Comparison With “6 Of The Best”
• Limitations Of Approach
• Where To From Here?
UKOLN
UKOLN:
• National focus of expertise in digital information management
• Based at University of Bath
• Funded by JISC (HE and FE sector) and Resource: The Council for Museums, Archives and Libraries, together with project funding (e.g. EU and JISC)
• About 25 FTEs
• Carries out applied research (e.g. in metadata) and software development, and provides policy and advisory services
UKOLN’s Dissemination Work
UKOLN carries out dissemination activities including work carried out by UKOLN’s Policy and Advice Team:
Interoperability Focus
• Close links with Resource and Museums community (member of CIMI Executive Committee)
• Involved in e-GIF standards work
• See <http://www.ukoln.ac.uk/interop-focus/>
Collection Description Focus
• Funded by JISC, RSLP and British Library
• Coordination work on collection description methods, schemas & tools with goal of ensuring consistency across projects, disciplines, institutions and sectors
• See <http://www.ukoln.ac.uk/cd-focus/>
Bibliographic Management
UK Web Focus - myself
UK Web Focus
UK Web Focus:
• Funded by JISC to provide advice on Web developments
• Organises events (e.g. annual Institutional Web Management Workshop), writes articles (e.g. regular columns in Ariadne e-journal), gives talks, etc.
• A member of UKOLN’s Policy and Advice Team (which also includes Interoperability Focus, Collection Description Focus and Public Library Networking Focus)
• Managed the original WebWatch project and continues to publish results of WebWatch surveys
Community Building
An important part of my work is community building within UK HE / FE Web management communities:
• An annual 3 day workshop which provides an opportunity for Web managers to:
  • update their technical skills and approaches to managerial and strategic thinking
  • discuss and share problems and solutions with peers
• Active participation in (e.g.) JISCMail mailing lists e.g.:
  • web-support: “My home page doesn’t look right in Netscape 4. Can anyone help?”
  • website-info-mgt: “A Web site has stolen text and images from my Web site. What should I do?” “How should I impose a consistent look-and-feel across all departmental Web sites?”
• Comparing approaches across community and sharing best practices
WebWatch Project
WebWatch project:
• Initially funded for 1 year in 1997 by BLRIC to develop and use automated robot software to analyse Web developments across various UK communities
• Once funding finished the work continued, but made use of (mainly) freely available Web services to analyse various features of Web site communities
• Supports community-building work across UK HE/FE Web managers (sharing, not flaming)
• See <http://www.ukoln.ac.uk/web-focus/webwatch/>
WebWatch Surveys
Search Engines Used To Index UK HE Web Sites:
• ht://Dig most popular and growing in popularity, followed by an MS solution
• Interest in licensed Ultraseek/Inktomi solution
• Interest in externally hosted indexers (e.g. Google)
• Surprising number of institutions with no search facility
• See <http://www.ukoln.ac.uk/web-focus/surveys/uk-he-search-engines/>
Nos. of Links:
• Cambridge has most (231,000 links to all servers)
• Sheffield has the most to a single server (46,000)
• See <http://www.ariadne.ac.uk/issue23/web-watch/>
Nos. Of Web Servers:
• Cambridge has most (200+)
• See <http://www.ariadne.ac.uk/issue25/web-watch/>
Update On Search Engines
Sept 1999: ht://Dig: 25; Excite: 19; Microsoft: 12; Harvest: 8; Ultraseek: 7; SWISH: 5; Other: 23; None: 59
Today: ht://Dig: 48; Microsoft: 17; Ultraseek/Inktomi: 12; Google: 11; Excite: 5; Webinator: 5; Others: 22; None: 29
The growth in popularity of ht://Dig, the unexpected appearance of the Google externally-hosted service and the move from SWISH and Harvest would not have been noticed without the snapshots. The discussion of surveys informed decision-making.
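The comparison between the two snapshots can be reproduced with a few lines of code. The sketch below uses the counts given on this slide; the mapping of "Ultraseek" to "Ultraseek/Inktomi" and of "Other" to "Others" is an assumption, and engines missing from a snapshot are treated as zero even though they may in fact be counted under "Others".

```python
# Counts transcribed from the two WebWatch snapshots on this slide.
sept_1999 = {"ht://Dig": 25, "Excite": 19, "Microsoft": 12, "Harvest": 8,
             "Ultraseek": 7, "SWISH": 5, "Other": 23, "None": 59}
today = {"ht://Dig": 48, "Microsoft": 17, "Ultraseek/Inktomi": 12, "Google": 11,
         "Excite": 5, "Webinator": 5, "Others": 22, "None": 29}

# Assumption: these labels refer to the same category in both snapshots.
ALIASES = {"Ultraseek": "Ultraseek/Inktomi", "Other": "Others"}


def normalise(snapshot):
    """Rename categories so both snapshots use the same labels."""
    return {ALIASES.get(name, name): count for name, count in snapshot.items()}


def changes(before, after):
    """Return the change in count per category (missing categories count as 0)."""
    before, after = normalise(before), normalise(after)
    names = sorted(set(before) | set(after))
    return {name: after.get(name, 0) - before.get(name, 0) for name in names}


deltas = changes(sept_1999, today)
# List the biggest gains first.
for name, delta in sorted(deltas.items(), key=lambda item: item[1], reverse=True):
    print(f"{name:20s} {delta:+d}")
```

Keeping each snapshot as plain data makes it straightforward to add later surveys and rerun the comparison.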
WebWatch Activities
As well as these metrics a number of observations of features have been carried out
404 Error Page
• The appearance of and functionality provided by the institution’s 404 error page
Appearance of Main Entry Point
• The appearance of the institution’s entry point, identifying main types (menu-style vs news) and use of technologies (Java, DHTML, etc.)
A “rolling demo” has been provided of these features allowing interested parties to quickly get a feel of the approaches taken within the community
These have proved very popular – see <http://www.ukoln.ac.uk/web-focus/site-rolling-demos/>
Benchmarking
WebWatch approach of monitoring UK HE Web sites can be extended into a benchmarking exercise:
• Making comparisons with peers
• Checking compliance with standards
• Checking compliance with community or funders' guidelines (e.g. e-GIF guidelines)
This has advantages for organisations:
• Observing best practices and learning from them
• Ditto for bad practices
• Community building
and some potential disadvantages:
• Establishment of league tables
• Inappropriate comparisons
• Penalty clauses for failure to comply with standards
Benchmarking Museum Web Sites
WebWatch approach to benchmarking has been applied to a small number of UK Museum Web sites:
Small selection chosen in order to:
• Keep resource requirements to a minimum
• Validate methodology
• Gauge interest in this approach
Selected resources were:
• Sample of museum Web sites
• Guardian’s six best museum Web sites
If methodology is felt to be valid and there is sufficient interest the approach could be taken more widely across the museum community
Details of survey available from <http://www.ukoln.ac.uk/web-focus/events/conferences/museums-2001/>
Benchmarking Activity
Choosing the sample:
• mda list of UK Museum Web sites used as master source <http://www.mda.org.uk/vlmp/>
• Web sites beginning with letter “A” were chosen <http://www.mda.org.uk/vlmp/#A>
• Andrew Carnegie Birthplace Museum removed from sample as Web site was unavailable
Abbot Hall Art Gallery
Aberdeen Art Gallery & Museums
AccessArt
Aerospace Museum
Allhallows Museum
Althorp House
Amberley Museum
American Museum in Britain
Armagh Planetarium
Arnolfini Gallery
Ashmolean Museum of Art & Archaeology
Astley Hall Museum and Art Gallery
Avoncroft Museum of Historic Buildings
The 13 Selected Museum Web Sites
Approaches
Approaches taken:
• Use of freely-available Web sites which provide analysis capabilities
• Page of “live links” provided enabling all users to reproduce findings
• Complement this with manual inspection
Benefits of this approach:
• Openness, reproducibility and objectivity of survey
http://www.netmechanic.com/toolbox/html-code.htm
Domain Names
Findings
• 11 museums (92%) have an entry point which is the domain name and 2 (8%) have an entry point which is one level beneath the domain name
• 6 (46%) have a .co.uk domain; 3 (23%) have .org.uk; 2 (15%) have .com; 1 (8%) has .org; 1 (8%) has .ac.uk
Discussion
• Most of the museums have a short, memorable URL
• The variety of top level domains may be confusing for end users
• How will the new .museum domain be deployed? Is there an opportunity for a major advertising campaign?
Reminder – findings are for a small, non-random sample
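A tally of this kind is easy to automate once the entry-point URLs are held as a list. The following is a minimal sketch, assuming a fixed list of domain suffixes of interest; the URLs are hypothetical placeholders rather than the surveyed sites, and the depth measure simply counts non-empty path segments (so a trailing file name would also count as a level).

```python
from collections import Counter
from urllib.parse import urlparse

# Assumption: only these suffixes are of interest; anything else is "other".
SUFFIXES = (".co.uk", ".org.uk", ".ac.uk", ".com", ".org")


def domain_suffix(url):
    """Return the first matching suffix for the URL's host name, or 'other'."""
    host = urlparse(url).hostname or ""
    for suffix in SUFFIXES:
        if host.endswith(suffix):
            return suffix
    return "other"


def entry_depth(url):
    """Number of non-empty path segments beneath the domain name."""
    path = urlparse(url).path
    return len([part for part in path.split("/") if part])


def summarise(urls):
    """Map each suffix to (count, rounded percentage of the sample)."""
    counts = Counter(domain_suffix(url) for url in urls)
    total = len(urls)
    return {suffix: (n, round(100 * n / total)) for suffix, n in counts.items()}


# Hypothetical entry points, for illustration only.
sample = [
    "http://www.example-museum.co.uk/",
    "http://www.another-example.co.uk/",
    "http://www.example.org.uk/museum/",
    "http://museum.example.ac.uk/",
]
print(summarise(sample))
print([entry_depth(url) for url in sample])
```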
Server Software
Netcraft used to analyse Web server software
Findings
• 7 hosted on a Unix platform (4 on Linux, 2 on Solaris and 1 on BSD)
• 6 hosted on a Microsoft platform (4 on NT 4 or Windows 98, 2 on Windows 2000)
Issues
• Security, scalability, ease-of-use, ….
http://www.netcraft.com/
Standards Compliance
Entry point examined for compliance with HTML and CSS standards using the NetMechanic and W3C Validator Web-based tools:
Findings
• 0 pages were HTML compliant (according to W3C)
• Of the 5 sites which contained a CSS style sheet, 0 had errors (according to W3C)
• 3 pages were HTML compliant (according to NetMechanic)
Issues
• HTML-compliance is important for ensuring wide accessibility and for repurposing content
Accessibility
Entry point examined for compliance with W3C WAI guidelines for accessibility using the Bobby Web-based tool:
Findings
• Only 2 pages had no WAI Priority 1 error
Issues
• Compliance with accessibility standards is important for ensuring access to resources for people with disabilities
• Compliance with accessibility standards may be an organisational requirement
• Compliance with accessibility standards may be a legal requirement
Size Of Entry Point Using Bobby
Findings (Bobby)
• Largest entry point initially appeared to be 159 Kb
• On further analysis of framed sites the largest entry point was found to be 236.91 Kb
• The smallest appeared to be 1 Kb, but this was a FRAMES page (and not the individual linked pages)
• On further analysis of framed sites the smallest entry point was found to be 15.45 Kb
Issues
• Bobby flagged pages which used frames but further manual analysis and calculations were needed
Size Of Entry Point Using NetMechanic
Findings (NetMechanic)
• Largest entry point initially appeared to be 237,107 b (231 Kb)
• The smallest appeared to be 16,045 b (15.7 Kb)
Issues
• NetMechanic flagged pages which used frames but further manual analysis and calculations were needed
Bobby and NetMechanic identified the same largest and smallest sites, but this is not always the case
Comments On Size Measurements
Use of tools to analyse size of Web pages has indicated several issues:
• Need for manual inspection of results (normally outliers) in order to spot invalid comparisons
• Different ways of treating:
  • Redirects
  • Frames
  • User-agent negotiation etc.
  and inconsistencies in handling:
  • robot exclusion protocol
  • external files (e.g. CSS and JavaScript), etc.
  may result in inconsistent findings
• Changes in content of page (e.g. inclusion of news items, personalised interfaces, etc.)
• Output generated for viewing on Web, not further processing
• Current need to manually sum sub-parts
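The manual summing of sub-parts for framed pages could in principle be scripted. The sketch below is one possible definition of "size of entry point", not the one used by Bobby or NetMechanic: it adds the HTML of a page, its inline images and, recursively, the pages loaded into its frames. It ignores CSS, JavaScript, redirects and the robot exclusion protocol, and the `fetch` argument is a hypothetical callable that returns the bytes for a URL.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class ResourceFinder(HTMLParser):
    """Collect the URLs of frames and inline images referenced by a page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.frames = []
        self.images = []

    def handle_starttag(self, tag, attrs):
        src = dict(attrs).get("src")
        if not src:
            return
        if tag in ("frame", "iframe"):
            self.frames.append(urljoin(self.base_url, src))
        elif tag == "img":
            self.images.append(urljoin(self.base_url, src))


def entry_point_size(url, fetch, seen=None):
    """Total bytes for a page, its inline images and (recursively) its frames."""
    seen = set() if seen is None else seen
    if url in seen:
        return 0  # count each resource once
    seen.add(url)
    body = fetch(url)
    finder = ResourceFinder(url)
    finder.feed(body.decode("latin-1"))  # latin-1 never fails to decode
    total = len(body)
    for image in finder.images:
        if image not in seen:
            seen.add(image)
            total += len(fetch(image))
    for frame in finder.frames:
        total += entry_point_size(frame, fetch, seen)
    return total


# Illustration with an in-memory "site" instead of real HTTP requests.
pages = {
    "http://example.org/": b'<frameset><frame src="nav.html"><frame src="main.html"></frameset>',
    "http://example.org/nav.html": b'<a href="about.html">About</a>',
    "http://example.org/main.html": b'<img src="logo.gif"><p>Welcome</p>',
    "http://example.org/logo.gif": b"G" * 100,
}
print(entry_point_size("http://example.org/", pages.__getitem__))
```

Passing the fetching function in as a parameter keeps the size definition separate from how pages are retrieved, so the same rule can be applied consistently across all sites in a survey.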
Link Popularity
The number of links to each Web site was found using LinkPopularity (which has an interface to AltaVista):
Findings
• The most linked-to Web site had 2,731 links
• The least linked-to Web site had 45 links
Issues
• Links can drive traffic to your Web site
• Links can be used by citation-based search engines (such as Google) to boost the ranking of your site (many links to your page means Google will give it a higher ranking than a similar page with fewer links)
• Snapshots of link popularity can help gauge effectiveness of publicity campaigns
Search Engine Coverage / Size Of Web Site
AltaVista (AV) and Netscape’s What’s Related tool (NS) were used to measure the size of the museum Web sites (i.e. the numbers of pages they had indexed):
Findings
• Most no. of pages indexed by AV was 2,037 pages
• Most no. of pages indexed by NS was 1,919 pages
• Least no. of pages indexed by AV was 0 pages
• Least no. of pages indexed by NS was 0 pages
Issues
• The nos. of pages indexed should be ≥ 0 and ≤ nos. of pages on Web site
• If significantly fewer pages are indexed than exist, this may show a Web site which is not search-friendly (e.g. use of frames, splash screens, etc.)
Search Facility
Information on museum’s search engine was found:
Findings
• 10 sites have no search facility
• 3 have a search facility:
  • 1 uses the FreeFind externally-hosted search engine
  • 1 uses a Microsoft search engine
  • 1 uses a Perl script (to search an online catalogue)
• 1 search facility not working (over 1 month period)
Issues
• Users expect to be provided with search facilities
• It can take < 30 minutes (and little technical expertise) to make an externally hosted search engine available, suitable for simple static Web sites (but not many people know this)
404 Error Page
Information on the 404 error page was found:
Findings
• 10 sites use the default 404 error message
• 3 have a lightly branded error message, but with little additional functionality
Issues
• The 404 error page is (sadly) likely to be widely accessed
• It is desirable that it:
  • Reflects the Web site's look-and-feel
  • Provides functionality to assist a user who is ‘lost’:
    • Provides access to a search facility / site map
    • Provides contact details
• The 404 page can also be context-sensitive (e.g. different pages for users following a local link / remote link / no link)
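A context-sensitive 404 page of the kind described above can be driven by the HTTP Referer header. The following is a minimal sketch of that idea, not a description of any surveyed site: the function names and message texts are illustrative assumptions, and in a real deployment the logic would sit inside the Web server's error handler.

```python
from urllib.parse import urlparse


def classify_referrer(referer, site_host):
    """Return 'no link', 'local link' or 'remote link' for a 404 request."""
    if not referer:
        return "no link"  # address typed in, bookmarked, or Referer withheld
    host = urlparse(referer).hostname
    return "local link" if host == site_host else "remote link"


# Illustrative wording only.
MESSAGES = {
    "no link": "Please check the address, or try the search facility or site map.",
    "local link": "Sorry, one of our own links is broken. Please use the contact details below to let us know.",
    "remote link": "The site you came from has an out-of-date link. Try the search facility or site map.",
}


def error_page_text(referer, site_host):
    """Choose the message to show on the 404 page."""
    return MESSAGES[classify_referrer(referer, site_host)]


print(error_page_text("http://www.example.org/visit.html", "www.example.org"))
```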
Robots.txt
Information on the Web site’s robots.txt file was found:
Findings
• 12 sites have no robots.txt file
• 1 site has a simple robots.txt file
Issues
• robots.txt file can be used to control indexing of your Web site e.g. stop robots from indexing:
  • Pre-release versions of pages
  • Test areas
  • …
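As an illustration of the point above, the sketch below shows a simple robots.txt file and checks it with Python's standard `urllib.robotparser` module. The `/test/` and `/preview/` paths, the `ExampleBot` name and the example.org URLs are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# A simple robots.txt asking all robots to stay out of two areas.
ROBOTS_TXT = """\
User-agent: *
Disallow: /test/
Disallow: /preview/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# A well-behaved robot checks each URL before fetching it.
test_allowed = parser.can_fetch("ExampleBot", "http://www.example.org/test/page.html")
public_allowed = parser.can_fetch("ExampleBot", "http://www.example.org/collections/")
print(test_allowed, public_allowed)
```

Note that robots.txt is advisory: it controls only robots which choose to obey the protocol.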
Other Surveys
Additional surveys were carried out:
Cacheability Of Entry Point
• Cacheability Engine used <http://www.mnot.net/cacheability/>
• 11 entry points were cacheable and 2 were not
What’s Related To Web Site
• Netscape’s What's Related? facility <http://home.netscape.com/escapes/related/> used to record:
  • Popularity, nos. of pages and nos. of links
  • Relationships with other sites
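For the cacheability survey above, a rough impression of what such a check involves can be given by inspecting HTTP response headers. The sketch below is a simplified heuristic and is not the logic used by the Cacheability Engine: it treats a response as cacheable if nothing forbids caching and it carries either freshness information or a validator.

```python
def looks_cacheable(headers):
    """Rough check of HTTP response headers (a heuristic, not a full HTTP cache model)."""
    h = {name.lower(): value for name, value in headers.items()}
    directives = [d.strip().lower() for d in h.get("cache-control", "").split(",") if d.strip()]

    # Explicit instructions not to cache (or not to cache in shared caches).
    if any(d in ("no-store", "no-cache", "private") for d in directives):
        return False
    if h.get("pragma", "").lower() == "no-cache":
        return False

    # Freshness information lets a cache reuse the page without asking the server.
    has_freshness = "expires" in h or any(
        d.startswith("max-age=") and d != "max-age=0" for d in directives
    )
    # Validators let a cache cheaply check whether its copy is still current.
    has_validator = "last-modified" in h or "etag" in h
    return has_freshness or has_validator


print(looks_cacheable({"Last-Modified": "Mon, 15 Oct 2001 10:00:00 GMT"}))
print(looks_cacheable({"Cache-Control": "no-cache"}))
```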
Six of the Best: Museums
Guardian’s Online supplement (18 Oct 2001) published their list of the six best Museum Web sites:
• The Hermitage in St Petersburg at <http://www.hermitagemuseum.org/>
• Metropolitan Museum at <http://www.metmuseum.org/>
• SCRAN at <http://www.scran.ac.uk>
• Tate Modern at <http://www.tate.org.uk/modern/>
• The Louvre at <http://www.louvre.fr/>
• Design Museum at <http://www.designmuseum.org/>
Comparisons
Automated Surveys
• 3 had a search facility
• Nos. of links to sites ranged from 723 to 18,366
• All surveyed entry points had P1 accessibility errors
• All surveyed entry points had HTML errors
Observations
• 3 were providing a search facility
• Most were providing a simple robots.txt file
• Some of the 404 error messages were slightly better
Accessible to Browsers
How do the Web sites look in different browsers?
The Lynx text browser and an emulation of the Mosaic browser were used in order to investigate how the Web sites would look to:
• Users of old browsers
• Users of browsers with no JavaScript support
• Users of text browsers (or an indexing robot)
Mosaic
Lynx
Limitations Of Survey
Limitations of this type of benchmarking approach include:
• Lack of standards
• Limitations of the tools
• Resources needed to carry out surveys
• Scoping of Museum sites and invalid comparisons
• Automated approach fails to address content issues which require a manual approach
Limitations - Standards
There is a lack of standards to support benchmarking work (or conflicting standards). For example:
Size of a page
How do you measure the size of the museum’s entry point? You need this in order to make comparisons and if, say, you have guidelines on the maximum file size.
Problems
• What do you measure (HTML file, inline images, external CSS and JavaScript files, …)?
• Changes in file content (e.g. user-agent negotiation, news content, frames and refresh elements, etc.)
• How do you handle the robot exclusion protocol (REP)?
NOTE: Bobby and NetMechanic work differently: the former only measures HTML and images, the latter obeys the REP
Limitations - Tools
Issues:
• Auditing tools tend to make implicit definitions (e.g. measuring size of a page). Different results may be obtained when using different tools for same purpose (or if vendor changes its definition)
• Use of Web-based auditing services:
  • Talk has described use of (mainly free) Web-based services
  • The providers may change their policy
  • Use of the URL interface to pass parameters (rather than direct use of the form on the Web page) may not be allowed
• Use of desktop auditing tools:
  • Use of desktop tools avoids the problems of change control of Web-based services
  • However it means that it may be difficult for others to reproduce findings
Limitations - Resources
It can be time-consuming to:
• Maintain URL of entry point to museum Web sites (need to have close links with provider of central portal)
• Manage the input to the variety of Web-based services
• Process the output from the Web-based services (current need to initiate inquiry, wait for results and manually copy and paste results)
Limitations – Scope of Web Site
Scope
• What is a museum Web site?
• What is not part of a museum Web site?
• It can be difficult to answer these questions.
• There are no standard ways to define a “Web site” other than by use of domain names and directory structures
• Even directory structures can be inadequate if they are not used correctly
Comparisons
• It may not be sensible to make comparisons between museums of different types and sizes
Limitations – Automated Only
Use of an automated approach:
• Would not (easily) address content issues
• Has been supplemented with manual observations (e.g. home page, 404 page & search engine page)
However:
• An automated approach can be more objective and reproducible
• An automated approach should be less resource-intensive (once software has been set up to maintain links to resources, survey sites and process results)
• An automated approach could be used in conjunction with a manual survey (of a representative sample set of resources)
Beyond A Pilot
Despite the limitations which have been described, would a comprehensive and systematic benchmark of UK Museum Web sites be of benefit?
• Can we address the resource issues?
• Is the lack of standards being addressed?
• Can we find someone to do the work?
• Should the focus be developmental?
• Can the work be extended to provide notification of problems (e.g. search engine not working)?
What may happen if we don’t do this?
Might we find that funders set up inappropriate or flawed performance indicators?
A Model For Implementation
The benchmarking process could be made less time-consuming if a more flexible model for managing the data were used
At present we seem to have an HTML page with links to museum Web sites
Unfortunately HTML pages are difficult to repurpose
A better model is to store links in a neutral database, and to generate pages for viewing by end users and for input into benchmarking Web services
The database could also be reused for other purposes e.g. checking links and email notifications of problems
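The model described above can be sketched with a small database from which both kinds of page are generated. This is an illustration under stated assumptions, not an implementation used in the survey: the table layout, the function names, the museum entries and the auditing service address (`http://audit.example.org/check` with a `uri` parameter) are all hypothetical.

```python
import sqlite3
from html import escape
from urllib.parse import urlencode


def build_db(rows):
    """Store (name, url) pairs in a neutral form, here an in-memory SQLite table."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE museum (name TEXT, url TEXT)")
    db.executemany("INSERT INTO museum VALUES (?, ?)", rows)
    return db


def page_for_viewing(db):
    """Generate an HTML list of links for end users."""
    items = [
        f'<li><a href="{escape(url)}">{escape(name)}</a></li>'
        for name, url in db.execute("SELECT name, url FROM museum ORDER BY name")
    ]
    return "<ul>\n" + "\n".join(items) + "\n</ul>"


def links_for_service(db, service_base, param="uri"):
    """Generate one query URL per museum for a URL-driven auditing service."""
    return [
        f"{service_base}?{urlencode({param: url})}"
        for (url,) in db.execute("SELECT url FROM museum ORDER BY name")
    ]


# Hypothetical entries and service address, for illustration only.
db = build_db([
    ("Example Museum", "http://www.example.org/"),
    ("Another Example Gallery", "http://www.example.co.uk/gallery/"),
])
print(page_for_viewing(db))
for link in links_for_service(db, "http://audit.example.org/check"):
    print(link)
```

Because the links live in one place, adding a new auditing service means adding one more generator function rather than editing a hand-maintained HTML page.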
Towards “Web Services”
Background
• Web initially implemented for provision of information
• CGI allowed users to input data and provided integration with backend applications
• Techniques described use URL as input to auditing service. However this provides limited functionality and is susceptible to vagaries of marketplace
Future
• “Web Services” will support machine integration by providing a standard messaging infrastructure which uses HTTP protocol
• XML output (e.g. EARL) will provide a neutral format for benchmarking output, and can describe benchmarking environment (EARL is RDF)
Need For Standard Definitions
• There is a need for standard definitions of terminology such as Web page, visit, unique visit, session, etc. in order to ensure that meaningful and objective comparisons can be made
• The market place is addressing current deficiencies within Web Advertising and Web Auditing communities (and there are financial incentives for this to be solved)
• With the growth in e-governments internationally and governments setting targets (X% of government work to be carried out electronically by 2005)
Doing The Work
If there is further interest, who should do the work?
[Diagram: who could do the benchmarking work, what the work involves, and why they would do it]
Who: Funding body; Auditing body; Other central body; Researcher; Volunteer; Other(s)
What: Maintain central database; Software development; Producing reports; Dissemination
Why: Part of current remit; New remit; Research interest; Student project; Provides benefits to community
What Next?
To summarise:
• Approach to the automated benchmarking of a small set of museum Web sites has been shown
• Implications of the findings have been discussed
• There are limitations of the methodology
It is suggested that:
• Despite the limitations benchmarking of museum Web sites can be beneficial:
  • Community building
  • Learning from successes and mistakes
• There may be advantages in carrying out this work within the community
Questions
Any questions?
Questions For You
• Would further work be useful?
• Who would do the work?
• Is there a need for a portal for use by the community of museum Web managers as well as for end users?
• Anyone interested in joint work in this area (possibilities of a paper for a conference - e.g. Museums and the Web 2002 conf. - proposals needed by 30 Nov)