1
Technical Issues Concerning The Use Of Personal Data On The Internet
Brian Kelly
UK Web Focus
UKOLN
University of Bath
Bath, BA2 7AY
Email: [email protected]
URL: http://www.ukoln.ac.uk/
UKOLN is funded by the British Library Research and Innovation Centre, the Joint Information Systems Committee of the Higher Education Funding Councils, as well as by project funding from the JISC’s Electronic Libraries Programme and the European Union. UKOLN also receives support from the University of Bath where it is based.
2
Contents
About UK Web Focus
Personal Information and the Internet
• End User issues
• Information Provider issues
• System Administrator issues
• Management Issues
Advertising revenue can make these a commercial proposition
10
Ahoy!
Ahoy! is a research project which uses AI techniques to find (a small number of) personal home pages
AI techniques will make it easier to find personal information
http://www.ahoy.cs.washington.edu:6060/
11
Web Browsers and Privacy
Client Caches
Web browsers store viewed resources in a local cache (on a hard disk or a network drive).
These resources can be re-used.
Potentially these files could be accessed by other users of the PC or by a system administrator.
12
Web Browsers and Privacy
Cookies
Cookies enable information to be stored on your local PC which can be reused by the remote server.
Cookies are useful in applications such as "shopping baskets", CBL, etc.
However, there are privacy implications, since cookies can be used to record paths through a website.
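The way a cookie can record a path through a website can be sketched as follows. This is a minimal illustration using Python's standard http.cookies module; the cookie name "session", the session identifier and the page paths are assumptions made for the example and do not come from any particular server.

```python
from http.cookies import SimpleCookie


def issue_session_cookie(session_id):
    """Build the Set-Cookie header value a server might send on a first visit."""
    cookie = SimpleCookie()
    cookie["session"] = session_id      # hypothetical cookie name
    cookie["session"]["path"] = "/"
    return cookie["session"].OutputString()


def record_visit(trail, cookie_header, page):
    """Append the requested page to the trail kept for this cookie's session."""
    cookie = SimpleCookie()
    cookie.load(cookie_header)          # parse the Cookie header sent by the browser
    session_id = cookie["session"].value
    trail.setdefault(session_id, []).append(page)
    return trail


# The server issues the cookie once; the browser returns it with every request.
set_cookie = issue_session_cookie("abc123")

trail = {}
for page in ["/", "/courses/", "/courses/physics.html"]:
    record_visit(trail, "session=abc123", page)

# trail now maps the session identifier to the ordered list of pages requested.
```

The same identifier that makes a shopping basket work is what lets the server link separate requests into one trail.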
13
Information Providers
What personal information is provided on the web?
Corporate Information
Individual /Societies
14
Changing Context
Technologies such as Frames can change the context of resources on the web by:
• Pointing to text
• Pointing to graphics
There has reportedly been a "Babes on the Web" page. Document held remotely
15
Web Forms
Web forms are now trivial to set up
• Save time and effort
• Information may be reused easily
Are information providers aware of implications of reusing information?
16
System Administrators
System Administrators can:
• Read incoming and outgoing messages and Usenet postings
• Analyse cache log files to find popular websites - and potentially who's been accessing them
• Deny access to specified websites
• Publish statistics on hits to pages
17
Web Statistics
Many web administrators publish their web statistics:
• Access by country
• Access by domain name
• Most popular pages
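The three kinds of statistics listed above can be derived from an ordinary server access log. The sketch below assumes log lines in Common Log Format with resolved host names; the host names and pages are invented for illustration.

```python
from collections import Counter

# Hypothetical access log entries in Common Log Format.
log_lines = [
    'pc1.example.ac.uk - - [10/Oct/1997:13:55:36 +0000] "GET /index.html HTTP/1.0" 200 2326',
    'pc2.example.ac.uk - - [10/Oct/1997:13:56:01 +0000] "GET /staff/ HTTP/1.0" 200 1045',
    'host.example.de - - [10/Oct/1997:13:57:12 +0000] "GET /index.html HTTP/1.0" 200 2326',
]


def summarise(lines):
    """Count hits by country code, by domain name and by page."""
    by_country, by_domain, by_page = Counter(), Counter(), Counter()
    for line in lines:
        host = line.split(" ", 1)[0]        # first field: the client host
        request = line.split('"')[1]        # e.g. 'GET /index.html HTTP/1.0'
        page = request.split(" ")[1]
        labels = host.split(".")
        by_country[labels[-1]] += 1         # last label, e.g. 'uk'
        by_domain[".".join(labels[1:])] += 1  # host name minus the machine name
        by_page[page] += 1
    return by_country, by_domain, by_page


by_country, by_domain, by_page = summarise(log_lines)
```

Note that the published totals are aggregated, but the raw log still carries the name of each individual machine in its first field.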
18
Restricting Access
It is possible to restrict access to sites containing dubious content
It is also possible to record a user's email address and take action if access is persistently attempted
Is this:
• Sensible action
• Breach of privacy?
19
Solutions
There are a variety of solutions to the issues concerned with Personal Data and the Internet:
• Don't use the Internet
• Information providers' "tricks"
• System administrators' "tricks"
• Protocol Developments
• Auditing
Education is important throughout
20
Solutions - Denying Access
• Information published on the web can be easily processed by robots
• Can prevent (well-behaved) robots from accessing resources using the Robot Exclusion Protocol (REP) (robots.txt file)
Alta Vista search for "Brian Kelly" gives 2,800 hits
But:
• Not widely used: ~30% of UK universities
• Not easily scaleable (single file at web root)
User-agent: *
Disallow: /stats/
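How a well-behaved robot would interpret the robots.txt rules shown above can be sketched with Python's standard urllib.robotparser module. The robot name "ExampleBot" and the www.example.ac.uk host are assumptions made for the example.

```python
from urllib.robotparser import RobotFileParser

# The two rules from the slide, as they would appear in /robots.txt.
rules = [
    "User-agent: *",
    "Disallow: /stats/",
]

parser = RobotFileParser()
parser.parse(rules)


def allowed(url, robot="ExampleBot"):
    """Return True if the rules permit the named robot to fetch the URL."""
    return parser.can_fetch(robot, url)


stats_ok = allowed("http://www.example.ac.uk/stats/1997.html")   # False
courses_ok = allowed("http://www.example.ac.uk/courses/")        # True
```

The check is made by the robot itself, so the file only deters robots that choose to consult it.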
21
Solutions - For Info Providers
• REP implemented by system administrator
• Possible (but not easy?) to create master robots.txt file by merging departmental ones
• HTML 4.0 <META NAME="ROBOTS" CONTENT="NOINDEX, NOFOLLOW"> element enables individual files to contain robot directives
– New and not yet widely supported
• Since robots tend not to follow CGI programs, could hide information behind a button
– Not elegant
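A robot that honours the per-page META directive might read it along the following lines. This is a sketch using Python's standard html.parser module; the class and function names and the sample page are hypothetical.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect the directives from any <META NAME="ROBOTS"> element."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        # HTMLParser lower-cases tag and attribute names but not values.
        if tag != "meta":
            return
        attributes = {name: (value or "") for name, value in attrs}
        if attributes.get("name", "").lower() == "robots":
            for item in attributes.get("content", "").split(","):
                if item.strip():
                    self.directives.add(item.strip().lower())


def robots_directives(html):
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.directives


page = ('<html><head><META NAME="ROBOTS" CONTENT="NOINDEX, NOFOLLOW"></head>'
        '<body>Staff list</body></html>')
directives = robots_directives(page)
may_index = "noindex" not in directives     # False for this page
may_follow = "nofollow" not in directives   # False for this page
```

Unlike robots.txt, the directive travels with the individual file, so an author can set it without involving the system administrator.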
There are technical ways of:
• Preventing resources from being used in frames
• Preventing images from being "stolen"
Solutions are being considered mainly for copyright protection
However such solutions aren't widely deployed as:
• They may prevent the resource from being reused in valid ways
• No user / political pressure?
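The slides do not say which techniques are meant. One approach often used against image "stealing", shown here purely as an assumed example, is for the server to inspect the HTTP Referer header and serve an image only when the referring page is on one of its own hosts. The function name and host names below are hypothetical.

```python
from urllib.parse import urlparse


def allow_image_request(referer, own_hosts):
    """Decide whether to serve an image, based on the Referer header.

    referer   -- value of the Referer header, or None if it was not sent
    own_hosts -- set of host names whose pages may embed the image
    """
    if not referer:
        # The header is optional, so refusing here would also block
        # legitimate users whose browser or proxy omits it.
        return True
    return urlparse(referer).hostname in own_hosts


own_hosts = {"www.example.ac.uk"}
local = allow_image_request("http://www.example.ac.uk/staff/index.html", own_hosts)
remote = allow_image_request("http://www.example.com/babes.html", own_hosts)
```

The sketch also shows the drawback noted above: a page elsewhere that embeds the image for a perfectly valid reason is refused in exactly the same way.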
23
Political Developments
Global Information Networks
• European Conference in Bonn in June 97
• Raised issues of:
– Data protection
– Technological solutions
• See <URL: http://www2.echo.lu/bonn/conference.html>
24
W3C Response
World Wide Web Consortium (W3C) responded to Bonn paper:
• Summarised technological solutions:
– DSig: a web of trust
– PICS: content selection without censorship
– P3P: privacy project
– IPR: intellectual property rights
• See <URL: http://www.w3.org/TR/NOTE-eu-conf-970711>
25
DSig
DSig:
• W3C's Digital Signature Initiative
• Helps users to decide who to trust
• Based on digitally signed assertions:
"This web page comes from Bath University Courses office and gives a legally binding list of courses"
• See <URL: http://www.w3.org/Security/DSig/Activity.html>
26
PICS
PICS:
• Platform for Internet Content Selection
• Mechanism for rating web pages
e.g. X, A, PG, U
• Decision to accept resource made by end user (or end user organisation)
• Choice devolved - no censorship of originating resource
• See <URL: http://www.w3.org/PICS/>
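The idea that the decision rests with the end user rather than the publisher can be sketched as below. This is not the PICS label syntax; it reuses only the example labels from the slide, and the ordering from mildest to most restricted is an assumption.

```python
# Assumed ordering, mildest first; the slide lists the labels but not their order.
RATING_ORDER = ["U", "PG", "A", "X"]


def accept_resource(label, user_limit, accept_unrated=False):
    """Return True if the end user's own policy accepts a resource.

    label          -- rating attached to the resource, or None if unrated
    user_limit     -- the most restricted rating this user will accept
    accept_unrated -- the user's choice for resources with no label
    """
    if label is None:
        return accept_unrated
    return RATING_ORDER.index(label) <= RATING_ORDER.index(user_limit)


# The same resource, two different local policies.
school_view = accept_resource("A", user_limit="PG")   # False
adult_view = accept_resource("A", user_limit="X")     # True
```

The resource itself is unchanged in both cases; only the local policy differs, which is the sense in which choice is devolved.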
27
IPR
W3C's IPR activity:
• Intellectual Property Rights and the Web:
– Does use of a cache infringe copyright
– Can links to resources be made freely
– …
• Asks the contentious question:
Does the nature of the technology require us to change the legal understanding or status of copyright as it stands now?
• See <URL: http://www.w3.org/IPR/Activity.html>
28
P3P
P3P:
• Platform for Privacy Preferences
• Will develop specification and demonstration of way of expressing privacy practices and preferences by Web sites and users
• Architecture and grammar work complete (Oct 1997)
• See <URL: http://www.w3.org/Privacy/Activity.html>
29
P3P Deliverables
General Overview of the P3P Architecture
• Document describes the P3P model
Grammatical Model
• Grammar and vocabulary for machine-readable statements:
Data Categories: e.g. name, email, ...
Practices:
Use: e.g. system admin, research, customisation
Transfer: divulge information within organisation
Release: divulge info to other organisation
Access: ability of data subject to view information
See <URL: http://www.w3.org/TR/NOTE-IPWG-Practices.html>
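The vocabulary above lends itself to a simple comparison between what a site declares and what a user is prepared to accept. The sketch below is not P3P syntax; the class names, fields and matching rule are assumptions that reuse only the terms listed on the slide.

```python
from dataclasses import dataclass


@dataclass
class Practice:
    """One statement a site makes about one category of data."""
    category: str    # e.g. "name", "email"
    use: str         # e.g. "system admin", "research", "customisation"
    transfer: bool   # divulged within the organisation
    release: bool    # divulged to other organisations
    access: bool     # data subject can view the information


@dataclass
class Preferences:
    """What one user is prepared to accept (hypothetical structure)."""
    allowed_uses: dict   # category -> set of acceptable uses
    allow_release: bool
    require_access: bool


def acceptable(practice, prefs):
    """Return True if the site's practice falls within the user's preferences."""
    if practice.use not in prefs.allowed_uses.get(practice.category, set()):
        return False
    if practice.release and not prefs.allow_release:
        return False
    if prefs.require_access and not practice.access:
        return False
    return True


site_statement = [
    Practice("email", "system admin", transfer=True, release=False, access=True),
    Practice("name", "research", transfer=True, release=True, access=True),
]
user = Preferences(
    allowed_uses={"email": {"system admin"}, "name": {"research", "customisation"}},
    allow_release=False,
    require_access=True,
)
results = [acceptable(practice, user) for practice in site_statement]
```

Here the first practice is accepted and the second is rejected, because it releases the name to other organisations.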
30
JTAP Calls
Digital Signatures
Studies to identify appropriate protocols and to test deployment. Seeking to fund an overview report and a technology deployment pilot
Certificate Based Infrastructure Services
Technical overview and pilot. Seeking to fund an overview and technology watch project at a cost of £25,000, followed by one or two deployment pilots
Work to start in Dec 1998
See <URL: http://www.jtap.ac.uk/bid/c14_98.html>
31
Privacy Services
TRUSTe:
• An "independent, non-profit, privacy initiative dedicated to building users' trust .. on the Internet"
• TRUSTe sites agree to:
– Maintain an approved Privacy Statement
– Explain information gathering practices:
– What personal information will be used for
– Whether information will be disclosed
– Display the TRUSTe's Mark
• TRUSTe will periodically check conformance
• See <URL: http://www.etrust.org/>
32
What's Happening in UK?
A number of universities have provided guidelines governing Internet use:
• Data Protection
• Computer Misuse
• ..
But:
• Is work being duplicated?
• Is it still relevant?
http://www.cam.ac.uk/CS/DPA.html
33
What's Needed?
Auditing Software
WebWatch
• Project based at UKOLN
• Monitors web technologies (not content)
• Potential for auditing robots.txt files?
Do we want software for auditing at a national or institutional level?
Can we follow the TRUSTe model?
34
What's Needed?
Catalogue of Guidelines
A catalogue of UK HE web resources is being produced:
• Uses ROADS (cf. SOSIG, OMNI, etc.)
• Various categories planned:
– AUP
– Guidelines for authors
– Local search engines
• Feedback welcome
35
What's Needed?
Education
Need for education for:
• End users
• Information providers
• System administrators
• Managers
Who provides training materials?
Who delivers the training?
36
Conclusions
• Widespread use of the Internet / ease of publishing has increased privacy concerns
• Need for education and awareness:
– End users
– Information providers
– System administrators (central & departmental)
• Do we want a system like TRUSTe?
• Need for auditing tools locally / nationally?
• Need to share experiences
• Need to be aware of (implement?)