Web Archiving: A Brief Introduction, by Sawood Alam, Department of Computer Science, Old Dominion University, Norfolk, Virginia - 23529 (USA)

Web Archiving: A Brief Introduction

Apr 14, 2017

Sawood Alam
Transcript
Page 1: Web Archiving: A Brief Introduction

Web Archiving: A Brief Introduction

Sawood Alam
Department of Computer Science
Old Dominion University
Norfolk, Virginia - 23529 (USA)

Page 2: Web Archiving: A Brief Introduction

About Me

Sawood Alam

Lexical Signature
Web, Digital Library, Web Archiving, Ruby on Rails, PHP, XHTML, CSS, JavaScript, ExtJS, Urdu, RTL and Linux.

● BTech, Jamia Millia Islamia, India, 2008
● MSc, Old Dominion University, USA, 2013
● PhD, Old Dominion University, USA, Current

Page 3: Web Archiving: A Brief Introduction

She Calls Me Dad!

Page 4: Web Archiving: A Brief Introduction

Agenda
● Archiving and Web archiving
● Purpose and importance
● Scope of Web archiving
● Issues and challenges
● Tools and techniques
● Memento: Time Travel for the Web
● Archive X-Ray
● Research opportunities in Web archiving
● Our WSDL Research Group

Page 5: Web Archiving: A Brief Introduction

What is an Archive?
● Accumulation of historical records
● Long-term storage and preservation
● Less frequently used
● Physical or digital

Page 6: Web Archiving: A Brief Introduction

What is Web Archiving?
● Periodic snapshots of web pages
● Preserving important events on the Web
● Making archived content accessible

Page 7: Web Archiving: A Brief Introduction

Why Do We Care About Archiving?

Web content decays rapidly!

● To preserve history
● To tell a story
● For evidence
● For backup
● For personal satisfaction

Page 8: Web Archiving: A Brief Introduction

Issues and Challenges
● Crawling
● Storage
● Retrieval
● Replay
● Accessibility
● Completeness
● Accuracy
● Credibility

Page 9: Web Archiving: A Brief Introduction

Web Archiving Efforts
● Internet Archive
● Archive-It
● Wikipedia
● UK Web Archive
● Various national and non-profit archives
● Film, music and other multimedia archives
● Scholarly archives
● Personal archiving

Page 11: Web Archiving: A Brief Introduction

Memento

<http://example.com>; rel="original",
<http://web.archive.org/web/20020120142510/http://example.com/>; rel="memento"; datetime="Sun, 20 Jan 2002 14:25:10 GMT",
<http://web.archive.org/web/20020328012821/http://www.example.com/>; rel="memento"; datetime="Thu, 28 Mar 2002 01:28:21 GMT",
<http://webarchive.loc.gov/all/20020803080544/http://www.example.com/>; rel="memento"; datetime="Sat, 03 Aug 2002 08:05:44 GMT",
<http://wayback.archive-it.org/all/20091213015014/http://www.example.com/>; rel="memento"; datetime="Sun, 13 Dec 2009 01:50:14 GMT",
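The link-format listing on this slide can be read mechanically. The following is a minimal Python sketch, not the reference Memento tooling: it parses the entries shown on the slide and picks the memento whose datetime is closest to a requested date. The function names `parse_timemap` and `closest_memento` and the regular expressions are assumptions made for illustration, and the parser only handles the simple quoted attributes that appear here.

```python
import re
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime

# The entries from the slide, copied as link-format text.
TIMEMAP = '''<http://example.com>; rel="original",
<http://web.archive.org/web/20020120142510/http://example.com/>; rel="memento"; datetime="Sun, 20 Jan 2002 14:25:10 GMT",
<http://web.archive.org/web/20020328012821/http://www.example.com/>; rel="memento"; datetime="Thu, 28 Mar 2002 01:28:21 GMT",
<http://webarchive.loc.gov/all/20020803080544/http://www.example.com/>; rel="memento"; datetime="Sat, 03 Aug 2002 08:05:44 GMT",
<http://wayback.archive-it.org/all/20091213015014/http://www.example.com/>; rel="memento"; datetime="Sun, 13 Dec 2009 01:50:14 GMT",
'''

# One link value: <URI> followed by zero or more ; name="value" attributes.
LINK_RE = re.compile(r'<([^>]*)>((?:\s*;\s*[\w-]+="[^"]*")*)')
ATTR_RE = re.compile(r'([\w-]+)="([^"]*)"')


def parse_timemap(text):
    """Return a list of dicts with 'uri', 'rel' and, if present, 'datetime'."""
    entries = []
    for uri, params in LINK_RE.findall(text):
        attrs = dict(ATTR_RE.findall(params))
        entry = {"uri": uri, "rel": attrs.get("rel", "").split()}
        if "datetime" in attrs:
            # HTTP-style dates ending in GMT parse to timezone-aware datetimes.
            entry["datetime"] = parsedate_to_datetime(attrs["datetime"])
        entries.append(entry)
    return entries


def closest_memento(entries, target):
    """Pick the memento whose datetime is nearest to the target datetime."""
    mementos = [e for e in entries if "memento" in e["rel"]]
    return min(mementos, key=lambda e: abs(e["datetime"] - target))


entries = parse_timemap(TIMEMAP)
wanted = datetime(2002, 4, 1, tzinfo=timezone.utc)
print(closest_memento(entries, wanted)["uri"])
```

Note that in these examples the 14-digit number in each memento URI is the same capture time as the datetime attribute, written as YYYYMMDDhhmmss.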

Page 12: Web Archiving: A Brief Introduction

Archive X-Ray!
● How much of the Web is archived?
● Profiling various archive services
● Predicting what they contain
● Routing Memento aggregator queries

Page 13: Web Archiving: A Brief Introduction

Memento Aggregator

Page 19: Web Archiving: A Brief Introduction

Long Tail of Archives

Page 20: Web Archiving: A Brief Introduction

Archive Profile
● High-level summary of an archive
● Predicts presence of mementos
● Provides statistics about the holdings
● Small in size and publicly available
● Easy to update and partially patch
● Useful for Memento query routing and other things

com,cnn)/ {"frequency": 40, "spread": 2}
uk,co,bbc)/ {"frequency": 20, "spread": 1}
com,usatoday)/ {"frequency": 5, "spread": 1}
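To make the query-routing idea concrete, here is a small Python sketch under stated assumptions. The profile keys on this slide look like SURT-style host keys (host labels reversed and comma-separated), so the sketch reduces a URI to such a key and asks only the archives whose profile contains it. The helper names `host_key`, `load_profile` and `archives_to_query`, the archive names, and the choice to drop a leading "www." are all hypothetical; the slide does not define how "frequency" and "spread" are computed, so they are carried along uninterpreted.

```python
import json
from urllib.parse import urlsplit

# The three profile lines from the slide (key, space, JSON statistics).
PROFILE_TEXT = '''com,cnn)/ {"frequency": 40, "spread": 2}
uk,co,bbc)/ {"frequency": 20, "spread": 1}
com,usatoday)/ {"frequency": 5, "spread": 1}
'''


def host_key(uri):
    """Reduce a URI to a host-level key such as 'uk,co,bbc)/'."""
    host = urlsplit(uri).hostname or ""
    if host.startswith("www."):  # assumption: treat www.example.com as example.com
        host = host[4:]
    return ",".join(reversed(host.split("."))) + ")/"


def load_profile(text):
    """Parse profile lines into a dict mapping key -> statistics dict."""
    profile = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        key, stats = line.split(" ", 1)
        profile[key] = json.loads(stats)
    return profile


def archives_to_query(uri, profiles):
    """Return the names of archives whose profile contains the URI's key."""
    key = host_key(uri)
    return [name for name, profile in profiles.items() if key in profile]


slide_profile = load_profile(PROFILE_TEXT)
# Hypothetical archives: one holds everything on the slide, one only CNN.
profiles = {
    "archive_a": slide_profile,
    "archive_b": {k: v for k, v in slide_profile.items() if k == "com,cnn)/"},
}
print(archives_to_query("http://www.bbc.co.uk/news", profiles))
```

An aggregator using this kind of lookup would skip archives that are unlikely to hold any memento for the requested URI instead of broadcasting every query to all of them.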

Page 21: Web Archiving: A Brief Introduction

Research Opportunities
● Information retrieval
● Information visualization
● Client-side and server-side archiving
● Archiving dynamic content
● Distributed archiving
● Discovering alternative long-term archiving techniques
● Predicting “important” events on the Web and archiving them in a timely manner

Page 23: Web Archiving: A Brief Introduction

WSDL Research Group
