Episode 3(3): Birth & explosion of the World Wide Web - Meetup session11

Session 11: Episode 3(3) —

Birth & explosion of the World Wide Web

William P. Hall President Kororoit Institute Proponents and Supporters Assoc., Inc. - http://kororoit.org [email protected] http://www.orgs-evolution-knowledge.net

Access my research papers from Google Citations

http://scholar.google.com.au/citations?user=yOXsKpcAAAAJ&hl=en&pagesize=100&sortby=pubdate

Tonight

From the point of view of information science, the last session considered how the growth of knowledge overwhelmed paper-based libraries and how computers changed scholars’ personal access to published knowledge.

Tonight we begin to explore how the Internet and World Wide Web grew out of research on broad-scale communications networks to become the technology that now gives billions of people nearly universal access to the bulk of the world’s externally preserved knowledge.


Episode 3(3) – Birth & explosion of the World Wide Web

The World Wide Web
Web Origins and History
Vannevar Bush’s Memex
Tim Berners-Lee Invents the World Wide Web
Basic Web Tools
The Web Explodes
How Much Knowledge Does the Internet Access?

Was the communications infrastructure of the Internet invented to retain command & control after a nuclear war?

— Hardware, standards, applications

(glossed over in the book)

Some think DARPA invented the Internet to help command and control survive a nuclear first strike

ARPA/DARPA (Defense Advanced Research Projects Agency)
– Established 1958 to formulate and execute research and development projects to expand the frontiers of technology and science.

Packet-switching vs direct point-to-point networking
– Data streams cut into standard-sized blocks wrapped in header information used by interface message processors (routers) to direct the contents to a particular destination
– One sending device can direct packets to many different destinations & vice versa
– Video: Computing Conversations: Vint Cerf on the History of Packets
– 1968-9 research project to develop packet-switching interfaces between different ARPA labs so computer resources could be shared
– Packet switching offered a solution for slow & unreliable connections
  Needed to cope with
    Multiple paths
    Packets arriving out of order
    Lost packets
    Duplicated packets (i.e., same packet received via different routes)
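The failure modes listed above (packets arriving out of order, lost packets, duplicated packets) can be illustrated with a minimal Python sketch. This is not the ARPANET or TCP algorithm: the `Packet` fields and the `packetize` and `reassemble` helpers are hypothetical names chosen for illustration, and the sketch assumes every header carries a sequence number and the total number of blocks.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Packet:
    """Illustrative packet: a small header plus one block of payload."""
    seq: int        # position of this block in the original stream
    total: int      # number of blocks the sender produced
    payload: bytes


def packetize(data: bytes, block_size: int = 8) -> list[Packet]:
    """Cut a data stream into fixed-size blocks, each wrapped in a header."""
    blocks = [data[i:i + block_size] for i in range(0, len(data), block_size)]
    return [Packet(seq, len(blocks), block) for seq, block in enumerate(blocks)]


def reassemble(received: list[Packet]) -> tuple[bytes | None, list[int]]:
    """Rebuild the stream from packets that may arrive in any order.

    Returns (data, missing). data is None while any sequence number is
    absent; missing then lists the blocks that would need to be re-sent.
    """
    if not received:
        return None, []
    by_seq: dict[int, bytes] = {}
    for packet in received:
        # A duplicate (same seq arriving via another route) is ignored.
        by_seq.setdefault(packet.seq, packet.payload)
    total = received[0].total
    missing = [seq for seq in range(total) if seq not in by_seq]
    if missing:
        return None, missing
    return b"".join(by_seq[seq] for seq in range(total)), []


message = b"LO AND BEHOLD, PACKETS!"
packets = packetize(message)
# Simulate the network: reverse the arrival order and duplicate one packet.
arrived = list(reversed(packets)) + [packets[0]]
data, missing = reassemble(arrived)
```

The sequence number does all the work here: sorting by it undoes reordering, keying on it discards duplicates, and gaps in it reveal losses.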

https://www.youtube.com/watch?v=jZJzNWOY0QI




Growth in technology and interconnections

First ARPANET message sent 1969, reached East Coast 1970

1972-1982 Gov’t funded research & infrastructure
– Backbone interconnecting universities & research labs
– Standards for exchanging text & digital data
  1971 FTP (File Transfer Protocol) with many improvements over time
  1973 Email (based on store & forward technologies)

1981 National Science Foundation (NSF) funded the Computer Science Network (CSNET). Connected additional CS depts.

1982 Internet Protocol Suite standard (TCP/IP)
– End-to-end connectivity specifying how data should be packetized, addressed, transmitted, routed and received at the destination.
– Transmission Control Protocol (TCP) controls assembly & disassembly of packets for network transmission
– Internet Protocol (IP) controls addressing
  Video: Vint Cerf TCP/IP 40th Anniversary Event (11:48)
– Made it possible to inter-connect networks = “Internet”
  Video: How did 'internetworking' become THE INTERNET? (with Vint Cerf)

https://en.wikipedia.org/wiki/ARPANET

https://en.wikipedia.org/wiki/CSNET

https://en.wikipedia.org/wiki/Internet_protocol_suite

https://en.wikipedia.org/wiki/Routing

http://web.archive.org/web/20080730025931/http:/searchnetworking.techtarget.com/dictionary/definition/what-is-TCP.html

http://web.archive.org/web/20130501122536/http:/foldoc.org/internet+protocol

https://www.youtube.com/watch?v=lLiQnw0b-YQ



https://www.youtube.com/watch?v=hr6VpPJywYw




Exponential growth of host numbers largely driven by data & knowledge sharing (email & file sharing)

Hypertext adds cognitive links / relationships to the Internet
– Includes a variety of knowledge objects in the cognitive structure of a document
– Content begins life of its own

[Chart: Internet hosts count (log scale). Annotations: ARPANET; Internet Protocol (TCP/IP) introduced 1982; Hypertext Transfer Protocol (HTTP) and Hypertext Markup Language (HTML); World Wide Web; Web browsers: Mosaic & Netscape]

Partial map of the Internet on January 15, 2005

See also Lumeta map (2006); http://internet-map.net/

https://upload.wikimedia.org/wikipedia/commons/3/3f/Internet_map_1024_-_transparent,_inverted.png

https://en.wikipedia.org/wiki/File:Internet_Hosts_Count_log.svg

https://web.archive.org/web/20060413075646/http:/blogs.cio.com/system/files?file=Internet_map_labels_0.pdf





Web applications

– What is the infrastructure good for?

Key ideas: Vannevar Bush’s Memex

Vannevar Bush – engineer
– WWII: headed U.S. Office of Scientific Research & Development (OSRD)
– Initiation and early administration of the Manhattan Project
– 1945 Atlantic Monthly article “As We May Think”

Memex – see Life Magazine take on it
– Bush developed concept in 1930s
– Based on storing, indexing, & retrieving microfilm images
– Based on indexing textual/visual objects to one another as the knowledge worker developed concepts
– Applied concept of “associative memory” to understand relationships of content objects (mapping of memory of an object against other objects)
– Also included ability to annotate all relationships or links

Basis for hypertext/hypermedia concept developed by Ted Nelson & Doug Engelbart

https://en.wikipedia.org/wiki/Vannevar_Bush



https://en.wikipedia.org/wiki/Manhattan_Project

http://www.theatlantic.com/magazine/archive/1945/07/as-we-may-think/303881/

http://worrydream.com/refs/Bush - As We May Think (Life Magazine 9-10-1945).pdf

http://www.dougengelbart.org/firsts/hypertext.html

Invention of the World Wide Web

Tim Berners-Lee (1989-91)
– Hypertext as an organizational knowledge management system for preserving & managing knowledge at CERN
  “a ‘web’ of notes with links (like references) between them is far more useful than a fixed hierarchical system… to allow a place to be found for any information or reference which one felt was important, and a way of finding it afterwards” [1990. Information Management: A Proposal]
– Concept included application-independent standards for
  HTML – markup tags to encode document formats & components, defined using a simple SGML document type description
  HTTP – a request-response protocol implemented in the client-server computing model
  URL – (1992-4) a way to express and locate the unique address for a file that is accessible on the Internet
– Two types of applications give life to the standards
  Browser – end-user ‘client’ application for retrieving, presenting and traversing information resources on the World Wide Web
  Web server – system storing, processing and delivering web pages to clients via HTTP
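How the three standards and the two applications fit together can be sketched in a few lines of Python, with no real network involved. Everything named here is an assumption made for illustration: the `PAGES` store, the `example.org` address, and the helper names `build_request`, `handle_request` and `fetch_links` are hypothetical, and the request and response texts are simplified HTTP/1.0-style messages rather than a complete implementation of the protocol.

```python
from __future__ import annotations

from html.parser import HTMLParser
from urllib.parse import urljoin, urlsplit

# Hypothetical page store standing in for the files behind a web server.
PAGES = {
    "/index.html": '<html><body><h1>Notes</h1>'
                   '<a href="memex.html">Memex</a></body></html>',
    "/memex.html": "<html><body><p>Associative trails.</p></body></html>",
}


def build_request(url: str) -> str:
    """Client side: turn a URL into the text of a simple GET request."""
    parts = urlsplit(url)
    path = parts.path or "/"
    return f"GET {path} HTTP/1.0\r\nHost: {parts.netloc}\r\n\r\n"


def handle_request(request: str) -> str:
    """Server side: answer a GET request from the in-memory page store."""
    method, path, _version = request.split("\r\n")[0].split(" ")
    if method != "GET" or path not in PAGES:
        return "HTTP/1.0 404 Not Found\r\n\r\n"
    return f"HTTP/1.0 200 OK\r\nContent-Type: text/html\r\n\r\n{PAGES[path]}"


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag, i.e. the hypertext links."""

    def __init__(self) -> None:
        super().__init__()
        self.links: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.links.extend(v for name, v in attrs if name == "href" and v)


def fetch_links(url: str) -> list[str]:
    """Browser side: request a page, parse its HTML, resolve its links."""
    response = handle_request(build_request(url))
    _headers, _, body = response.partition("\r\n\r\n")
    collector = LinkCollector()
    collector.feed(body)
    # Relative links are resolved against the URL of the page holding them.
    return [urljoin(url, link) for link in collector.links]


links = fetch_links("http://example.org/index.html")
```

The URL names the resource, the request and response carry it, and the markup inside the response supplies the next URLs to follow, which is what lets a reader traverse from one document to another.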

http://web.archive.org/web/20130302075552/http:/www.w3.org/People/Berners-Lee/



https://en.wikipedia.org/wiki/Hypertext

http://www.w3.org/History/1989/proposal-msw.html

https://en.wikipedia.org/wiki/HTML

https://en.wikipedia.org/wiki/Hypertext_Transfer_Protocol

https://en.wikipedia.org/wiki/Web_browser

https://en.wikipedia.org/wiki/World_Wide_Web

https://en.wikipedia.org/wiki/Web_server

The Web transforms a communications infrastructure into a knowledge repository

Application-independent standards for use by anyone

Authoring tools to create content
– Text editors: SGML, HTML, XML are all expressed in ASCII characters so can be written using any character-based editor
– WYSIWYG editors try to show what the page will look like
– Structure editors show logical structure as well as WYSIWYG

Web servers to provide content
– Single PC in a home office
– Server farms, e.g., Google probably has more than 2 M servers

Browsers, e.g.,
– NCSA Mosaic (1993)
– Netscape Navigator (1994)
  Firefox (2002)
– Internet Explorer (1995)
– Apple Safari (2003)
– Google Chrome (2008)

Search & retrieval engines


http://en.wikipedia.org/wiki/Server_farm

http://www.google.com/about/datacenters/gallery/

https://en.wikipedia.org/wiki/Mosaic_(web_browser)

https://en.wikipedia.org/wiki/Netscape_Navigator

https://en.wikipedia.org/wiki/Firefox

https://en.wikipedia.org/wiki/Internet_Explorer

https://en.wikipedia.org/wiki/Safari_(web_browser)

https://en.wikipedia.org/wiki/Google_Chrome

Content is useless if it cannot be found

Discovery tools & retrieval tools are essential
– Web directories were initially important but are now essentially extinct
  Generally human-curated catalogs of websites organized by some conceptual categorization, e.g., DMOZ, Yahoo Directory
  Labor intensive and difficult to administer
– Automated search engines – technologically complex, vastly powerful
  Web crawler visits linked web pages under control of policy to collect metadata and content for indexing
  Indexing engine indexes web pages by content, metadata, and perhaps other factors such as numbers of ingoing and outgoing links according to search-engine-specific policy
  Query processing applies input from user against actively maintained indexes to identify relevant web pages and returns links to these pages to the user.
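The three stages just described (crawler, indexing engine, query processing) can be sketched in miniature. This is a toy under stated assumptions, not how any real search engine works: the `WEB` dictionary is a hypothetical three-page web, the crawl "policy" is only a page limit, the index is a plain inverted index on words, and link-based ranking is left out entirely.

```python
import re
from collections import defaultdict, deque

# Hypothetical miniature web: URL -> (page text, outgoing links).
WEB = {
    "a.example/home": ("memex hypertext history",
                       ["a.example/web", "b.example/search"]),
    "a.example/web": ("world wide web hypertext", ["a.example/home"]),
    "b.example/search": ("search engines index the web", []),
}


def crawl(seed, max_pages=100):
    """Visit linked pages breadth-first from a seed, up to a page limit."""
    seen, queue, pages = {seed}, deque([seed]), {}
    while queue and len(pages) < max_pages:
        url = queue.popleft()
        text, links = WEB.get(url, ("", []))
        pages[url] = text
        for link in links:
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return pages


def build_index(pages):
    """Inverted index: each word maps to the set of URLs containing it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(url)
    return index


def query(index, terms):
    """Return the URLs containing every query term, sorted for stable output."""
    words = re.findall(r"[a-z0-9]+", terms.lower())
    if not words:
        return []
    matches = set.intersection(*(index.get(word, set()) for word in words))
    return sorted(matches)


pages = crawl("a.example/home")
index = build_index(pages)
hits = query(index, "web hypertext")
```

Because the index is built once and consulted per query, answering a search never requires revisiting the pages themselves; that separation is what lets the approach scale where human-curated directories could not.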

Rise and fall of the web portals
– Attempt to syndicate and provide access to range of information retrieval & display tools via a single “easy to use” web page (e.g., Yahoo, Bigpond)
– For search, simplicity (e.g., Google) won the day
– Portal technology still provides front-ends to corporate intranets

https://en.wikipedia.org/wiki/Web_directory

https://en.wikipedia.org/wiki/Web_search_engine

https://en.wikipedia.org/wiki/Web_crawler

https://en.wikipedia.org/wiki/Search_engine_indexing

https://en.wikipedia.org/wiki/Web_search_query

http://www.yahoo.com/

http://media.telstra.com.au/home.html

http://www.google.com/au

Search engines and web portals were the killer applications that caused the Web to explode

Fuel for explosive infrastructure growth

Web (and Internet) highly subsidized by the US government
– Communications infrastructure
– Storage

Major fractions of the knowledge being placed on the Web were freely available to end users

Fuelled by the growing epistemic value of content that could be retrieved essentially for free, the Internet's rate of growth was unprecedented in human history
– Soon grew beyond anything the government could economically support

Rise of the commercial Internet service provider (ISP)
– Similar organization and fees to commercial telecoms
– Web access as common as phones

https://commons.wikimedia.org/wiki/File:Internet_Connectivity_Distribution_&_Core.svg

Early growth of the Internet and Web

Date          Hosts [1]  Domains [2]    WebSites  WHR (%) [3]
1969                  4
Jul 81              210
Jul 89          130,000        3,900           –
Jul 92 [4]      992,000       16,300          50        0.005
Jul 93        1,776,000       26,000         150         0.01
Jul 94 [5]    3,212,000       46,000       3,000          0.1
Jul 95        6,642,000      120,000      25,000          0.4
Jul 96       12,881,000      488,000     300,000          2.3
Jul 97       19,540,000    1,301,000   1,200,000          6.2
Jan 98       29,670,000    2,500,000   2,450,000          8.3
Jul 98       36,739,000    4,300,000   4,270,000         12.0
Jul 01      126,000,000   30,000,000  28,200,000         22.0

Gromov 2011

Experimental HTML

Launch public Web

[1] A host is a domain name having an IP address record associated with it.

[2] A domain is a domain name that has name server (NS) records associated with it and subdomains or hosts within the global domain.

WebSites are specifically HTTP servers for HTML & other objects.

[3] Web sites to Hosts Ratio – roughly estimates the percentage of Web-surfing people who are trying to become Web authors by creating their own Web sites.
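The WHR column can be checked from the other two. The short Python sketch below assumes WHR is simply web sites divided by hosts, expressed as a percentage, which is how the footnote reads; the function name `whr_percent` is my own, and the rows are a sample copied from the table.

```python
# (date, hosts, web sites) copied from table rows that list both values.
ROWS = [
    ("Jul 92", 992_000, 50),
    ("Jul 95", 6_642_000, 25_000),
    ("Jul 96", 12_881_000, 300_000),
    ("Jan 98", 29_670_000, 2_450_000),
]


def whr_percent(web_sites: int, hosts: int) -> float:
    """Web sites to Hosts Ratio, expressed as a percentage."""
    return 100.0 * web_sites / hosts


for date, hosts, sites in ROWS:
    print(f"{date}: {whr_percent(sites, hosts):.3f}%")
```

Rounded to the precision used in the table, these sample rows agree with the listed WHR values. A few other rows differ slightly (Jul 97 computes to about 6.1 against the listed 6.2), presumably because the underlying counts are rounded.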

https://en.wikipedia.org/wiki/IP_address

https://en.wikipedia.org/wiki/Name_server

https://en.wikipedia.org/wiki/Web_server

Phenomenal growth

Some numbers (Witiger.Com)
– Number of Internet devices:
  1984 1,000 (one thousand)
  1992 1,000,000 (one million)
  2008 1,000,000,000 (one billion)
– To reach 50,000,000 (fifty million) users it took
  the Telephone 38 years
  Television 13 years
  Internet 4 years
  iPod 3 years
  Facebook 2 years
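One way to make the device counts above concrete is to convert them into doubling times. The Python sketch below assumes steady exponential growth between each pair of dates, which is a simplification; `doubling_time_years` is a name chosen for this illustration.

```python
import math


def doubling_time_years(start_count, end_count, years):
    """Years per doubling, assuming steady exponential growth between two points."""
    doublings = math.log2(end_count / start_count)
    return years / doublings


# Device counts quoted above: 1,000 (1984), 1,000,000 (1992), 1,000,000,000 (2008).
early = doubling_time_years(1_000, 1_000_000, 1992 - 1984)
later = doubling_time_years(1_000_000, 1_000_000_000, 2008 - 1992)
```

Each thousandfold increase is about ten doublings, so under this assumption the number of devices doubled roughly every 0.8 years between 1984 and 1992, and roughly every 1.6 years between 1992 and 2008.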


http://web.archive.org/web/20130425120956/http:/www.witiger.com/ecommerce/ecommercestatistics.htm

How much knowledge is held in the Web?

My primary interest is meaningful “content” (web pages, documents, books), not data

Three Webs
– Surface web – freely accessible to a browser
  Inktomi         Jan 2000   1,000,000,000 pages
  Notess (2006)   Dec 2000   600,000,000
                  Dec 2001   1,500,000,000
                  Nov 2002   3,000,000,000
                  Feb 2004   4,000,000,000
                  2006       20,000,000,000
  Wikipedia       current    36,607,000 (~4 M for content)
  Google (2008)   Jul 2008   1,000,000,000,000 (w/o duplicates)
  Indexed Web     current    ~47,000,000,000 (Google)
  Web Archive     current    8,083,803 (books & texts)
– Deep/hidden Web – requires subscription or password to access, e.g.
  e-Journals: University of Melbourne Library accesses 116,279
  – Some are available free to the web, most are not (Scholar indexes)
  e-Book titles on Amazon: 6,911,733 (437,674 are free, rest are not)
  Subscription news, financial reports, other databases, etc.
– Dark Web – encrypted & deeply hidden content (TOR, privacy, hacking, …)
  See Dr Gareth Owen 2015 Tor: Hidden Services and Deanonymisation
  Quantification difficult (~80% of access seems to be child abuse porn)

http://web.archive.org/web/20020810221129/http:/www.inktomi.com/about/

http://www.searchengineshowdown.com/features/google/review.html

https://en.wikipedia.org/wiki/Special:Statistics

http://googleblog.blogspot.com.au/2008/07/we-knew-web-was-big.html

http://www.worldwidewebsize.com/



https://archive.org/details/texts

https://en.wikipedia.org/wiki/Deep_Web_(search_indexing)

https://www.torproject.org/

https://www.youtube.com/redirect?q=http://media.ccc.de/browse/congress/2014/31c3_-_6112_-_en_-_saal_2_-_201412301715_-_tor_hidden_services_and_deanonymisation_-_dr_gareth_owen.html&redir_token=c_iFQaxd3FY950zUALZ7b5y4WvJ8MTQzNTczMzYxOUAxNDM1N

Some other uses of the Web/Internet

Blogs (WordPress; Blogger)

Cloud apps (Google Docs; Office 365)

eCommerce (Kogan, eStore, Coles Online)

Entertainment media (e.g., Netflix; Foxtel)

Navigation & geolocation (e.g., Google Earth, Nearmap)

News media (e.g., Google News, Huffington Post, CNN)

Photography & Video (e.g., Flickr, Panoramio)

Self storage (Dropbox, Google Drive)

Sex/pornography

Social networking (e.g., Facebook; Twitter; Meetup; LinkedIn)

Telephony & teleconferencing (Skype; Webex)

Video sharing (YouTube, Vimeo)


https://wordpress.com/website/

https://www.blogger.com/features

https://docs.google.com/document/u/0/?pli=1&showDriveBanner=true

https://products.office.com/en-au/mobile/office

https://www.kogan.com/au/

http://www.estore.com.au/

http://shop.coles.com.au/online/vic-metro-bacchus-marsh/

https://www.netflix.com/au/

http://www.foxtel.com.au/index.html

https://www.google.com/earth/

http://au.nearmap.com/

https://news.google.com.au/

http://edition.cnn.com/

https://www.flickr.com/

http://www.panoramio.com/

https://www.dropbox.com/

https://drive.google.com/

https://en.wikipedia.org/wiki/Facebook

http://www.webex.com/

https://en.wikipedia.org/wiki/Meetup_(website)

https://en.wikipedia.org/wiki/LinkedIn

https://en.wikipedia.org/wiki/Skype



Some thoughts on the history, what it means, and where it is taking us

Why has the Web been so overwhelmingly successful?

Tony Smith (1995) “Why the Web” (before I knew it existed)
– Modest extra layer on established & working technologies
– Developers worked in real world with open collaboration
– URL is human-readable and printable way to address any Internet resource
– Climate for the Web established by a succession of grand visions
– Marc Andreessen built a user-friendly graphical interface
– Newbies rapidly found the Web effectively eliminated distribution and publication costs for desk-top publishing

Puts the evolutionary growth of knowledge into hyperdrive

The World Wide Web links a vast network of … actors, human, non-human, material and ethereal. The six above-listed causes of the Web’s success dance with those actors across a profusion of interconnections. The ideas of human visionaries become memes propagating an epidemic of Web ‘surfing’. The Web’s computer codes become epidemic across the Internet. Loops in the Web’s links, and in its actor-network, feed back positively and cybernetically—fuelling its continued near exponential growth and its ever-accelerating transformation into cyberspace proper. [Smith 1995]

http://www.meme.com.au/papers/WTW/

Where is the Web likely to go in the future?

Trends
– Ubiquity is almost here now
– Increasing epistemic power
  Web applications are making more and more decisions on their own before consulting their human users
  Able to make decisions with ever increasing information sources
– Generalization/convergence: more and more functions incorporated in single applications
  e.g., Google Earth/Maps as a geolocated memory prosthesis

Future
– Increasing replacement (not extension) of human cognitive functions
  E.g., spatial navigation
  E.g., memory and recall (life-logging?)
– Emergent functions
  Global brain?
– Burnout?

Next session

Wrapping up the Web
– I’ve already covered concepts from most of the book sections listed below in earlier Meetup sessions
– Here I’ll say a bit more about how the technology carries out cognitive processes in the Web
– I hope Tony will join me in a free-form discussion of Web history and our experiences with it
– There is a lot more to be said about human interactions with the technology and the Web, but that will be left until after an Interlude where I take a much deeper look, from physical and evolutionary points of view, at the emergence and interrelations of life and knowledge

Episode 3(4) - Emerging cognition in the Web itself

Retrieving Value from the Web Semantically

Cataloging Approaches

Indexing Approaches

Using Portals

Multimedia

Wrapping Up the Web
