Introduction to Computer Networks 2004, 劉震昌. Review of Lab#2 and Homework#1 “ Lab ” means “ Laboratory ”, not “ Label ”. Algorithm steps must be executed.

Dec 19, 2015


Introduction to Computer Networks

2004, 劉震昌

Review of Lab#2 and Homework#1

“Lab” means “Laboratory”, not “Label”.

Algorithm steps must be executed in turn. You cannot skip any step on your own. Why?

Please write your homework subject correctly

Late homework is not accepted

Outline
  Origins of the Internet
  Origins of the WWW (World Wide Web)
  HTML (Hypertext Markup Language) guide
  Searching the Web
    Search engine (Web browser)
    Web directories

Origins of the Internet

Ref: Chap. 2 of Comer’s book

Origins of the Internet

In 1969, the US DoD’s ARPA (Advanced Research Projects Agency) built the ARPANET
  Only 4 nodes
  Decentralized system
  Data transmission
(Reference website)

Origins of the Internet (cont.)

TCP/IP was developed in 1974 and became a standard in 1983
  TCP (Transmission Control Protocol)
  IP (Internet Protocol)
  The importance of network communication protocols
Growth of ARPANET --> Internet
  Internetworking
  No organization owns or controls it

Growth of the Internet

[Figure: number of computers on the Internet; 1M = 1,000,000]

Units of measurement: http://www.spes.tpc.edu.tw/handouts/B_Basic/ref.htm

[Figure: growth of the Internet, plotted on a log scale]

Almost exponential growth

Recently fueled by the WWW and economic activity
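Exponential growth looks like a straight line on a log scale, which is why growth plots of this kind are usually drawn that way. The sketch below illustrates the idea with made-up host counts that double every year; the numbers are illustrative assumptions, not the data behind the slide.

```python
import math

# Hypothetical host counts: start at 1,000 and double each year.
# These values are illustrative only, not measurements from the slide.
counts = [1000 * 2 ** year for year in range(6)]

# On a linear scale the year-to-year increase keeps getting larger...
linear_steps = [b - a for a, b in zip(counts, counts[1:])]

# ...but on a log scale every step has the same size:
# log10 of the yearly growth factor.
log_steps = [math.log10(b) - math.log10(a) for a, b in zip(counts, counts[1:])]

print(linear_steps)
print([round(step, 4) for step in log_steps])
```

Equal steps on the log axis mean a constant growth factor per year, which is the defining property of exponential growth.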

IP Service

Where is your computer on the Internet?
Current Internet (IPv4)
  32 bits to represent an IP address
  Ex. 163.22.20.129
  What is your computer’s IP address? Try: ipconfig

[Figure: example hosts 163.22.20.129, 163.22.20.118, 163.22.22.119]
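To see how a dotted address such as 163.22.20.129 fits in 32 bits, the following sketch packs the four decimal numbers into one integer, 8 bits each. The helper name `to_int` is made up for this example; the standard-library `ipaddress` module is used only as a cross-check.

```python
import ipaddress

def to_int(dotted):
    """Pack a dotted-decimal IPv4 address into a single 32-bit integer."""
    value = 0
    for part in dotted.split("."):
        # Shift the bits collected so far left by 8 and append the next octet.
        value = (value << 8) | int(part)
    return value

addr = "163.22.20.129"          # example address from the slide
as_int = to_int(addr)
bits = format(as_int, "032b")   # the same value written as 32 binary digits

print(as_int)
print(bits)

# Cross-check against the standard library's own conversion.
assert as_int == int(ipaddress.IPv4Address(addr))
```

Each of the four decimal numbers occupies one byte, so none of them can exceed 255.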

Address Resolution Protocol (ARP)

An IP address is an abstraction; physical network hardware does not know how to locate a computer from its IP address

Techniques
  Table look-up
  Closed-form computation
  Message exchange
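Two of the techniques listed above can be sketched in a few lines. This is a toy illustration, not how any particular operating system implements address resolution: the IP addresses come from the slides, the hardware addresses are made up, and the closed-form rule assumes a network whose hardware addresses are configurable and were chosen to equal the last number of the IP address. Message exchange (the approach the ARP protocol itself takes) needs a real network and is not shown.

```python
# Hypothetical binding table: IP address -> hardware address.
# The hardware addresses below are invented for illustration.
ARP_TABLE = {
    "163.22.20.129": "00:00:5e:00:53:01",
    "163.22.20.118": "00:00:5e:00:53:02",
}

def resolve_by_table(ip):
    """Table look-up: return the stored hardware address, or None if unknown."""
    return ARP_TABLE.get(ip)

def resolve_by_formula(ip):
    """Closed-form computation.

    Assumes hardware addresses are small configurable integers that were
    assigned to match the host part (last octet) of each IP address.
    """
    return int(ip.split(".")[-1])

print(resolve_by_table("163.22.20.129"))
print(resolve_by_table("163.22.22.119"))   # not in the table
print(resolve_by_formula("163.22.20.129"))
```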

Computers on the Net

Every Internet host has a unique IP address, but it is hard to remember, so hosts also have names
  e.g., arbor.watson.ibm.com is 9.2.13.20 and arbor.ee.ntu.edu.tw is 140.112.21.236
  Try: nslookup

Domain Name Server

A host name must be converted into an IP address
Domain Name Servers (DNS)
  Contain a database (look-up table) for host name to IP address mapping
  There are many domain name servers: “.com”, “.gov”, “.edu”, “.tw”
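The look-up-table idea can be sketched as a dictionary. The two entries are the example hosts given in these slides; a real name server holds far more entries and shares the work with other servers. The function names are made up for this example. `real_lookup` shows the standard-library call that asks the system resolver, which needs network access and may return different addresses today.

```python
import socket

# Toy look-up table built from the two example hosts in the slides.
HOST_TABLE = {
    "arbor.watson.ibm.com": "9.2.13.20",
    "arbor.ee.ntu.edu.tw": "140.112.21.236",
}

def toy_lookup(host_name):
    """Return the IP address stored for host_name, or None if it is unknown.

    Host names are case-insensitive, so the name is lowercased first.
    """
    return HOST_TABLE.get(host_name.lower())

def real_lookup(host_name):
    """Ask the operating system's resolver (requires network access)."""
    return socket.gethostbyname(host_name)

print(toy_lookup("arbor.ee.ntu.edu.tw"))
print(toy_lookup("unknown.example"))
```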

Lab#3

Use the commands
  ipconfig
  nslookup

Internet applications
  telnet: a terminal emulation program for TCP/IP networks such as the Internet
  ftp (file transfer protocol)

telnet 163.22.22.119

163.22.22.119 (runs the telnet server)

Origins of WWW

Ref: Chap. 32 of Comer’s book

Outline
  Origins of WWW (World Wide Web)
  Web browser
  HTML (Hyper-Text Markup Language)
  HTTP (Hyper-Text Transfer Protocol)

Origins of WWW

World Wide Web (WWW)
  Proposed in 1989 by Tim Berners-Lee at CERN (the European particle physics laboratory)
  A large-scale, online repository of information
  Develops interoperable technologies (specifications, guidelines, software, and tools)
  Today the W3C (World Wide Web Consortium) does this work

Origins of WWW (cont.)

Data format: HTML (HyperText Markup Language)
  Allows hypertext links (URL: Uniform Resource Locator) to other documents on the Web
Protocol: HTTP (HyperText Transfer Protocol)
  Data exchange standard on the Web
  A common format and transfer protocol for data exchange
URL form: Protocol://computer_name:port/document_name
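The four parts of a URL can be pulled apart with Python's standard `urllib.parse` module. In the sketch below the host name is one that appears elsewhere in these slides, while the port and document name are hypothetical values chosen only to show every part.

```python
from urllib.parse import urlsplit

# Host name taken from the slides; the port and document name are
# hypothetical values used for illustration.
url = "http://www.csie.ncnu.edu.tw:80/index.html"

parts = urlsplit(url)

print("protocol:     ", parts.scheme)
print("computer_name:", parts.hostname)
print("port:         ", parts.port)
print("document_name:", parts.path)
```

When the port is left out of a URL, `parts.port` is `None` and the browser falls back to the protocol's default (80 for HTTP).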

Origins of WWW (cont.)

[Figure: the WWW, documents linked by URLs, on top of the Internet]

The WWW is like one large database distributed across the Internet

Web browser

A tool for reading HTML documents

Web browser (client)                 Web server (server, e.g. running IIS)
  click a link, send request   -->     find document
  display                      <--     return HTML document

Connection terminated after receiving all items
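The request/response exchange above can be tried on a single machine with Python's standard library. This is a minimal sketch, not a production server: `ToyHandler`, `DOCUMENTS`, `fetch`, and `demo` are names made up for this example, the "document" is a hard-coded string, and the server listens only on the local loopback address at a port picked by the operating system.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical document store standing in for the files on a Web server.
DOCUMENTS = {"/index.html": "<HTML><BODY>Hello, Web</BODY></HTML>"}

class ToyHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Server side: find the requested document.
        body = DOCUMENTS.get(self.path)
        if body is None:
            self.send_error(404)
            return
        data = body.encode("utf-8")
        # Return the HTML document to the client.
        self.send_response(200)
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.send_header("Content-Length", str(len(data)))
        self.end_headers()
        self.wfile.write(data)

    def log_message(self, format, *args):
        pass  # keep the example quiet

def fetch(host, port, path):
    """Client side: send a request, read the reply, then close the connection."""
    conn = http.client.HTTPConnection(host, port, timeout=5)
    try:
        conn.request("GET", path)
        response = conn.getresponse()
        return response.status, response.read().decode("utf-8")
    finally:
        conn.close()

def demo():
    # Port 0 asks the operating system for any free port.
    server = HTTPServer(("127.0.0.1", 0), ToyHandler)
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    try:
        return fetch("127.0.0.1", server.server_address[1], "/index.html")
    finally:
        server.shutdown()
        server.server_close()

if __name__ == "__main__":
    print(demo())
```

A real browser does the same thing as `fetch` and then renders the returned HTML instead of printing it.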

Web browser (cont.)

Text mode browser: lynx
  lynx http://www.csie.ncnu.edu.tw
Graphics mode browsers
  NCSA (National Center for Supercomputing Applications) Mosaic, by Marc Andreessen
  Netscape
  IE

Web browser (cont.)

Browser architecture

Document representation

Hypertext: textual information
Hypermedia: additional information, such as images and graphics
HyperXXXX: an abstract idea
  A set of documents, in which a document can contain pointers to other documents
Page: a hypermedia document on the Web

Hypertext Markup Language (HTML)

Markup language: publishes hypertext with general display guidelines rather than detailed formatting instructions

HTML document --> display (results may differ from browser to browser)

HTML

Text file + tags
Tags: formatting the document
  <Tagname>…text…</Tagname>

HTML layout

<HTML>
  <HEAD>
    <TITLE> ….title of the text…. </TITLE>
  </HEAD>
  <BODY>
    …body of the document…
  </BODY>
</HTML>

* Good indentation makes the document easier for people to read and edit


HTML layout (cont.)

<HTML><HEAD><TITLE>….title of the text….</TITLE></HEAD><BODY>…body of the document…</BODY></HTML>
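Indentation helps people, but a program reading HTML sees the same tags either way. The sketch below uses Python's standard `html.parser` module to list the opening tags of an indented and a one-line version of the same layout. The class and function names are made up for this example, and the placeholder texts stand in for the elided title and body.

```python
from html.parser import HTMLParser

class TagCollector(HTMLParser):
    """Record the name of every opening tag, in document order."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        # The parser reports tag names in lowercase.
        self.tags.append(tag)

def start_tags(document):
    collector = TagCollector()
    collector.feed(document)
    collector.close()
    return collector.tags

indented = """<HTML>
  <HEAD>
    <TITLE>title of the text</TITLE>
  </HEAD>
  <BODY>
    body of the document
  </BODY>
</HTML>"""

compact = ("<HTML><HEAD><TITLE>title of the text</TITLE></HEAD>"
           "<BODY>body of the document</BODY></HTML>")

print(start_tags(indented))
print(start_tags(compact))
```

Both calls report the same tag sequence, so the choice between the two layouts matters only to the human editing the file.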

HTML examples

Example1
Example2
Example3: embedding images
Example4: hypertext link (anchor)
  <a> ….anything…</a>
  Any item can have a hypertext link

Lab#4 in the afternoon
http://www.csie.nctu.edu.tw/~jglee/teacher/content.htm

HTTP documents

See http://ftp.ics.uci.edu/pub/ietf/http/
  HTTP/1.0, RFC 1945, 1996
  HTTP/1.1, RFC 2068, 1997


Searching the Web

Ref: Chapter 13 in “Modern Information Retrieval”

Ricardo Baeza-Yates and Berthier Ribeiro-Neto

Outline
  Measuring the Web
  Methods for searching the Web
    Search engines
    Web directories

Searching the Web

The WWW started in 1989
The textual data alone is estimated to be on the order of one terabyte
Goal: how to efficiently manage, retrieve and filter information from the Web?

Challenges

Distributed data
  Data spans many computers, interconnected without a predefined topology
High percentage of volatile data
  40% of the Web changes every month
Large volume
Unstructured and redundant data
  30% of Web pages are (near) duplicates
Heterogeneous data
  Different languages

Measuring the Web

[Figure: Web servers and the WWW (documents linked by URLs) on the Internet]

1998: 3M (3 million) Web servers

No. of servers = 1/10 of the no. of computers on the Internet

Measuring the Web (cont.)

1998
  5 KB per Web page on average
  300M (300 million) Web pages
  300M * 5 KB = 1.5 terabytes
  Growing at a rate of 20M pages per month
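The size estimate is easy to check by hand or in a few lines of Python. The sketch below assumes decimal units (1 KB = 1,000 bytes) and, for the monthly figure, that new pages have the same 5 KB average size as existing ones; that second assumption is ours, not the slide's.

```python
KB = 1000        # decimal units assumed: 1 KB = 1,000 bytes
GB = 1000 ** 3
TB = 1000 ** 4

pages = 300_000_000            # 300M Web pages (1998 estimate from the slide)
avg_page_bytes = 5 * KB        # 5 KB per page on average

# Total size of the textual Web.
total_bytes = pages * avg_page_bytes
print(total_bytes / TB, "TB")

# Assumption: newly added pages are also 5 KB on average.
new_pages_per_month = 20_000_000
growth_bytes_per_month = new_pages_per_month * avg_page_bytes
print(growth_bytes_per_month / GB, "GB per month")
```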

Growth of the Web

[Figure: number of Web pages and Web sites by year, 1996 to 1998; vertical axis in millions (100, 200, 300)]

Methods for searching the Web

Search engines
  Index the Web documents as a full-text database
  AltaVista, Google, …
Web directories (portal site directories)
  Classify selected Web documents by subject
  Yahoo!

Search engines

Model the Web as a database
All queries must be answered without accessing the Web pages

User --> queries --> database
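One common way to answer queries without touching the pages themselves is an inverted index: a table built ahead of time that maps each word to the pages containing it. The sketch below is a toy version under stated assumptions: the pages and URLs are invented, the function names are made up, and a query is treated as a Boolean AND of its words. The slides do not say which index structure any particular engine uses.

```python
import re
from collections import defaultdict

# Hypothetical mini-collection; a real engine would index crawled pages.
PAGES = {
    "http://a.example/": "Introduction to computer networks",
    "http://b.example/": "Searching the Web with search engines",
    "http://c.example/": "Computer networks and the Web",
}

def tokenize(text):
    """Split text into lowercase words."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(pages):
    """Build the inverted index: word -> set of URLs containing that word."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in tokenize(text):
            index[word].add(url)
    return index

def search(index, query):
    """Return the URLs that contain every word of the query (Boolean AND)."""
    words = tokenize(query)
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result

INDEX = build_index(PAGES)   # built once, before any query arrives
print(sorted(search(INDEX, "computer networks")))
print(sorted(search(INDEX, "web")))
```

Only `build_index` reads the page text; `search` works from the index alone.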

Search engines (cont.)

AltaVista (www.altavista.com)
  20 multi-processor machines
  130 GB of RAM each
  Over 500 GB of disk space each
  75% of resources go to the query engine

The top search engines

International
  Google (www.google.com)
  www.yahoo.com
  www.altavista.com
  Inktomi (www.inktomi.com)
  Statistics on search engines
    www.searchenginewatch.com
    http://imt.net/~notess/search
Taiwan
  Yahoo!/Kimo uses Google
  Openfind (www.openfind.com.tw) (Prof. 吳昇, National Chung Cheng University)
  Yam (www.yam.com.tw)

Search engines (cont.)

Centralized crawler-indexer architecture

[Figure: users, User Interface, Query Engine, Index database, Indexer, Crawler, Web]

User Interface

Query interface
  Keywords
  Boolean operators
Answer interface
  Rank the searched pages
    Statistics about term occurrences within the document
    Popularity
    Hyperlink information
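The first ranking signal, term occurrence statistics, can be sketched as a simple count: a page scores one point each time a query word appears in it. This is a deliberately crude illustration with invented pages and function names; real engines combine several signals, including the popularity and hyperlink information listed above, and the slides do not give their formulas.

```python
import re
from collections import Counter

# Hypothetical pages used only to illustrate the scoring.
PAGES = {
    "http://a.example/": "web search engines index the web",
    "http://b.example/": "a web directory classifies pages",
    "http://c.example/": "computer networks",
}

def tokenize(text):
    """Split text into lowercase words."""
    return re.findall(r"[a-z0-9]+", text.lower())

def score(text, query):
    """Count how many times the query words occur in the text."""
    counts = Counter(tokenize(text))
    return sum(counts[word] for word in tokenize(query))

def rank(pages, query):
    """Return matching URLs, best score first (ties broken by URL)."""
    scored = [(score(text, query), url) for url, text in pages.items()]
    ordered = sorted(scored, key=lambda pair: (-pair[0], pair[1]))
    return [url for points, url in ordered if points > 0]

print(rank(PAGES, "web search"))
```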


Crawler

Also called robots, spiders, wanderers, walkers, and knowbots
In spite of its name, a crawler runs on a local system and sends requests to remote Web servers
Method: start with a set of URLs, and from there extract other URLs
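The method can be sketched as a breadth-first traversal: keep a queue of URLs to visit, fetch each page, extract its links, and queue the ones not seen before. To stay runnable without a network, the sketch below crawls a tiny in-memory "Web"; `TOY_WEB`, its URLs, and the function names are all invented for this example, and a real crawler would replace `fetch` with an HTTP request and add politeness rules.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin

# Hypothetical in-memory Web: URL -> HTML text of that page.
TOY_WEB = {
    "http://site.example/": '<a href="/a.html">A</a> <a href="/b.html">B</a>',
    "http://site.example/a.html": '<a href="/b.html">B</a>',
    "http://site.example/b.html": '<a href="/">home</a> <a href="/missing.html">?</a>',
}

class LinkExtractor(HTMLParser):
    """Collect the href value of every <a> tag."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

def extract_links(base_url, html):
    """Return the absolute URLs of all links found in html."""
    parser = LinkExtractor()
    parser.feed(html)
    parser.close()
    return [urljoin(base_url, link) for link in parser.links]

def crawl(seeds, fetch=TOY_WEB.get):
    """Breadth-first crawl starting from the seed URLs.

    Returns (visited, invalid): pages fetched in order, and links that
    pointed to pages that do not exist.
    """
    visited, invalid = [], []
    seen = set(seeds)
    queue = deque(seeds)
    while queue:
        url = queue.popleft()
        html = fetch(url)
        if html is None:          # invalid (dead) link
            invalid.append(url)
            continue
        visited.append(url)
        for link in extract_links(url, html):
            if link not in seen:  # never queue the same URL twice
                seen.add(link)
                queue.append(link)
    return visited, invalid

visited, invalid = crawl(["http://site.example/"])
print(visited)
print(invalid)
```

The `seen` set is what keeps the crawler from looping forever when pages link back to each other.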

Crawler (cont.)

Because of how the Web is traversed, the index of a search engine can be thought of as analogous to the stars in the sky: each page is seen as it was when it was last crawled, not as it is now
Invalid links in search engines vary from 2% to 9%
The current fastest crawlers can traverse up to 10M Web pages per day
  300M / 10M = 30 days to traverse the whole Web

Web directories

Classify Web pages by category
Directories are hierarchical taxonomies that classify human knowledge
Yahoo! has close to 1M pages classified
How to classify pages?
  Pages have to be submitted to the Web directories
  Classification is done manually by a few people
  Automatic classification is not yet mature
Not every page is classified

Some Web directories

Web directory    URL                 Web sites (K)   Categories
Yahoo!           www.yahoo.com       750
LookSmart        www.looksmart.com   300             24
Lycos Subjects   a2z.lycos.com       50
eBLAST           www.eblast.com      125
NewHoo           www.newhoo.com      100             23
Magellan         www.mckinley.com    60
Netscape         www.netscape.com
Snap             www.snap.com

The power of search engines

I have found a homepage that contains the solutions to the C textbook!!!

Whoever finds the homepage and sends me email first will get a bonus point…