2004.12.09 - SLIDE 1 IS 202 – FALL 2004 Lecture 29: Final Review Prof. Ray Larson & Prof. Marc Davis UC Berkeley SIMS Tuesday and Thursday 10:30 am - 12:00 pm Fall 2004 http://www.sims.berkeley.edu/academics/courses/is202/f04/ SIMS 202: Information Organization and Retrieval

Dec 19, 2015

2004.12.09 - SLIDE 1 IS 202 – FALL 2004

Lecture 29: Final Review

Prof. Ray Larson & Prof. Marc Davis

UC Berkeley SIMS

Tuesday and Thursday 10:30 am - 12:00 pm

Fall 2004

http://www.sims.berkeley.edu/academics/courses/is202/f04/

SIMS 202: Information Organization and Retrieval

2004.12.09 - SLIDE 2 IS 202 – FALL 2004

Lecture Overview

• Final Exam

• Final Review

• Course Evaluations

• Phone Details

• Next Steps

2004.12.09 - SLIDE 3 IS 202 – FALL 2004

Lecture Overview

• Final Exam

• Final Review

• Course Evaluations

• Phone Details

• Next Steps

2004.12.09 - SLIDE 4 IS 202 – FALL 2004

Final Exam Details

• Date: December 14; Time: 9:30-12:30

• The exam is open-book, open-note AND open-computer

• You may use your own laptop or one of the computers in the lab

• The results will need to be printed

• It can be handwritten if you wish; if so, be sure to bring pens, pencils, and erasers

• It is essential that you have access to and/or bring your final facetted classification so that you can analyze it and use it

2004.12.09 - SLIDE 5 IS 202 – FALL 2004

Final Exam Details

• There will be 8 questions on the exam
  – Some questions have multiple parts

• One of the questions will be taken from the Discussion Questions you submitted in class

• Each question will be worth a specific number of points, which will be stated on the exam itself

• Partial credit will be awarded for partial answers, so we advise that you do not skip any questions

• In your answers, please balance conciseness with illustration of all of the requested information
  – In other words, don't write a lot of things that aren't asked for, but try to address all of what is asked for

2004.12.09 - SLIDE 6 IS 202 – FALL 2004

Final Exam Details

• The exam will be comprehensive, covering both the Information Organization and Retrieval parts of the course
  – The emphasis will be on the last half (Organization), with about a 70/30 bias towards the last half

• Each person will work individually

• The exam period is three hours
  – You will likely need the entire time

• If you use network-accessed material for any part of the exam, be sure to cite your sources

2004.12.09 - SLIDE 7 IS 202 – FALL 2004

Study Guide

• Be sure you understand the material that was covered in lectures and have read and understood the assigned readings

• Be sure you can do activities similar to what was done in the homework assignments

2004.12.09 - SLIDE 8 IS 202 – FALL 2004

Study Guide

• We will have questions that require you to generalize from what you've learned and synthesize ideas
  – So be sure you have thought about the ideas covered in lectures, readings, and homework assignments

• These ideas and abilities should be at your fingertips
  – There won't be time during the exam to do a lot of catch-up reading on topics you haven't studied

2004.12.09 - SLIDE 9 IS 202 – FALL 2004

Example Questions

• These are available on the class Web site

• Note that these examples are NOT the exact questions that will be on the exam, but they are similar to questions that have been used in the past

• If you have actively participated in the phone project assignments from the last part of the course and are familiar with the facetted classification you designed and built, this will greatly help you on at least 30% of the final exam

2004.12.09 - SLIDE 10 IS 202 – FALL 2004

Review of Course Content

• We can draw on:
  – All sets of slides (including this one)
  – The Course Readers
  – Textbooks
  – Handout papers
  – Assignments
  – Discussion questions and issues

2004.12.09 - SLIDE 11 IS 202 – FALL 2004

Lecture Overview

• Final Exam

• Final Review

• Course Evaluations

• Phone Details

• Next Steps

2004.12.09 - SLIDE 12 IS 202 – FALL 2004

Course Schedule

• Organization
  – Categorization
  – Knowledge Representation
  – Lexical Relations and WordNet
  – Controlled Vocabularies Introduction
  – Phone Project Introduction
  – Semantic Web and RDF
  – Facetted Classification
  – Thesaurus Design and Construction
  – Metadata Standards
  – Multimedia Information Organization and Retrieval
  – Metadata for Media
  – Mobile and Context-Aware Multimedia Systems
  – Phone Project Presentations
  – Future of Information Systems

• Retrieval
  – Overview
  – What is Information?
  – History of Information Systems
  – Introduction to the Search Process
  – Boolean Queries and Text Processing
  – Web Search Issues and Architecture
  – Implementing Web Site Search Engines
  – Statistical Properties of Text and Vector Representation
  – Probabilistic Ranking & Relevance Feedback
  – Evaluation
  – Database Design

2004.12.09 - SLIDE 13 IS 202 – FALL 2004

Your Questions

• What topics and/or questions would you like to discuss today?

2004.12.09 - SLIDE 14 IS 202 – FALL 2004

Information Retrieval Topics

• Information
• Document Representation and Statistical Properties of Text
• Queries, Ranking, and the Vector Space Model
• IR Systems and Implementation
• Evaluation of IR Systems
• The Search Process and User Interfaces
• Relevance Feedback
• Database Design

2004.12.09 - SLIDE 15 IS 202 – FALL 2004

Information Retrieval Topics

• Information
  – What is the information life cycle?
  – What are different ways of measuring information? What are different ways of defining information?

• Document Representation and Statistical Properties of Text
  – What is the significance of Zipf's law for weighting of terms in information retrieval?
  – What kinds of errors can a stemming algorithm produce?
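The stemming question can be made concrete with a deliberately naive suffix-stripping stemmer. This is a hypothetical toy written for illustration (the suffix list and the name naive_stem are assumptions; it is not the Porter stemmer or any algorithm from the course readings). It shows the two classic error types: over-stemming, where unrelated words collapse to one stem, and under-stemming, where related forms end up with different stems.

```python
# Hypothetical toy stemmer: strips the first matching suffix from a fixed list.
# It is NOT the Porter stemmer; it exists only to show typical stemming errors.
SUFFIXES = ["ational", "ness", "ing", "ies", "ed", "es", "s"]


def naive_stem(word: str) -> str:
    """Remove the first listed suffix that leaves a stem of at least 3 letters."""
    w = word.lower()
    for suffix in SUFFIXES:
        if w.endswith(suffix) and len(w) - len(suffix) >= 3:
            return w[: -len(suffix)]
    return w


# Over-stemming: "news" and "new" collapse to the same stem.
print(naive_stem("news"), naive_stem("new"))  # new new
# Under-stemming: three forms of "run" get three different stems.
print(naive_stem("running"), naive_stem("ran"), naive_stem("run"))  # runn ran run
```

Irregular forms such as "ran" cannot be handled by suffix rules at all, which is one reason dictionary-based lemmatizers exist alongside rule-based stemmers.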

2004.12.09 - SLIDE 16 IS 202 – FALL 2004

Information Retrieval Topics

• Queries, Ranking, and the Vector Space Model
  – What is the difference between a search engine that uses the vector space ranking algorithm on natural language queries and a system that uses Boolean queries?
  – What is the role of coordination level ranking in a facetted Boolean system?
  – Describe the following information need in terms of a faceted Boolean query. What kinds of weighting algorithms can be applied to a faceted query like this? “I would like to find articles about the effects of the passage of the independent investigator statute by Congress on how the U.S. president chooses an attorney general.”
  – Why do different web search engines return different sets of documents for the same query?
  – Redo the computations of Assignment 3 part 3 using different values for TF.
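A toy comparison may help with the first two questions above. The sketch below assumes a hypothetical three-document collection, whitespace tokenization, and raw TF * log(N/df) weights (not necessarily the weighting used in Assignment 3). It contrasts strict Boolean AND matching, coordination-level ranking (score = number of query facets a document matches), and cosine ranking in the vector space model.

```python
import math
from collections import Counter

# Hypothetical toy collection (doc id -> text), for illustration only.
DOCS = {
    "d1": "independent investigator statute passed by congress",
    "d2": "president chooses attorney general",
    "d3": "congress debates independent investigator and attorney general",
}
TOKENS = {d: text.split() for d, text in DOCS.items()}


def boolean_and(terms):
    """Strict Boolean AND: a document matches only if it contains every term."""
    return {d for d, toks in TOKENS.items() if all(t in toks for t in terms)}


def coordination_level(facets):
    """Score each document by how many facets (OR-groups of terms) it matches."""
    scores = {d: sum(any(t in toks for t in facet) for facet in facets)
              for d, toks in TOKENS.items()}
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


def tfidf(tokens, df, n_docs):
    """Weight each term seen in the collection by raw TF * log(N / df)."""
    tf = Counter(tokens)
    return {t: tf[t] * math.log(n_docs / df[t]) for t in tf if t in df}


def cosine(a, b):
    """Cosine similarity of two sparse term-weight dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def vector_rank(query):
    """Rank all documents by cosine similarity to a natural-language query."""
    n = len(TOKENS)
    df = Counter(t for toks in TOKENS.values() for t in set(toks))
    q = tfidf(query.split(), df, n)
    scores = {d: cosine(q, tfidf(toks, df, n)) for d, toks in TOKENS.items()}
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


print(boolean_and(["congress", "attorney"]))  # {'d3'}
print(coordination_level([["independent", "investigator"], ["congress"],
                          ["president", "attorney"]]))
# [('d3', 3), ('d1', 2), ('d2', 1)]
print(vector_rank("independent investigator attorney general"))
```

A strict AND over the four terms "independent investigator attorney general" returns only d3, whereas the vector-space ranking puts d3 first but still gives d1 and d2 nonzero scores. Coordination-level ranking sits between the two: it keeps the facet structure of a Boolean query but rewards partial matches.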

2004.12.09 - SLIDE 17 IS 202 – FALL 2004

Information Retrieval Topics

• IR Systems and Implementation
  – Draw and label a diagram that shows the major components of an IR system.
  – What are the special features of the Cheshire II information access system?
  – What is the purpose of an inverted index? How is it used to generate answers to Boolean queries?
  – Convert the contents of a set of documents (short texts) into an inverted index representation.

• Evaluation of IR Systems
  – Define precision. Define recall. Define relevance. How are the three interrelated?
  – Under what circumstances is high recall desirable? Under what circumstances is high precision desirable?
  – What is the main purpose of TREC? How does it differ from earlier evaluation efforts?
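For the inverted-index and evaluation questions, a minimal sketch follows. It assumes lower-cased whitespace tokenization, a hypothetical four-document collection, and made-up relevance judgments; the function names are illustrative only. It builds postings lists, answers a Boolean AND query by intersecting them, and scores the result with precision (relevant retrieved / retrieved) and recall (relevant retrieved / relevant).

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each lower-cased term to the sorted list of doc ids (postings) containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}


def boolean_and(index, terms):
    """Answer an AND query by intersecting the postings lists of all terms."""
    postings = [set(index.get(t, [])) for t in terms]
    return sorted(set.intersection(*postings)) if postings else []


def precision_recall(retrieved, relevant):
    """Precision = hits / retrieved; recall = hits / relevant."""
    hits = len(set(retrieved) & set(relevant))
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical short texts and relevance judgments, for illustration only.
docs = {
    1: "cats chase mice",
    2: "dogs chase cats",
    3: "mice eat cheese",
    4: "dogs eat bones",
}
index = build_inverted_index(docs)
print(index["chase"])                              # [1, 2]
result = boolean_and(index, ["chase", "cats"])
print(result)                                      # [1, 2]
print(precision_recall(result, relevant=[1, 3]))   # (0.5, 0.5)
```

Here retrieving document 2, judged non-relevant, halves precision, and missing document 3 halves recall, which illustrates why the two measures are reported together.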

2004.12.09 - SLIDE 18 IS 202 – FALL 2004

Information Retrieval Topics

• The Search Process and User Interfaces
  – Search and retrieval is part of a larger process. Name some other components of that process.
  – How/why doesn't the Bates berry-picking model fit with the standard information retrieval model?
  – How (fundamentally) does search on a directory system like Yahoo differ from search on AltaVista or Google?

2004.12.09 - SLIDE 19 IS 202 – FALL 2004

Information Retrieval Topics

• Relevance Feedback
  – What is the main difference between relevance feedback as defined in the literature and the more current web-based notion of "more like this"?
  – Given a query, three documents marked as relevant, and the Rocchio formula for relevance feedback given in class, compute the vector for the new query that results.
  – The Koenemann & Belkin study found results in three conditions for relevance feedback: opaque, transparent, and penetrable. Consider the different ways people have recently implemented systems for predicting which web page to show the user next. How do the differences in these systems correspond to the different relevance feedback conditions?
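The Rocchio computation can be sketched as follows. The exact formula and constants given in class are not reproduced in these slides, so this assumes the common textbook form q' = alpha*q + (beta/|R|) * sum of relevant vectors - (gamma/|N|) * sum of non-relevant vectors, with assumed defaults alpha = 1, beta = 0.75, gamma = 0.15 and negative weights clipped to zero. Vectors are plain lists of term weights over a shared (hypothetical) vocabulary order.

```python
def rocchio(query, relevant, nonrelevant=(), alpha=1.0, beta=0.75, gamma=0.15):
    """Return the Rocchio-modified query vector (assumed textbook form).

    new_q = alpha*query + beta*centroid(relevant) - gamma*centroid(nonrelevant),
    with any negative weights clipped to 0. All vectors share one term order.
    """
    dims = len(query)

    def centroid(vectors):
        vectors = list(vectors)
        if not vectors:
            return [0.0] * dims
        return [sum(v[i] for v in vectors) / len(vectors) for i in range(dims)]

    rel_c = centroid(relevant)
    non_c = centroid(nonrelevant)
    new_q = [alpha * query[i] + beta * rel_c[i] - gamma * non_c[i]
             for i in range(dims)]
    return [max(0.0, w) for w in new_q]


# Hypothetical example: a 4-term query and three documents judged relevant.
query = [1, 0, 0, 1]
relevant_docs = [[1, 1, 0, 0], [0, 1, 0, 1], [1, 1, 1, 0]]
print([round(w, 3) for w in rocchio(query, relevant_docs)])
# [1.5, 0.75, 0.25, 1.25]
```

When only relevant documents are marked, as in the question above, the gamma term contributes nothing; beta then controls how far the new query moves toward the centroid of the relevant documents.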

2004.12.09 - SLIDE 20 IS 202 – FALL 2004

Information Retrieval Topics

• Database Design
  – How is a database different from a file system?
  – What are the benefits of a database system?
  – What do we mean by data independence?
  – What are the benefits/drawbacks of the primary database models?
  – Entity-Relationship Diagrams: what are they for, and how do you create them?
  – How do you normalize a relational model database?
  – What is a join?
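As a sketch for the join question: a join combines rows from two tables that agree on a shared attribute, which is how a normalized schema reassembles data it has split apart. The tables, column names, and the hash_join helper below are hypothetical. The code implements an inner equi-join using a hash table, one of several strategies (alongside nested-loop and sort-merge joins) a DBMS might use for a query like SELECT title, pub_name FROM books JOIN publishers ON books.pub_id = publishers.pub_id.

```python
def hash_join(left, right, key):
    """Inner equi-join of two lists of row dicts on a shared column name.

    Builds a hash table on `right` keyed by `key`, then probes it with each
    `left` row; left rows with no matching right row are dropped.
    """
    buckets = {}
    for row in right:
        buckets.setdefault(row[key], []).append(row)
    joined = []
    for lrow in left:
        for rrow in buckets.get(lrow[key], []):
            joined.append({**lrow, **rrow})
    return joined


# Hypothetical normalized tables: books reference publishers by id instead of
# repeating the publisher's name and city in every book row.
publishers = [
    {"pub_id": 1, "pub_name": "Alpha Press", "city": "Berkeley"},
    {"pub_id": 2, "pub_name": "Beta Books", "city": "Oakland"},
]
books = [
    {"isbn": "111", "title": "Indexing", "pub_id": 1},
    {"isbn": "222", "title": "Retrieval", "pub_id": 1},
    {"isbn": "333", "title": "Metadata", "pub_id": 3},  # no matching publisher
]

for row in hash_join(books, publishers, "pub_id"):
    print(row["title"], row["pub_name"])
# Indexing Alpha Press
# Retrieval Alpha Press
```

The unmatched book (pub_id 3) disappears from the result, which is the defining behavior of an inner join; an outer join would keep it with empty publisher columns.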

2004.12.09 - SLIDE 21 IS 202 – FALL 2004

Information Organization Topics

• Categorization
• Knowledge Representation
• Lexical Relations and WordNet
• Controlled Vocabularies
• Semantic Web and RDF
• Facetted Classification and Thesaurus Design and Construction
• Metadata Standards
• Multimedia Information Organization and Retrieval
• Metadata for Motion Pictures: Media Streams and MPEG-7
• Mobile and Context-Aware Multimedia Information Systems
• Looking Backward, Looking Forward: Future of Information Systems
• Project Presentations

2004.12.09 - SLIDE 22 IS 202 – FALL 2004

Information Organization Topics

• Categorization
  – What is the definition of class membership in traditional categorization? How does traditional categorization have difficulty describing certain phenomena, like games (give one other example besides games)?
  – What is the “basic level” in categorization and how is it psychologically primary? How might the use of basic level categorization affect the design and use of information systems?

• Knowledge Representation
  – What limitations in standard information retrieval do knowledge representation technologies try to overcome? What challenges do they face in the attempt?
  – What are the similarities and differences between commonsense knowledge representation systems like CYC and facetted metadata classifications like the Art and Architecture Thesaurus or the facetted classification you built (give three examples)?

2004.12.09 - SLIDE 23 IS 202 – FALL 2004

Information Organization Topics

• Lexical Relations and WordNet
  – What are three lexical relations in WordNet that would be useful in an information retrieval task (explain how and give examples)?
  – Where are the meanings of the words in WordNet? How would assuming the conduit metaphor vs. the toolmakers’ paradigm of communication lead you to different answers to this question?

• Controlled Vocabularies
  – What does Svenonius consider to be the primary difficulties with using controlled vocabularies?
  – What is the purpose of authority control? Is this a type of controlled vocabulary? Why or why not?

2004.12.09 - SLIDE 24 IS 202 – FALL 2004

Information Organization Topics

• Semantic Web and RDF
  – What are the different basic topological structures of XML and RDF? What benefits and problems do these respective structures offer for information organization and retrieval?
  – What is the Semantic Web effort trying to accomplish? What challenges does that effort face and how might they be overcome?

• Facetted Classification and Thesaurus Design and Construction
  – What are the differences between classical and faceted classification and how do these differences affect the design and use of information systems?
  – How is a classification scheme or a thesaurus designed?

2004.12.09 - SLIDE 25 IS 202 – FALL 2004

Information Organization Topics

• Metadata Standards
  – What are the motivations behind creating and using metadata systems like Dublin Core, MARC, AACR II, etc.?
  – How do metadata standards come about and how might their provenance affect their adoption?

• Multimedia Information Organization and Retrieval
  – What is the “Kuleshov Effect” and how might it affect the design of metadata for multimedia data?
  – What are the “semantic gap” and the “sensory gap” and what challenges do they present for the design of information systems for multimedia data?

2004.12.09 - SLIDE 26 IS 202 – FALL 2004

Information Organization Topics

• Metadata for Motion Pictures: Media Streams and MPEG-7
  – What limitations do keywords pose for multimedia information retrieval and how might those limitations be addressed?
  – What aspects of multimedia content description is MPEG-7 attempting to standardize?

• Mobile and Context-Aware Multimedia Information Systems
  – How are cameraphones distinguished from traditional digital cameras in their technological capabilities and use (give 5 examples)?
  – What contextual metadata could be useful in describing and retrieving information, and how (give 4 examples)?

2004.12.09 - SLIDE 27 IS 202 – FALL 2004

Information Organization Topics

• Looking Backward, Looking Forward: Future of Information Systems
  – How are Bush’s vision of the Memex and the current World Wide Web similar and different (explain two similarities and two differences)?

• Project Presentations
  – In revising your facetted metadata ontology, how did you increase its expressiveness and reusability (give 3 examples)?
  – How well would the ontology you and your partner group designed support one of the other mobile media metadata applications presented by your classmates?

2004.12.09 - SLIDE 28 IS 202 – FALL 2004

Lecture Overview

• Final Exam

• Final Review

• Course Evaluations

• Phone Details

• Next Steps

2004.12.09 - SLIDE 29 IS 202 – FALL 2004

Course Evaluations

• Please take these seriously

• We and your colleagues really benefit from these in many ways
  – They affect our promotion and tenure
  – They give us helpful feedback on what worked and what didn't, to help us for next year and beyond
  – They in no way affect your grade

2004.12.09 - SLIDE 30 IS 202 – FALL 2004

Lecture Overview

• Final Exam

• Final Review

• Course Evaluations

• Phone Details

• Next Steps

2004.12.09 - SLIDE 31 IS 202 – FALL 2004

Phone Details

• Use over break?

• Need roaming?

• Want GPS unit?

• Want to still get photos off the phone?

• Want to switch to primary cell phone number?

• Can bring in on Friday?

• Can bring in on Monday?

2004.12.09 - SLIDE 32 IS 202 – FALL 2004

Lecture Overview

• Final Exam

• Final Review

• Course Evaluations

• Phone Details

• Next Steps

2004.12.09 - SLIDE 33 IS 202 – FALL 2004

Study hard, and good luck!

Thank you for all the great work!