Content and Systems Week 3

Content and Systems Week 3. Today’s goals Obtaining, describing, indexing content –XML –Metadata Preparing for the installation of Dspace –Computers available.

Dec 14, 2015


Rosemary Sims
Transcript
Page 1

Content and Systems

Week 3

Page 2

Today’s goals

• Obtaining, describing, indexing content
  – XML
  – Metadata

• Preparing for the installation of DSpace
  – Computers available
  – User names and passwords
    • Will come from Mr. Nadi this week, once he knows the team configurations
  – Access
    • I believe you all have access to Mendel 290. Please confirm.

Page 3

The Digital Library Content

• Essential elements for a digital library
  – Users
  – Content
  – Services

Page 4

Content - requirements

• Store
  – Organize
  – Describe

• Find

• Deliver

Page 5

Describing the content

• How to describe content
  – Metadata
    • Machine readable description of anything

• What description
  – Machine readable requires standard descriptive elements
    • Dublin Core (http://dublincore.org/)
      – International standard
      – “a standard for cross-domain information resource description.”
      – 15 descriptive elements

• Other metadata schemes
  – IEEE-LOM

Page 6

Metadata

• What does metadata look like?

• Metadata is data about data
  – Information about a resource, encoded in the resource or associated with the resource.

• The language of metadata: XML
  – eXtensible Markup Language

Page 7

XML

• XML is a markup language

• XML describes features

• There is no standard set of XML tags; you define your own

• Use XML to create a resource type

• Separately develop software to interact with the data described by the XML codes.

Source: tutorial at w3schools.com

Page 8

XML rules

• Easy rules, but very strict

• First line is the version and character set used:
  – <?xml version="1.0" encoding="ISO-8859-1"?>

• The rest is user defined tags

• Every tag has an opening and a closing
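The strictness of these rules can be seen by handing a document to a parser. The sketch below is only an illustration, assuming Python and its standard xml.etree.ElementTree module; the function name is_well_formed and the two sample documents are made up for this example.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_bytes):
    """Return True if the parser accepts the document, False otherwise."""
    try:
        ET.fromstring(xml_bytes)
    except ET.ParseError:
        return False
    return True


# Hypothetical sample documents, not taken from a real collection.
GOOD = (b'<?xml version="1.0" encoding="ISO-8859-1"?>'
        b'<recipe><recipe-title>Meringue cookies</recipe-title></recipe>')

# Same document, but the recipe-title tag is never closed.
BAD = (b'<?xml version="1.0" encoding="ISO-8859-1"?>'
       b'<recipe><recipe-title>Meringue cookies</recipe>')

print(is_well_formed(GOOD))
print(is_well_formed(BAD))
```

A parser rejects the whole document on the first violation, which is what "very strict" means in practice.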

Page 9

Element naming

• XML elements must follow these naming rules:
  – Names can contain letters, numbers, and other characters
  – Names must not start with a number or punctuation character
  – Names must not start with the letters xml (or XML or Xml ..)
  – Names cannot contain spaces
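As a rough illustration, the naming rules above can be turned into a small check. This is a simplified sketch in Python, not the full XML specification: it assumes ASCII names only, and it accepts a leading underscore, which XML permits. The function name is_valid_element_name is invented for this example.

```python
import re

# Simplified assumption: ASCII only. Real XML also allows many Unicode letters.
# First character: a letter or underscore. Later characters may also be
# digits, hyphens, or periods. Spaces never match.
_NAME_PATTERN = re.compile(r"[A-Za-z_][A-Za-z0-9_.\-]*")


def is_valid_element_name(name):
    """Check a candidate element name against the simplified naming rules."""
    if name.lower().startswith("xml"):
        # Names beginning with xml, XML, Xml, ... are reserved.
        return False
    return _NAME_PATTERN.fullmatch(name) is not None


for candidate in ["recipe-title", "1st-course", "XmlData", "recipe title"]:
    print(candidate, is_valid_element_name(candidate))
```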

Page 10

Elements and attributes

• Use elements to describe data

• Use attributes to present information that is not part of the data
  – For example, the file type or some other information that would be useful in processing the data, but is not part of the data.

Page 11

Repeating elements

• Naming an element means it appears exactly once.

• Name+ means it appears one or more times

• Name* means it appears 0 or more times.

• Name? means it appears 0 or one time.
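These indicators behave like the quantifiers of regular expressions. The sketch below, in Python, uses that analogy to check whether a sequence of child elements fits a content model. It is only an illustration of the counting rules, not a DTD validator; the model format, the function names, and the sample element names are all assumptions made for this example.

```python
import re


def model_to_regex(model):
    """Build a regex from [(element_name, indicator), ...].

    indicator is "" (exactly once), "+", "*", or "?".
    Each element name is matched together with one trailing space.
    """
    pieces = []
    for name, indicator in model:
        if indicator not in ("", "+", "*", "?"):
            raise ValueError("unknown indicator: %r" % indicator)
        pieces.append("(?:%s )%s" % (re.escape(name), indicator))
    return re.compile("".join(pieces))


def children_match(model, child_names):
    """Return True if the ordered child names satisfy the content model."""
    sequence = "".join(name + " " for name in child_names)
    return model_to_regex(model).fullmatch(sequence) is not None


# Hypothetical model: title, author+, keyword*, abstract?
MODEL = [("title", ""), ("author", "+"), ("keyword", "*"), ("abstract", "?")]

print(children_match(MODEL, ["title", "author", "author", "abstract"]))
print(children_match(MODEL, ["title"]))
```

The first call satisfies the model; the second does not, because author+ requires at least one author.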

Page 12

Parts of an XML document

• Elements
  – The components of an XML document
  – Some contain other parts, some are empty
    • Ex in HTML: “br” or “table”; in XML: “ingredient”

• Attributes
  – Information about elements, not data
    • Ex in HTML: “src=”; in XML: “scale=”

• Entities
  – Special characters or strings with pre-assigned meaning
    • Ex in HTML: &nbsp; for non-breaking space

• PCDATA
  – Parsed character data: text that will be parsed and interpreted by the reader. Tags and entities will be expanded and used in presentation.

• CDATA
  – Character data: text that will not be parsed and interpreted. It will be displayed exactly as provided.

The HTML examples are familiar; the XML examples are made up – dependent on the specific XML scheme used
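To make these parts concrete, here is a small sketch that parses one document containing each of them, assuming Python's standard xml.etree.ElementTree module. The document itself is made up: the scale attribute, the note element, and the text values are illustrative only.

```python
import xml.etree.ElementTree as ET

# Hypothetical document showing an attribute, an entity, a CDATA section,
# and an empty element.
DOC = b"""<?xml version="1.0" encoding="ISO-8859-1"?>
<recipe>
  <ingredient-amount scale="metric">250</ingredient-amount>
  <ingredient-name>salt &amp; pepper</ingredient-name>
  <directions><![CDATA[Stir while temperature < 100 & rising.]]></directions>
  <note/>
</recipe>"""

root = ET.fromstring(DOC)

amount = root.find("ingredient-amount")
scale = amount.get("scale")        # attribute: information about the element
amount_text = amount.text          # element content: the data itself

# Parsed character data: the &amp; entity is expanded by the parser.
name_text = root.find("ingredient-name").text

# CDATA: the < and & characters are taken literally, not as markup.
directions_text = root.find("directions").text

# An empty element has no text and no children.
note = root.find("note")

print(scale, amount_text)
print(name_text)
print(directions_text)
print(note.text, len(note))
```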

Page 13

Using XML - an example

Define the fields of a recipe collection:

<?xml version="1.0" encoding="ISO-8859-1"?>
<recipe>
  <recipe-title> </recipe-title>
  <ingredient-list>
    <ingredient>
      <ingredient-amount> </ingredient-amount>
      <ingredient-name> </ingredient-name>
    </ingredient>
  </ingredient-list>
  <directions></directions>
</recipe>

ISO 8859 is a family of character sets; ISO-8859-1 is Latin-1 (Western European).

See http://www.bbsinc.com/iso8859.html

Page 14

Processing the XML data

• How do we know what to do with the information in an XML file?
  – Document Type Definition (DTD)
    • Put in the same file as the data -- immediate reference
    • Put a reference to an external description
    • Provides the definition of the legitimate content for each element

Page 15

Document Type Definition

<?xml version="1.0" encoding="ISO-8859-1"?>
<!DOCTYPE recipe [
<!ELEMENT recipe (recipe-title, ingredient-list, directions)>
<!ELEMENT recipe-title (#PCDATA)>
<!ELEMENT ingredient-list (ingredient)*>
<!ELEMENT ingredient (ingredient-amount, ingredient-name)>
<!ELEMENT ingredient-amount (#PCDATA)>
<!ELEMENT ingredient-name (#PCDATA)>
<!ELEMENT directions (#PCDATA)>
]>

(ingredient)*: Repeat 0 or more times

Page 16

<?xml version="1.0" encoding="ISO-8859-1"?>
<!DOCTYPE recipe SYSTEM "recipe.dtd">
<recipe>
  <recipe-title>Meringue cookies</recipe-title>
  <ingredient-list>
    <ingredient>
      <ingredient-amount>3</ingredient-amount>
      <ingredient-name>egg whites</ingredient-name>
    </ingredient>
    <ingredient>
      <ingredient-amount>1 cup</ingredient-amount>
      <ingredient-name>sugar</ingredient-name>
    </ingredient>
    <ingredient>
      <ingredient-amount>1 teaspoon</ingredient-amount>
      <ingredient-name>vanilla</ingredient-name>
    </ingredient>
    <ingredient>
      <ingredient-amount>2 cups</ingredient-amount>
      <ingredient-name>mini chocolate chips</ingredient-name>
    </ingredient>
  </ingredient-list>
  <directions>Beat the egg whites until stiff. Stir in sugar, then vanilla. Gently fold in chocolate chips. Place in warm oven at 200 degrees for an hour. Alternatively, place in an oven at 350 degrees. Turn oven off and leave overnight.</directions>
</recipe>

Not the way that I want to see a recipe in a magazine!

What could we do with a large collection of such entries?

How would we get the information entered into a collection?

External reference to DTD
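One possible answer to the question of what could be done with a collection of such entries: a program can pull the fields out of each record and search across them. The sketch below assumes Python's standard xml.etree.ElementTree module. The function names are invented, the sample record is an abbreviated copy of the recipe (DOCTYPE line omitted, fewer ingredients, shortened directions), and the standard parser does not validate against the DTD.

```python
import xml.etree.ElementTree as ET

# Abbreviated copy of the recipe entry, for illustration only.
RECIPE_XML = """<recipe>
  <recipe-title>Meringue cookies</recipe-title>
  <ingredient-list>
    <ingredient>
      <ingredient-amount>3</ingredient-amount>
      <ingredient-name>egg whites</ingredient-name>
    </ingredient>
    <ingredient>
      <ingredient-amount>1 cup</ingredient-amount>
      <ingredient-name>sugar</ingredient-name>
    </ingredient>
  </ingredient-list>
  <directions>Beat the egg whites until stiff. Stir in sugar.</directions>
</recipe>"""


def read_recipe(xml_text):
    """Extract title, (amount, name) pairs, and directions from one recipe."""
    root = ET.fromstring(xml_text)
    ingredients = [
        (item.findtext("ingredient-amount", "").strip(),
         item.findtext("ingredient-name", "").strip())
        for item in root.iter("ingredient")
    ]
    return {
        "title": root.findtext("recipe-title", "").strip(),
        "ingredients": ingredients,
        "directions": root.findtext("directions", "").strip(),
    }


def recipes_using(recipes, ingredient_name):
    """Return the titles of all recipes that list the given ingredient."""
    wanted = ingredient_name.lower()
    return [r["title"] for r in recipes
            if any(name.lower() == wanted for _, name in r["ingredients"])]


recipe = read_recipe(RECIPE_XML)
print(recipe["title"])
for amount, name in recipe["ingredients"]:
    print(amount, name)
print(recipes_using([recipe], "sugar"))
```

The same extracted fields could feed a magazine-style layout, an index, or a search service, which is the point of separating the description from the presentation.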

Page 17

XML exercise

• Design an XML schema for an application of your choice. Keep it simple.

• Examples -- address book, TV program listing, DVD collection, …

Page 18

Another example

• A paper with content encoded with XML: http://tecfaseed.unige.ch/staf18/modules/ePBL/uploads/proj3/paper81.xml

• First few lines:

<?xml version="1.0" encoding="ISO-8859-1"?>
<?xml-stylesheet href="ePBLpaper11.css" type="text/css"?>
<?xml-stylesheet href="ePBLpaper11.xsl" type="text/xsl"?>
<!DOCTYPE paper SYSTEM "ePBLpaper11.dtd">
<paper id="proj3">
<info>
<title>Standards E-learning and their possible support for a rich pedagogic approach in a 'Integrated Learning' context</title>
<authors>
<author>
<firstname>Rodolophe</firstname>
<familyname>Borer</familyname>
<homepageurl>http://tecfa.unige.ch/perso/staf/borer/</homepageurl>
<email/>
</author>
</authors>

"ePBLpaper11.dtd" shown on next slide

Page 19

<?xml version="1.0" encoding="ISO-8859-1" ?>
<!-- _________ _____________________ -->
<!-- ePBL-project DTD for student project management & specification -->
<!-- Copyright: (2004) [email protected] -->
<!-- http://tecfa.unige.ch/~paraskev/ -->
<!-- Daniel K. Schneider -->
<!-- http://tecfa.unige.ch/tecfa-people/schneider.html-->
<!-- Created: 13/11/2002 (based on EVA_pm grammar) -->
<!-- Updated: 07/05/2004 -->
<!-- VERSIONS -->
<!-- v1.1 Adaptations to use with Morphon xml editor and addition of IDs-->
<!-- ____________________ -->
<!-- _ ENTITY DECLARATIONS ______ -->
<!ENTITY % foreign-dtd SYSTEM "ibtwsh6_ePBL.dtd">
%foreign-dtd;
<!ENTITY % id "id ID #IMPLIED">
<!-- ______ MAIN ELEMENT _________ -->
<!ELEMENT project (name, authors, date, updated, goal, state-of-the-art, research-development-questions, methodology, workpackages ) >
<!ELEMENT name (#PCDATA )>
<!ELEMENT date (#PCDATA )>
<!ELEMENT authors (#PCDATA )>
<!ELEMENT updated (#PCDATA )>
<!ELEMENT goal (title, description )>
<!ELEMENT state-of-the-art %vert.model;>
<!ATTLIST state-of-the-art %id;>
<!ELEMENT research-development-questions (question )+>
<!ELEMENT question (title, description )>
<!ELEMENT methodology %vert.model;>
<!ATTLIST methodology %id;>
<!ELEMENT workpackages (workpackage )+>
<!ELEMENT workpackage (planning, objectives, deliverables )>
<!ATTLIST workpackage %id;>
<!ELEMENT objectives (objective )+>
<!ELEMENT objective (title, description )>
<!ELEMENT deliverables (deliverable )+>
<!ELEMENT deliverable (url, title, description )>
<!ELEMENT url (#PCDATA )>
<!ELEMENT planning (from, to, progress )>
<!ELEMENT from (#PCDATA )>
<!ELEMENT to (#PCDATA )>
<!ELEMENT progress (#PCDATA )>
<!-- ________________________ -->
<!ELEMENT title (#PCDATA )>
<!ATTLIST title %id;>
<!ELEMENT description %vert.model;>
<!-- _______________________ -->

Source: http://tecfa.unige.ch/staf/staf-j/vuilleum/staf18/p6/

Page 20

Vocabulary

• Given the need for processing, do you want free text or restricted entries?

• Free text gives more flexibility for the person making the entry

• Controlled vocabulary helps with
  – Consistent processing
  – Comparison between entries

• Controlled vocabulary limits
  – Options for what is said

Page 21

Vocabulary example

• Recipe example
  – What text should be controlled?
  – What should be free text?

• Ingredients
  – Ingredient-amount
  – Ingredient-name
  – Should we revise how we coded ingredient amount?

• Directions
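One way to think about revising the ingredient amount is to split it into a free numeric quantity and a unit drawn from a controlled list. The Python sketch below is just one possible design under that assumption. The list of units, the default unit "each", the naive plural handling, and the function name parse_amount are all hypothetical choices made for this illustration.

```python
# Hypothetical controlled vocabulary for units; a real collection would
# agree on its own list.
CONTROLLED_UNITS = {"cup", "teaspoon", "tablespoon", "each"}


def parse_amount(text):
    """Split an amount such as '2 cups' into (quantity, controlled unit).

    Raises ValueError if the quantity is not a number or the unit is not
    in the controlled vocabulary.
    """
    parts = text.split()
    if not parts:
        raise ValueError("empty amount")
    quantity = float(parts[0])
    # An amount with no unit, such as '3', counts whole items.
    unit = parts[1].lower() if len(parts) > 1 else "each"
    if unit.endswith("s"):
        unit = unit[:-1]  # naive plural handling: 'cups' becomes 'cup'
    if unit not in CONTROLLED_UNITS:
        raise ValueError("unit not in controlled vocabulary: %r" % unit)
    return quantity, unit


print(parse_amount("2 cups"))
print(parse_amount("1 teaspoon"))
print(parse_amount("3"))
```

With a controlled unit, amounts from different entries can be compared or scaled consistently; ingredient names and directions could stay free text.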

Page 22

Dublin Core

• Standard set of metadata fields for entries in digital libraries:
  – Title, creator, subject, description, publisher, contributor, date, type, format, identifier, source, language, relation, coverage, rights

Page 23

Dublin Core elements

see: http://dublincore.org/documents/dces/

C = controlled vocabulary recommended.

• Title
• Creator: Entity primarily responsible for making content of the resource
• Subject - C
• Description
• Publisher: Entity making the resource available
• Contributor: Contributor to content of the resource
• Date: for example, YYYY-MM-DD
• Type - C: Ex: collection, dataset, event, image
• Format - C: What is needed to display or operate the resource.
• Identifier: Unambiguous ID
• Source
• Language: Standards RFC 3066, ISO 639
• Relation: Ref. to related resource
• Coverage - C: Space, time, jurisdiction.
• Rights: Rights Management information
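As an illustration of what a Dublin Core description might look like once encoded in XML, the sketch below builds a small record in Python with the standard xml.etree.ElementTree module. The namespace URI is the one published for the Dublin Core element set; the wrapper element name record, the function name make_record, and the field values are assumptions made for this example and are not taken from DSpace.

```python
import xml.etree.ElementTree as ET

# Namespace of the 15-element Dublin Core set.
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)

DC_ELEMENTS = [
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
]


def make_record(fields):
    """Build a <record> element with one dc: child per supplied field."""
    record = ET.Element("record")  # hypothetical wrapper element
    for name, value in fields.items():
        if name not in DC_ELEMENTS:
            raise ValueError("not a Dublin Core element: %r" % name)
        child = ET.SubElement(record, "{%s}%s" % (DC_NS, name))
        child.text = value
    return record


# Made-up values describing the recipe used earlier in these slides.
record = make_record({
    "title": "Meringue cookies",
    "type": "Text",
    "format": "text/xml",
    "language": "en",
})
xml_text = ET.tostring(record, encoding="unicode")
print(xml_text)
```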

Page 24

A DSpace example

• CITIDEL:

http://citidel.villanova.edu

Page 25

IEEE - LOM

• Example of a specialized metadata scheme
  – Learning Object Metadata

• Specifically for collections of educational materials

• Includes all of Dublin Core

• See http://projects.ischool.washington.edu/sasutton/IEEE1484.html

Page 26

Computing systems

• Linux machines

• Introduction to Unix: http://www.csc.villanova.edu/~lab/unix/

• DSpace: http://www.dspace.org/
  – Documentation, including installation - http://www.dspace.org/index.php?option=com_content&task=view&id=151&Itemid=116

• Najib Nadi, our system administrator, is setting up the machines. He will send a message to the class by the middle of the week with details of machine location and login.

Remember - you have the option to use your own machine, but it must meet the criteria described last week.

Page 27

This session

• Defined metadata and its role in digital libraries.

• Introduced XML as a language for describing a collection of content.

• Described the computing resources and how to get ready for the first DL setup.