Peter Fox Xinformatics – ITEC 6961/CSCI 6960/ERTH-6963-01 Week 9, April 5, 2011 Information management, workflow and discovery /check-in for project definitions

Jan 04, 2016


Leslie Turner

Peter Fox

Xinformatics – ITEC 6961/CSCI 6960/ERTH-6963-01

Week 9, April 5, 2011

Information management, workflow and discovery

/check-in for project definitions

Contents

• Review of assignment 3 presentations

• Reading (re-organized)

• Information management

• Information workflow

• Information discovery

• Checking in for project definitions?

• Next class

Discussion – A3

• What did you learn?

• What was easy/hard?

• What did you learn from others?

• Will you ever look at an information system the same again?
– Sorry?
– Happy?

Logical Collections

• The primary goal of a management system is to abstract the physical collection into logical collections. The resulting view is a uniform, homogeneous collection.
– Identifying naming conventions and organization
– Aligning cataloguing and naming to facilitate search, access, use
– Provision of contextual information

Physical Handling

• This layer maps between the physical and the logical views. Here you find items like replication, backup, caching, etc.
– Where and who does it come from?
– How is it transferred into a physical form?
– Backup, archiving, and caching…
– Formats
– More naming conventions
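The mapping between logical and physical views can be sketched as a small catalogue. This is only an illustration: the class names, the logical name and the file locations are all made up, and a real management system would add persistence, access control and much more.

```python
from dataclasses import dataclass, field


@dataclass
class LogicalItem:
    """One entry in the logical collection (hypothetical structure)."""
    logical_name: str
    replicas: list[str] = field(default_factory=list)  # physical locations


class Catalog:
    """Maps logical names to one or more physical copies."""

    def __init__(self) -> None:
        self._items: dict[str, LogicalItem] = {}

    def register(self, logical_name: str, physical_location: str) -> None:
        # A second location under the same name records a replica or backup.
        item = self._items.setdefault(logical_name, LogicalItem(logical_name))
        if physical_location not in item.replicas:
            item.replicas.append(physical_location)

    def resolve(self, logical_name: str) -> list[str]:
        # Users ask by logical name; the physical layout stays hidden.
        return list(self._items[logical_name].replicas)


# Illustrative names and paths only.
catalog = Catalog()
catalog.register("volcano/activity/2011-04", "file:///data/disk1/v_act_201104.csv")
catalog.register("volcano/activity/2011-04", "file:///backup/tape7/v_act_201104.csv")
```

A user only ever sees "volcano/activity/2011-04"; where the copies live, and how many there are, can change without the logical name changing.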

Interoperability Support

• Normally the information does not reside in the same place, or various collections (like catalogues) should be put together in the same logical collection.
– Bit/byte and platform/wire neutral encodings
– Programming or application interface access
– Structure and vocabulary (metadata) conventions and standards

Security

• Access authorization and change verification. This is the basis of trusting your information.
– What mechanisms exist for securing?
– Who performs this task?
– Change and versioning (yes, the information may change), who does this, how?
– Who has access?
– How are access methods controlled, audited?
– Who and what – authentication and authorization?
– Encryption and integrity
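One common mechanism for change verification is to record a cryptographic fingerprint of the content and compare it later. The sketch below uses SHA-256 from the Python standard library; the function names and sample content are assumptions for illustration. It detects that something changed but says nothing about who changed it or whether they were authorized.

```python
import hashlib


def fingerprint(content: bytes) -> str:
    """Return the SHA-256 digest of the content as a hex string."""
    return hashlib.sha256(content).hexdigest()


def has_changed(content: bytes, recorded_fingerprint: str) -> bool:
    """True if the content no longer matches the fingerprint stored earlier."""
    return fingerprint(content) != recorded_fingerprint


# Made-up sample content: one value differs between the two versions.
original = b"station,temp\nA1,12.5\n"
recorded = fingerprint(original)
tampered = b"station,temp\nA1,21.5\n"
```

Storing the fingerprint with the metadata (and versioning both) gives a simple answer to "has this information changed since I last trusted it?".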

Ownership

• Define who is responsible for quality and meaning
– Rights and policies – definition and enforcement
– Limitations on access and use
– Requirements for acknowledgement and use
– Who defines and ensures quality, and how?
– Who may ownership migrate to?
– How to address replication?
– How to address revised/derivative products?

Metadata

• Metadata are data about data.

• Metainformation is information about information
– How to know what conventions, standards, best practices exist?
– How to use them, what tools?
– Understanding costs of incomplete and inconsistent metadata
– Understanding the line between metadata and data and when it is blurred
– Knowing where and how to manage metadata and where to store it (and where not to)
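To make the cost of incomplete metadata concrete, a minimal completeness check might look like the following. The required field list is an assumed convention for this sketch, not a standard named in the lecture, and the records are invented.

```python
# Assumed convention: every record should carry these four fields.
REQUIRED_FIELDS = ("title", "creator", "date", "format")


def missing_fields(record: dict) -> list[str]:
    """List required fields that are absent or blank in a metadata record."""
    return [
        name for name in REQUIRED_FIELDS
        if not str(record.get(name, "")).strip()
    ]


# Invented example records.
records = [
    {"title": "Swallow call", "creator": "A. Student",
     "date": "2011-04-05", "format": "audio/wav"},
    {"title": "Untitled scan", "creator": "", "format": "image/tiff"},
]

report = {record["title"]: missing_fields(record) for record in records}
```

The second record cannot be found by creator or by date, which is exactly the kind of discovery failure that incomplete metadata produces.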

Persistence

• Definition of lifetime. Deployment of mechanisms to counteract technology obsolescence.
– Where will you put your information so that someone else (e.g. one of your class members) can access it?
– What happens after the class, the semester, after you graduate?
– What other factors are there to consider?

Discovery

• Ability to identify useful relations and information inside the collection
– If you choose (see ownership and security), how does someone find your information?
– How would you provide discovery of collections, versus files, versus ‘bits’?
– How to enable the narrowest/broadest discovery?

Dissemination

• Mechanism to make interested parties aware of changes and additions to the collections.
– Who should do this?
– How and what needs to be put in place?
– How to advertise?
– How to inform about updates?
– How to track use, significance?

Summary of Management

• Creation of logical collections

• Physical handling

• Interoperability support

• Security support

• Ownership

• Metadata collection, management and access.

• Persistence

• Knowledge and information discovery

• Dissemination and publication

Information Workflow

• What is it?

• Why would you use it?

• Some pointers to workflow systems

What is a workflow?

• General definition: series of tasks performed to produce a final outcome

• Information workflow – “analysis pipeline”
– Automate tedious jobs that users traditionally performed by hand for each dataset
– Process large volumes of data/information faster than one could do by hand

Background: Business Workflows

• Example: planning a trip

• Need to perform a series of tasks: book a flight, reserve a hotel room, arrange for a rental car, etc.

• Each task may depend on outcome of previous task
– Days you reserve the hotel depend on days of the flight
– If hotel has shuttle service, may not need to rent a car
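The trip example can be written down as a dependency graph and ordered automatically. The sketch below uses `graphlib.TopologicalSorter` from the Python standard library (Python 3.9+); the task names and the `hotel_has_shuttle` flag are assumptions chosen to mirror the slide.

```python
from graphlib import TopologicalSorter


def plan_trip(hotel_has_shuttle: bool) -> list[str]:
    """Return the trip tasks in an order that respects their dependencies."""
    # Each key maps a task to the set of tasks that must finish first.
    dependencies = {
        "book_flight": set(),
        "reserve_hotel": {"book_flight"},  # hotel days depend on flight days
    }
    if not hotel_has_shuttle:
        # The rental car is only needed when the hotel offers no shuttle.
        dependencies["rent_car"] = {"reserve_hotel"}
    return list(TopologicalSorter(dependencies).static_order())
```

Calling `plan_trip(True)` drops the rental car step entirely, which is the kind of outcome-dependent branching that a workflow has to express.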

What about information workflows?

• Perform a set of transformations/ operations on a data or information source

• Examples
– Generating images from raw data
– Identifying areas of interest from a large information source
– Classifying set of objects
– Querying a web service for more information on a set of objects
– Many others…
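A simple linear information workflow is just a chain of transformations applied in order. The following sketch assumes three made-up steps (cleaning, unit conversion, and flagging values of interest) and made-up readings; it is not taken from any particular workflow system.

```python
from functools import reduce


def run_pipeline(data, steps):
    """Feed the output of each step into the next one."""
    return reduce(lambda current, step: step(current), steps, data)


def drop_missing(values):
    # Remove readings that were never recorded.
    return [value for value in values if value is not None]


def to_celsius(values):
    # Convert Fahrenheit readings to Celsius, rounded to one decimal place.
    return [round((value - 32) * 5 / 9, 1) for value in values]


def flag_areas_of_interest(values, threshold=30.0):
    # Keep only readings at or above the (assumed) threshold.
    return [value for value in values if value >= threshold]


# Hypothetical raw readings in Fahrenheit, one of them missing.
raw = [68.0, None, 95.0, 104.0]
result = run_pipeline(raw, [drop_missing, to_celsius, flag_areas_of_interest])
```

Because each step is a separate function, steps can be reordered, reused in another project, or tested on their own.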

More on Workflows

• Formal models of the flow of data/information among processing components

• May be simple and linear or more complex

• Can process many data/information types:
– Archives
– Web pages
– Streaming/real time
– Images (e.g., medical or satellite)
– Simulation output
– Observational data

Challenges

• Questions:
– What are some challenges for users in implementing workflows?
– What are some challenges to executing these workflows?
– What are limitations of writing a program?

• Mastering a programming language

• Visualizing workflow

• Sharing/exchanging workflow

• Formatting issues

• Locating datasets, services, or functions

Workflow Management Systems

• Graphical interfaces for developing and executing scientific workflows

• Users can create workflows by dragging and dropping

• Automates low-level processing tasks

• Provides access to repositories, compute resources, workflow libraries

Workflow Management Systems

Benefits of Workflows

• Documentation of aspects of analysis

• Visual communication of analytical steps

• Ease of testing/debugging

• Reproducibility

• Reuse of part or all of workflow in a different project

Additional Benefits

• Integration of multiple computing environments

• Automated access to distributed resources via other architectural components, e.g. web services and Grid technologies

• System functionality to assist with integration of heterogeneous components

Why not just use a script?

• Script does not specify low-level task scheduling and communication

• May be platform-dependent

• Can’t be easily reused

• May not have sufficient documentation to be adapted for another purpose

Why is a GUI useful?

• No need to learn a programming language

• Visual representation of what workflow does

• Allows you to monitor workflow execution

• Enables user interaction

• Facilitates sharing of workflows

Some workflow systems

• Kepler
• SCIRun
• Sciflo
• Triana
• Taverna
• Pegasus
• Some commercial tools:
– Windows Workflow Foundation
– Mac OS X Automator
• http://www.isi.edu/~gil/AAAI08TutorialSlides/5-Survey.pdf
• http://www.isi.edu/~gil/AAAI08TutorialSlides/
• See reading for this week

Recall forms of information

• Structured/unstructured

• Presentation and organization

• Syntax-semantics-pragmatics

• Managed, designed and architected.

• Goal of this part of the class is to understand how discovery is enabled or disabled based on these factors

Discovery

• How does someone find your information?

• How would you provide discovery of
– collections
– files
– ‘bits’

• How would you find ->

Discovery

• Federated Search
• Folksonomies (user contributed)
• Intelligent Agents
• Search Engines
• Taxonomies

• Find photos of Kim
– Boy or girl?

Use cases

• Find a sound recording of a swallow.

• Excuse me?

Use cases

• Find a sound recording of an African Swallow

• Find a sound recording of a bird that sounds like an African Swallow

• Media types – how can you discover them?

Use cases

• Find the movie that Jeanne Tripplehorn first starred in/that was her most successful/in which she was lead actress?

• Has anyone gene sequenced a mouse?

• Discovery can often involve information integration

Three level ‘metadata’ solution for DATA

Level 1: Data Registration at the Discovery Level, e.g. volcano location and activity

Level 2: Data Registration at the Inventory Level, e.g. list of datasets, times, products

Level 3: Data Registration at the Item Detail Level, e.g. access to individual quantities

Earth Sciences Virtual Database: a data warehouse where the schema heterogeneity problem is solved; schema based integration

Ontology based data integration using scientific workflows

Data Discovery, Data Integration

A.K. Sinha, Virginia Tech, 2006

Three level ‘metadata’ solution?

Level 1: Registration at the Discovery Level, e.g. find the upper level entry point to a source

Level 2: Registration at the Inventory Level, e.g. list of datasets, using the logical organization

Level 3: Registration at the Item Detail Level, i.e. annotation, e.g. tagging

Catalog/Index: schema based integration

Integration using mapping management

Information Discovery, Information Integration

A.K. Sinha, Virginia Tech, 2006

Information discovery

• What makes discovery work?
– Metadata
– Logical organization
– Attention to the fact that someone would want to discover it
– It turns out that file types are a key enabler or inhibitor to discovery

• What does not work?
– Result ranking using *any* conventional algorithm

Federated search

• “is the simultaneous search of multiple online databases or web resources and is an emerging feature of automated, web-based library and information retrieval systems. It is also often referred to as a portal or a federated search engine.” (Wikipedia)

• Libraries have been doing this for a long time (Z39.50, ISO 23950)

• Key is consistent search metadata fields (keywords)

• E.g. Geospatial One Stop http://www.geodata.gov
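The idea of searching several sources at once over a shared keyword field can be sketched as follows. The two in-memory "libraries" stand in for remote catalogues and are entirely made up; a real federated search would speak a protocol such as Z39.50 to each source rather than read a local list.

```python
from concurrent.futures import ThreadPoolExecutor

# Stand-in catalogues; both expose the same "keywords" field (assumed).
SOURCES = {
    "library_a": [
        {"title": "Volcano activity 2010", "keywords": ["volcano", "activity"]},
        {"title": "Swallow recordings", "keywords": ["bird", "audio"]},
    ],
    "library_b": [
        {"title": "Volcano locations", "keywords": ["Volcano", "location"]},
    ],
}


def search_source(name, records, keyword):
    """Search one source, matching the keyword case-insensitively."""
    wanted = keyword.lower()
    return [
        {"source": name, "title": record["title"]}
        for record in records
        if wanted in {k.lower() for k in record["keywords"]}
    ]


def federated_search(sources, keyword):
    """Query every source concurrently and merge the hits into one list."""
    with ThreadPoolExecutor() as pool:
        futures = [
            pool.submit(search_source, name, records, keyword)
            for name, records in sources.items()
        ]
        hits = [hit for future in futures for hit in future.result()]
    return sorted(hits, key=lambda hit: (hit["source"], hit["title"]))


results = federated_search(SOURCES, "volcano")
```

The merge step only works because both sources agree on what a keyword field is, which is the point the slide makes about consistent search metadata.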

Search engines (1)

• Contains an automated spider or crawler
• No theoretical limits in the amount of indexing (limited by hardware)
• Support remote indexing
• Continual background indexing of content
• Custom metatag support (some low-end products do not support this feature)
• Support for indexing PDF, .doc, etc. (some low-end products do not support this feature)
• Supports URL and word exclusions & inclusions

Search engines (2)

• Server-Side Includes (SSI) supported

• Search by custom metatags

• Case sensitive or insensitive searching

• Simple Customizable search/results pages

• Boolean Searching capabilities

• Provide users meta description and page title in search results

• Inexpensive – ~$200 (2010)

• Easily customizable search/results interface
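Boolean and case-insensitive searching are usually built on an inverted index that maps each word to the pages containing it. The sketch below is a toy version with invented pages and function names; it does no ranking, stemming or phrase matching.

```python
import re
from collections import defaultdict


def build_index(pages):
    """Map each lowercased word to the set of page URLs that contain it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(url)
    return index


def search_and(index, *terms):
    """Pages that contain every term (Boolean AND)."""
    matches = [index.get(term.lower(), set()) for term in terms]
    return set.intersection(*matches) if matches else set()


def search_or(index, *terms):
    """Pages that contain at least one term (Boolean OR)."""
    return set().union(*(index.get(term.lower(), set()) for term in terms))


# Invented pages for illustration.
pages = {
    "/birds/swallow": "Sound recording of an African Swallow",
    "/birds/sparrow": "Photo of a sparrow",
    "/audio/index": "Index of every sound recording",
}
index = build_index(pages)
```

Lowercasing both the indexed text and the query terms is what makes the search case insensitive; a case sensitive engine would skip that step.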

Search engines (3)

• Result weighting feature

• URL Inclusion list

• Require significant memory (RAM) and disk space as the collection grows

• Low-end alternatives often do not possess the capabilities to do phrase or natural language searching.

Improve www discovery?

• Implement metatags on your and your partners' web sites
• Update content frequently
• Register your site with the major search engines (tools exist to aid in this process)
• Perform a basic study of where your site appears in the results of the major search engine providers
• Do not spam the search engine providers
• Re-evaluate your web site directory structure to ensure information is appropriately categorized/described within your URL strings

Improve www discovery

• Look through your server log files to determine what users are trying to find on your site and/or the path they are using to find information

• Perform basic usability testing of your site to determine what users expect and can easily gather from your site. This also may determine why users go to an Internet search engine provider versus accessing your site directly.

• Realize that Internet search engines don’t all act the same or index at the same time period, and one vendor's product often values a particular metatag, document date, etc. more than another's does.
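Looking through server logs for what users try to find can start with a few lines of parsing. This sketch assumes access logs in the common log format and a site search served at `/search?q=...`; both are assumptions, and the log lines are invented.

```python
from collections import Counter
from urllib.parse import parse_qs, urlsplit

# Invented log lines in common log format.
LOG_LINES = [
    '10.0.0.1 - - [05/Apr/2011:10:00:00 -0400] "GET /search?q=swallow+recording HTTP/1.1" 200 512',
    '10.0.0.2 - - [05/Apr/2011:10:01:00 -0400] "GET /data/index.html HTTP/1.1" 200 2048',
    '10.0.0.3 - - [05/Apr/2011:10:02:00 -0400] "GET /search?q=Swallow+recording HTTP/1.1" 200 512',
    '10.0.0.1 - - [05/Apr/2011:10:03:00 -0400] "GET /search?q=volcano HTTP/1.1" 200 128',
]


def requested_path(line):
    """Extract the path from the quoted request field: METHOD PATH PROTOCOL."""
    request = line.split('"')[1]
    return request.split()[1]


def search_terms(lines):
    """Count the queries that users typed into the (assumed) site search."""
    counts = Counter()
    for line in lines:
        url = urlsplit(requested_path(line))
        if url.path == "/search":
            for query in parse_qs(url.query).get("q", []):
                counts[query.lower()] += 1
    return counts


term_counts = search_terms(LOG_LINES)
```

Frequent queries that lead nowhere are good candidates for new metatags or for reorganizing the directory structure.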

Smart search

• Semantically aware search, e.g. http://noesis.itsc.uah.edu, http://eie.cos.gmu.edu (Water -> Semantic Search)

• Faceted search, e.g. mspace (http://mspace.fm), Earth System Grid (http://esg.prototype.ucar.edu), exhibit (MIT)

NOESIS

Faceted search

• Semantically aware search, e.g. http://noesis.itsc.uah.edu

• Faceted search, e.g. mspace (http://mspace.fm), Earth System Grid (http://esg.prototype.ucar.edu)
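Faceted search lets a user narrow a collection by picking values of metadata fields and shows how many items remain under each choice. A minimal sketch, with invented items and facet names, and no relation to how mspace or the Earth System Grid implement it:

```python
from collections import Counter

# Invented items; "media" and "region" are the assumed facets.
ITEMS = [
    {"title": "African Swallow call", "media": "audio", "region": "Africa"},
    {"title": "European Swallow call", "media": "audio", "region": "Europe"},
    {"title": "African Swallow photo", "media": "image", "region": "Africa"},
]


def filter_items(items, **selected):
    """Keep items whose facets match every selected value."""
    return [
        item for item in items
        if all(item.get(facet) == value for facet, value in selected.items())
    ]


def facet_counts(items, facet):
    """Count how many items fall under each value of a facet."""
    return Counter(item[facet] for item in items)


audio_items = filter_items(ITEMS, media="audio")
regions_for_audio = facet_counts(audio_items, "region")
```

After choosing `media="audio"`, the remaining region counts tell the user which further choices will still return results.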

Summary - discovery

• Useful to write a few discovery use cases to drive how your design is developed

• Evolution of your role in facilitating discovery and what/ how others implement access to your information

Reading for this week

• Is retrospective

Check in for Project Assignment

• Analysis of existing information system content and architecture, critique, redesign and prototype redeployment

• Or a new use case, development, etc.

What is next

• No class next week April 12 – GM week

• Week 10 – Information integration, life-cycle and visualization
