THE USA IM LTER CASE: A different story
Melendez, 2011
Jan 05, 2016
OVERVIEW
• The role of Information Management in the evolution of Informatics: two perspectives
• About Informatics and Information Management concepts
• Two sources for one story: goals, issues, solutions
  • your book
  • an LTER information management paper
• LUQ LTER Case
TWO WORDS ONE CONCEPT?
INFORMATICS (p. 14 in Reddy, 2009): “discipline of science which investigates the structure and property (not specific content) of scientific information as well as the regularities of scientific information activity, its theory, history, methodology and organization” (1967), or the “interdisciplinary study of the design, application, use and impact of information technology” (2008 on) (p. 16)
INFORMATION MANAGEMENT (Wikipedia) is the collection and management of information from one or more sources and the distribution of that information to one or more audiences. This sometimes involves those who have a stake in, or a right to, that information. Management means the organization of and control over the structure, processing and delivery of information.
Observe that informatics studies the information, whereas information management includes the information per se; and that informatics studies the design of the information, whereas information management includes the design of the structure.
Ambiguity in Understanding Roles: IT and IM
[Diagram labels: IM/IT, IT/IM, Earth Science, Computer Science]
Figure 5a. Domain Science Perspective. An earth science point of view where information technology is considered close to information systems and computer science.
Figure 5b. Computer Science Perspective. A computer science point of view where information management is considered closer to domain science.
Baker and Millerand, 2007
What is Informatics?
[Diagram labels: Domain Sciences, Social Sciences, Information Sciences]
Informatics is an applied science, an interdisciplinary field of study at the intersection of social sciences, information sciences, and domain science.
Baker and Millerand, 2007
ECOINFORMATICS: TOOLS AND TECHNIQUES BY R A REDDY
On ILTER: “…has the unique ability to design collaborative, site-based projects, compare data from a global network of sites and detect global trends. ILTER members also have the expertise in the collection, management and analysis of long-term environmental data” (p. 31)
On KNB (Knowledge Network for Biocomplexity): “We have conceived of the KNB as a mechanism for scientists to discover, access, interpret, analyze, and synthesize the wealth of data that is collected by ecological and environmental scientists nationally (and eventually internationally)” (pp. 46-47)
ENRICHING THE NOTION OF DATA CURATION IN E-SCIENCE: DATA MANAGING AND INFORMATION INFRASTRUCTURING IN THE LONG TERM ECOLOGICAL
• On data curation related to data sharing, drawing on an ethnographic study of one of the longest-running efforts at long-term consistent data collection with open data sharing in an environment of interdisciplinary collaboration.
• On the continuous and historical role of the LTER information managers through data care work and information infrastructure development.
http://interoperability.ucsd.edu/publications/
The Goal or drivers (from Helena Karasti et al., 2006)
Data curation* in e-science or cyberinfrastructure:
• data collection with open data sharing in an environment of interdisciplinary collaboration
• large-scale science carried out through distributed global collaborations
• parallel to providing a substrate for the successful access, sharing and (re)use of data collections
• archive and preserve exponentially increasing volumes of primary data for contemporary discovery and future re-use
* From The International Journal of Digital Curation, Issue 2, Volume 4, 2009: data curation is defined as a set of repeated and repeatable activities focusing on tending data and creating data products within a particular arena; “ways of organizing, displaying, and repurposing preserved data.”
The Goals or drivers (as defined in Reddy, 2009)
Study of inherent structure of ecological information
• management and analysis of ecological information
• facilitate and expedite large scale ecological research
• define entities and natural processes with language common to both humans and computers
• aims to facilitate environmental research and management
THE CHALLENGE (FROM HELENA KARASTI ET AL., 2006)
Table III. The extended temporal horizon of ongoing data managing in LTER (p. 332)

Historical/legacy: Recovering legacy datasets
“I was trying to document a lot of historic stuff because the PI [principal investigator] was coming on with Alzheimer’s and I knew that he was going to retire. I had a series of interviews with him and I got incredible documentation for these early corporate data.” (IM)

Immediate/near term: Attending to ongoing data collection
“Getting scientists’ data into our system from the very beginning... whether it is to help them with data entry forms, setting up data entry programs, all the way from QA/QC programs to getting it archived into our system and accessible on the Internet.” (IM)

Long-term: Designing for the future
“We envision also that we’ll also be adding the EML [Ecological Metadata Language]... and sort of often go back and forth between whether we want to do that from the ASCII files or the database... but at any rate we’ll somehow make EML available dynamically on the Internet to the group at large.” (IM)
THE IMPEDIMENTS (AS DEFINED IN REDDY, 2009)
The infrastructure for this network must deal with major impediments to synthesizing data on ecology and the environment:
• Data is widely dispersed
• Data is heterogeneous
• Synthetic analysis tools are needed
The solution (as defined in Reddy, 2009)
Study of inherent structure of ecological information
• to create and apply computer technology
• developing computer databases and algorithms
• integrates environmental and information
• developing ways to access and integrate databases of environmental information, and developing new algorithms enabling different datasets to be combined to test ecological hypotheses
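A minimal sketch of what "enabling different datasets to be combined" can look like in practice, assuming two sites that record the same variable under different field names and units. The field names (`air_temp_c`, `TempF`, `Station`), the site code `SITE_B`, and the `Observation` type are hypothetical and are not taken from Reddy (2009) or from any LTER system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Observation:
    """Common record type that both hypothetical sites are mapped onto."""
    site: str
    date: str      # ISO 8601 date string, e.g. "2009-01-02"
    temp_c: float  # air temperature in degrees Celsius


def from_site_a(row: dict) -> Observation:
    # Assumption: site A already stores Celsius under "air_temp_c".
    return Observation(row["site"], row["date"], float(row["air_temp_c"]))


def from_site_b(row: dict) -> Observation:
    # Assumption: site B stores Fahrenheit under "TempF" and uses
    # capitalized field names, so both names and units need harmonizing.
    temp_c = (float(row["TempF"]) - 32.0) * 5.0 / 9.0
    return Observation(row["Station"], row["Date"], round(temp_c, 2))


def combine(a_rows, b_rows):
    """Merge both sources into one list ordered by date, then site."""
    merged = [from_site_a(r) for r in a_rows] + [from_site_b(r) for r in b_rows]
    return sorted(merged, key=lambda obs: (obs.date, obs.site))


if __name__ == "__main__":
    site_a = [{"site": "LUQ", "date": "2009-01-02", "air_temp_c": "24.5"}]
    site_b = [{"Station": "SITE_B", "Date": "2009-01-01", "TempF": "77"}]
    for obs in combine(site_a, site_b):
        print(obs)
```

The design choice here is to convert every source into one shared record type at the boundary, so that any later synthesis step only has to understand a single structure.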
The Solution: the site – network model
A cooperative, federated database system approach to organizing information management in LTER (Baker et al., 2000)
SITE LEVEL
• ongoing, retrospective–prospective data management
• intensive data contextualization and description
• judicious technology design
NETWORK LEVEL
• collaborative information infrastructure and metadata standards work
The Solution: the site – network model
The new era: the decade of synthesis and the accumulation of more than 20 years of data force the scientific community to see IM as a necessary tool:
• From a mandate (from NSF) to a need for data depository
• From a need for data depository to a need for data synthesis
Equivalent to the trajectory of data accumulation and growth and of the software tools to manage them:
• Proprietary software (MS): from Excel to Access
• Open source: from Access to MySQL
LONG TERM ECOLOGICAL RESEARCH NETWORK
US LTER since 1980
A social network:
• 2100 participants
• 26 site biomes
• Network Office
A technological network:
• 26 25 information managers
• Loose network supporting local site data repositories
• Sites work in collaboration on Network Information System
• Instrumenting the ecosystem
[Diagram labels: site, network]
LTER: http://lternet.edu
LUQ LTER CASE
• 4 LTER proposals since 1988
• Information Management began in 1989
• Evolution and development of an Information Management system
  • the Web site as a window to the site’s Information Management System (IMS)
  • the website as the IMS framework
• Close working collaboration with the LTER Network Office (LNO)
• Close working collaboration with the information managers: conceptual framework
HISTORY OF LUQ LTER: EVENTS IN THE LTER
[Timeline figure. Periods: 1989 on; 1995 on; 2001 on. Phase labels: Data Archiving; Data Integration.]
Timeline entries:
• Need to document
• Need to have searchable data in the web site
• Need to team up and publish metadata
• Organizing, cataloguing; develop LUQ documentation standards and protocols
• Documenting and publishing the data on our first web
• Decadal Plan and the adoption of the EML standard
Melendez, 2009
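HISTORY: LUQ LTER ONLINE DATA SETS
[Chart. Legend: On the Web; Not Online.]
Years: 1992, 1993, 1997, 2000, 2002, 2006, 2008, 2009
Values as extracted: 0, 15, 30, 35, 74, 104, 117, 127
Additional values as extracted: 0, 75, 4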
LUQ LTER PROTOCOLS
At the site level:
• Data gathering
• Data entry
• Data quality control
• Data sharing
At the network level:
• EML: Ecological Metadata Language
• Specialized network databases: climdb/hydrodb, GIS (under development)
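To make the idea of a machine-readable metadata record concrete, here is a minimal sketch that builds a small XML document with Python's standard library. It is only loosely modeled on EML: the element names are simplified, no namespaces are declared, and the output is not validated against the EML schema. The function name `build_metadata` and all sample values are hypothetical.

```python
import xml.etree.ElementTree as ET


def build_metadata(package_id, title, creator_surname, keywords):
    """Return a simplified, EML-like metadata record as an XML string.

    Assumption: this is an illustration of structured metadata only,
    not a schema-valid EML document.
    """
    root = ET.Element("eml", {"packageId": package_id})
    dataset = ET.SubElement(root, "dataset")
    ET.SubElement(dataset, "title").text = title

    # Who produced the data set.
    creator = ET.SubElement(dataset, "creator")
    name = ET.SubElement(creator, "individualName")
    ET.SubElement(name, "surName").text = creator_surname

    # Keywords make the record discoverable in a searchable catalog.
    keyword_set = ET.SubElement(dataset, "keywordSet")
    for keyword in keywords:
        ET.SubElement(keyword_set, "keyword").text = keyword

    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(build_metadata("example.1.1", "Daily rainfall", "Doe", ["rainfall", "climate"]))
```

Because the record is structured rather than free text, a network-level harvester could read the same fields from every site without site-specific parsing.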
PROGRESS: LUQ EML METADATA PACKAGES DEVELOPED
[Chart. Years: 2005, 2007, 2009. Legend: EML L3-4; EML L5; QC.]
Values as extracted: 9875, 46, 021, 65
HISTORY: DATA FILING PROTOCOL*
[Flow diagram. Stages as extracted:]
• Dataset Design (PI, Information Manager)
• Data Collection
• Data Entry
• Metadata Preparation
• Quality Control and Assurance
• Review
• Data Publication on WWW
[The diagram also carries two "Revision" labels.]
* Diagram graphics designed by J. Porter in 2006
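The data filing stages listed above can be read as a simple workflow. The sketch below is one hedged interpretation: it assumes the stages run in the order listed and that a failed review sends the data set back to an earlier stage for revision. The return point (`revise_to`, defaulting to Data Entry) is an assumption, since the slide text does not say where the "Revision" arrows lead; the function names are hypothetical.

```python
STAGES = [
    "Dataset Design",
    "Data Collection",
    "Data Entry",
    "Metadata Preparation",
    "Quality Control and Assurance",
    "Review",
    "Data Publication on WWW",
]


def next_stage(stage, review_passed=True, revise_to="Data Entry"):
    """Return the stage that follows `stage`.

    Assumption: a failed Review loops back to `revise_to`; every other
    stage simply advances to the next one in STAGES.
    """
    if stage not in STAGES:
        raise ValueError(f"unknown stage: {stage}")
    if stage == "Review" and not review_passed:
        return revise_to
    position = STAGES.index(stage)
    return STAGES[min(position + 1, len(STAGES) - 1)]


def trace(review_outcomes):
    """Walk from design to publication, using one outcome per Review visit.

    Once the supplied outcomes run out, reviews are treated as passed,
    so the walk always terminates.
    """
    outcomes = iter(review_outcomes)
    stage = STAGES[0]
    path = [stage]
    while stage != STAGES[-1]:
        passed = next(outcomes, True) if stage == "Review" else True
        stage = next_stage(stage, review_passed=passed)
        path.append(stage)
    return path


if __name__ == "__main__":
    # One failed review followed by a successful one.
    for step in trace([False, True]):
        print(step)
```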
THE LUQ WEB SITE
[Diagram labels: Data; Local Use; Knowledge Production; Community Reuse; Data Production]
Baker and Chandler, DSR, 2008
Modes of Knowledge Production
Traditional (Mode 1): Publications; Reports; Individual Research; Disciplinary
Contemporary (Mode 2): Data Delivery; New Practices; Collaborative Research; Interdisciplinary; Data Exchange
OUR VISION
LUQ NEW IMS FRAMEWORK
THE LUQ NETWORK CONNECTION
US Network: EML, Metacat, (data harvestin
Outreach: Schoolyard
ILTER: China, Malaysia (The Kepler Example)
Other Networks: ULTRA, CTFS
What is the name for those who work with data?
• Data manager
• Information manager
• Informaticist (i.e. physicist, geneticist)
• Informatician (i.e. statistician)
• Informologist (i.e. biologist)
• Informateer (i.e. mouseketeer)
• Informatics specialist (i.e. on-the-job training)
• Data librarian (i.e. information & library sciences)
• Data scientist (i.e. data & domain expertise)
• Data curator (i.e. with data repository)
Information Professionals (Baker and Chandler, DSR, 2008)