Preparing eScience Librarians for Managing Research Data
RDAP 2012, New Orleans, LA
Jian Qin
School of Information Studies
Syracuse University
Notions of eScience librarianship
Part of team transcending disciplinary boundaries
Active players in and contributors to data curation
Consultative services for data use and management
Leader in eScience initiatives
Proactive training for data literacy
Educating the new type of workforce
• Scientific data literacy (SDL) project (http://sdl.syr.edu), 2007-2009
• E-Science Librarianship Curriculum project (eSLib, http://eslib.ischool.syr.edu), 2009-2012, in partnership with Cornell University Library
A curriculum for eScience librarianship
• Overall learning objectives:
– Ability to articulate eScience and to plan and develop eScience librarianship projects
– Competency in scientific data management
– Competency in cyberinfrastructure technologies
– Ability to collaborate, communicate, and lead in eScience librarianship projects
Ability to articulate eScience and to plan and develop eScience librarianship projects
• Articulate eScience process and data lifecycle
• Identify user needs and translate the needs into system requirements
• Make plans for eScience librarianship project initiation and implementation
• Conduct research on data related issues such as institutional data policy, support services, and technology adoption
• Write grant proposals for obtaining funding to support eScience librarianship projects
Competency in scientific data management
• Articulate data characteristics
• Analyze domain data sets and develop data models
• Define metadata element sets
• Develop specialized metadata for data curation, preservation, and access
• Create metadata records for scientific data sets
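To illustrate what defining a metadata element set and creating records against it might look like in practice, here is a minimal Python sketch. The element names, the required/optional flags, and the `validate_record` helper are all hypothetical, chosen for illustration; they are not taken from the curriculum or from any particular metadata standard.

```python
# Hypothetical element set: element name -> whether the element is required.
# Names and flags are illustrative assumptions, not an actual standard.
ELEMENT_SET = {
    "title": True,
    "creator": True,
    "date_collected": True,
    "instrument": False,
    "processing_level": False,
}


def validate_record(record, element_set=ELEMENT_SET):
    """Return a list of problems found in a metadata record (empty if none)."""
    problems = []
    # Every required element must be present and non-empty.
    for name, required in element_set.items():
        if required and not record.get(name):
            problems.append(f"missing required element: {name}")
    # Elements outside the element set are flagged rather than silently kept.
    for name in record:
        if name not in element_set:
            problems.append(f"unknown element: {name}")
    return problems


# Two sample records with made-up values.
record = {
    "title": "Stream temperature readings",
    "creator": "Example Lab",
    "date_collected": "2011-06-01",
}
incomplete = {"title": "Untitled", "units": "celsius"}

print(validate_record(record))
print(validate_record(incomplete))
```

A real element set would normally be adapted from an existing standard and extended for local needs, with richer rules than a required flag.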
Competency in cyberinfrastructure technologies
• Maintain information retrieval interfaces
• Maintain information exchange networks
• Program, write code, and manipulate scripts
• Use content management systems
• Identify and model data/work flows
• Assess research needs for and performance of CI tools
Ability to collaborate, communicate, and lead in eScience librarianship projects
• Develop partnerships with internal and external organizational units and collaborators
• Communicate with administrators and researchers
• Engage researchers in data management processes
• Initiate and lead eScience librarianship projects
The curriculum
Courses:
• Scientific Data Management (core)
• Cyberinfrastructure and Scientific Collaboration (core)
• Database systems (required elective)
• Metadata (required elective)
• Data services (capstone)
Primary learning outcomes:
• Competency in scientific data management
• Competency in cyberinfrastructure technologies
• Ability to articulate eScience and to plan and develop eScience librarianship projects
• Ability to collaborate, communicate, and lead in eScience librarianship projects
Theme 1: building fundamentals
1. Overview of scientific data management that covers data and metadata fundamentals
2. Case studies that use practical examples to guide students step by step in data analysis and management
3. Using scientific data, which involves discussions of data quality, data repositories and discovery, data analysis and presentation, and ethics and intellectual property issues
Building fundamentals: data formats
Overview of scientific data management that covers data and metadata fundamentals
Data level
NASA’s definition of data processing levels
Level 0: Reconstructed, unprocessed instrument data at full resolution.
Level 1A: Reconstructed, unprocessed instrument data at full resolution, time-referenced, and annotated with ancillary information that is computed and appended but not applied to the Level 0 data.
Level 1B: Level 1A data that has been processed to sensor units. Not all instruments will have a Level 1B equivalent.
Processing levels, highest to lowest: Level 4, Level 3, Level 2, Level 1B, Level 1A, Level 0
Major scientific data formats
Self-descriptive information exists as a header in the data file
• Common Data Format (CDF)
• Flexible Image Transport System (FITS)
• GRid In Binary (GRIB)
• Hierarchical Data Format (HDF)
• Network Common Data Format (netCDF)
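To make the idea of a self-descriptive header concrete, the sketch below reads a FITS-style header using only the Python standard library. It relies on the FITS convention that a header is a sequence of 80-character records ("cards") with the keyword in columns 1-8, a value indicator `= ` in columns 9-10, and an `END` card closing the header. The `make_card` and `parse_header` helpers and the sample values are hypothetical, and the parsing is deliberately simplified (for example, it would mishandle string values that contain a slash); real files are better read with a dedicated FITS library.

```python
CARD_LEN = 80  # each FITS header record ("card") is 80 ASCII characters


def make_card(keyword, value, comment=""):
    """Build one simplified 80-character header card (illustrative helper)."""
    text = f"{keyword:<8}= {value:>20}"
    if comment:
        text += f" / {comment}"
    return text.ljust(CARD_LEN)[:CARD_LEN]


def parse_header(raw):
    """Parse header cards into a dict, stopping at the END card."""
    header = {}
    for i in range(0, len(raw), CARD_LEN):
        card = raw[i:i + CARD_LEN]
        keyword = card[:8].strip()
        if keyword == "END":
            break
        if card[8:10] != "= ":
            continue  # skip commentary cards such as COMMENT or HISTORY
        # Simplification: everything before the first "/" is the value.
        value = card[10:].split("/", 1)[0].strip()
        header[keyword] = value.strip("'").strip()
    return header


# A tiny synthetic header: three keyword cards followed by END.
raw = (
    make_card("SIMPLE", "T", "conforms to FITS standard")
    + make_card("BITPIX", "16")
    + make_card("NAXIS", "2")
    + "END".ljust(CARD_LEN)
)

header = parse_header(raw)
print(header)
```

Because the description travels inside the file, a reader can learn the number of axes and the bits per pixel before touching the data itself.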
Building fundamentals: Understanding data and metadata
Processing levels: lineage is vital to assessing data quality
Data formats: some formats contain self-descriptive metadata
Data collections: metadata standards need to be adjusted for local description needs
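One way to picture how lineage supports quality assessment is to record each processing step together with the level it produced. The following Python sketch is an assumption-laden illustration: the `Dataset` and `LineageStep` classes, the rule that levels may not decrease, and the sample descriptions are hypothetical and not part of the curriculum or of NASA's specification.

```python
from dataclasses import dataclass, field

# NASA processing levels named on the slides, ordered lowest to highest.
LEVELS = ["0", "1A", "1B", "2", "3", "4"]


@dataclass
class LineageStep:
    level: str        # processing level produced by this step
    description: str  # what was done to the data


@dataclass
class Dataset:
    name: str
    lineage: list = field(default_factory=list)

    def add_step(self, level, description):
        """Append a processing step, rejecting unknown or decreasing levels."""
        if level not in LEVELS:
            raise ValueError(f"unknown processing level: {level}")
        if self.lineage and LEVELS.index(level) < LEVELS.index(self.lineage[-1].level):
            raise ValueError("processing level cannot decrease along the lineage")
        self.lineage.append(LineageStep(level, description))

    def current_level(self):
        """Return the level of the latest step, or None if no steps exist."""
        return self.lineage[-1].level if self.lineage else None


# Hypothetical dataset with made-up step descriptions.
ds = Dataset("example-instrument-run")
ds.add_step("0", "raw instrument data received")
ds.add_step("1A", "time-referenced, ancillary information appended")
ds.add_step("1B", "converted to sensor units")

for step in ds.lineage:
    print(step.level, "-", step.description)
```

Someone assessing the data can then see at a glance which transformations separate the current product from the raw instrument output.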
Building fundamentals: data literacy
IL: ACRL (2010)
DL: Finn, Charles, W.P. (Tech & Learning, 2004)
SDL: Qin, J. & D’Ignazio, J. (Journal of Library Metadata, 2010)
Theme 2: Analysis and generalization
Analyzing data problems means analyzing domain data, requirements, and workflows in order to develop solutions.
Analysis and generalization: engaging in real research projects
• Engage students in research and service projects
– Data policy analysis
– Data management consultation
– Interviews and survey design
• Course projects
– Real-world data management problems
Theme 3: collaboration and communication
• Community of practice
• Institutionalization of data services
– Data policies
– Compliance with funding agency policies and mandates
– Infrastructural data services at institutional, community, and national levels
• Awareness, incentives, and training
Collaboration and communication
• Mentoring by Cornell librarians, led by Gail Steinhart
• Internships in academic libraries and/or research centers
• Guest speakers to classes
• Engaging students in research and service projects
Evolving curriculum
CAS in Data Science
Required courses:
• Database
• Applied Data Science
Data storage and management
Data analytics
Data visualization
Systems management
eScience Librarianship Project Website:
http://eslib.ischool.syr.edu/