Science 2.0 VU - KTI – Knowledge Technologies Institute (kti.tugraz.at/staff/elex/courses/science20/slides/open_science_and...)
Post on 19-Feb-2019
www.tugraz.at
WISSEN · TECHNIK · LEIDENSCHAFT
Science 2.0 VU Science 2.0, Open Science, Open Data and Open Access
WS 2014/15
Elisabeth Lex KTI, TU Graz
Agenda
• Science 2.0 (repetition from last time)
• Open Science
• Open Data
• Open Access
+ Practical examples
Lecture Schedule
• Small change:
  • 20.11.2014: Scientometrics and Altmetrics (Lecturer: Dr. Peter Kraker)
  • 27.11.2014: E-Science, E-Infrastructures, Content Mining
Repetition: Change in science (1/4) (from last lecture)
• Significant increase of scientific production
  • More output, less formal output
• Scientific process is opened up:
  • Large-scale, remote collaboration of scientists with the use of Web 2.0/Internet tools, similar to open source software collaboration
• Access to scientific information free of charge to the end user
• New fields of research:
  • E.g. alternative reputation systems (“altmetrics”)
Repetition: Change in science (2/4)
→ Faster knowledge exchange, prevention of unnecessarily repeated experiments, more discussion
http://book.openingscience.org/basics_background/towards_another_scientific_revolution.html
→ Significant increase of scientific production
Repetition: Change in science (4/4) (rep. from last lecture)
[Figure: Interconnected trends within Science 2.0; labels: Open Science, Scientific Process]
Source: Background document, Public Consultation Science 2.0: Science in Transition http://ec.europa.eu/research/consultations/science-2.0/background.pdf
Open Science
Traditional Research/Publication Process
• Doing research, describing the final results in a paper
• Submitting the paper to a conference / journal / book
  • Established publishers: IEEE, ACM, Elsevier, Springer
  • Top journals: Nature, Science, ...
  • Top conferences: WWW, SIGIR, WSDM, CIKM, ...
• Paper gets reviewed: peer review (single-blind, double-blind) → accepted or rejected
• If accepted, the paper is published
  • Often, copyright is transferred to the publisher
  • Authors/universities need to pay to access the paper
Open Research/Publication Process
• Doing research, documenting interim results online
• Collaboration with others, raising questions, etc.
  • E.g. source code in an open source repository
    • E.g. Bitbucket, GitHub, ...
  • E.g. publish data in a data repository (e.g. figshare)
• Publish pre-final version in a preprint archive, e.g. arxiv.org
• Ideally: publish final paper in an Open Access journal
→ Open Science
Open Science
© OKFN Open Science Group: http://science.okfn.org
“Open science means opening up the research process by making all of its outcomes, and the way in which these outcomes were achieved, publicly available on the World Wide Web” (Kraker et al. 2011)
Open Science: 3 Pillars
Movement to make scientific research, data and dissemination publicly accessible
New mode of scholarly communication!
Source: A Revolution in Open Science: Open Data and the Role of Libraries (Professor Geoffrey Boulton at LIBER 2013)
http://de.slideshare.net/libereurope/boulton-gsb-presentationlibermunich
Barriers to Open Science
• Lack of evidence of benefits and rewards.
What currently counts:
• Money
• Grants
• Papers
• Teaching
• Service
What currently doesn’t:
• Sharing data
• Sharing software
• Open access
• Collaboration
• Patents
• Startups
Source: Getting Ahead as a Computational Biologist in Academia, PLOS Comp Biol
Barriers to Open Science
• Lack of skills, time and other resources
  • It takes time to be open, social and to collaborate
• Cultures of independence and competition
  • Different work cultures, certain stages of an academic career (e.g. going for tenure)
• Concerns about quality
  • Everybody can put content out there
• Ethical, legal and other restrictions on accessibility of research output
• Business models of publishers
Source: RIN/NESTA report, 2010
Benefits of Open Science
• Increasing the efficiency of research
• Enhancing visibility and scope for engagement
• Enabling researchers to ask new research questions
• Enhancing collaboration and community-building
Example: Open Science Framework
https://osf.io
• Maintained by the Center for Open Science
• Platform for DOING Open Science
Open Notebook Science
• Day-to-day “lab notes” available in real time
• Open notebook scientists give everyone direct insight into their work and enable easier collaboration
• Data and methods made public at time of collection and research
• E.g. OpenWetWare (biology and biological engineering), Open Notebook Science Network (chemistry and other disciplines), or the IPython Notebook (interactive computational science)
Example: The IPython Notebook
• Web-based interactive environment
• Code execution, text, mathematics, plots in one single document
• Can be shared with colleagues, converted to e.g. PDF
• http://ipython.org/notebook.html
Example: rOpenSci
• rOpenSci: consists of packages that enable access to data repositories through R
  • Collaborative approach to experimentation
  • Also for doing research ABOUT Open Science
• R: statistical programming environment
  • http://www.r-project.org
  • Free software environment
  • Classification, clustering, statistical tests, ...
  • Simple way to produce “publication-ready” plots
  • Windows, Unix, OS X: CRAN mirror
Open Science
Source: A Revolution in Open Science: Open Data and the Role of Libraries (Professor Geoffrey Boulton at LIBER 2013)
http://de.slideshare.net/libereurope/boulton-gsb-presentationlibermunich
What is Open Data?
“Open data is data that can be freely used, reused and redistributed by anyone - subject only, at most, to the requirement to attribute and sharealike”
http://opendefinition.org
Tim Berners-Lee about Open Data (TED talk, 2010)
http://www.ted.com/talks/tim_berners_lee_the_year_open_data_went_worldwide?language=en
Importance of Open Data
• Big and open data estimated to add 1.9% to EU-28 GDP by 2020 (EC, 2014)
• New knowledge from combined data sources and patterns in large data volumes
• In general: limited access to scientific information has a strong impact on the competitiveness of SMEs!
• Users enabled to make informed decisions
• Users enabled to get insights into governmental and global public data → transparency and democratic control
[1] Demos Europe/WISE (2014): Big and open data in Europe. A growth engine or a missed opportunity? http://www.bigopendata.eu/wp-content/uploads/2014/01/bod_europe_2020_full_report_singlepage.pdf
(EC, 2014) http://ec.europa.eu/research/consultations/science-2.0/background.pdf
Example: Open Government Data: Eurostat
“I’d like to compare the unemployment rate in Austria with the European one”
Google Public Data Explorer, https://www.google.com/publicdata/directory
Example: Open Street Map
e.g. “Where is the KTI?” http://www.openstreetmap.org
Open Data in Science
• Focus on making observations and results of scientific activities available for anyone to analyse and reuse.
• The idea has existed since the 1950s → the Internet has boosted it: low cost and little time required to publish or obtain data.
“Openly accessible research data can typically be accessed, mined, exploited, reproduced and disseminated, free of charge for the user.”
Guidelines on Open Access to Scientific Publications and Research Data in Horizon 2020, p. 3
Example: Open Data in Science
Scientific Breakthroughs
Another point for Open Data
“There is evidence that studies that make their data available do indeed receive more citations than similar studies that do not.”
Piwowar, H. and Vision, T. J. (2013): “Data reuse and the open data citation advantage”
More citations!
Characteristics of Open Data
• Availability and Access: data must be available at reasonable reproduction cost (basically, only downloading). Data must also be available in a convenient and modifiable form.
• Reuse and Redistribution: data must be provided under terms that permit reuse and redistribution including combination with other datasets.
• Universal Participation: everyone must be able to use, reuse and redistribute. For example, ‘non-commercial’ restrictions that would prevent ‘commercial’ use, or restrictions of use for certain purposes (e.g. only in education), are not allowed.
Source: The Open Data Handbook, http://opendatahandbook.org/en/what-is-open-data/
Interoperability – Key for Open Data
• Structured: machine-readable, logical format
• Addressable: URIs, shareable
• Licensed/Traceable: full provenance
• Continuous: maintained, accessible
→ Ideal situation
5 Star Deployment Scheme for Open Data by Tim Berners-Lee
http://www.w3.org/DesignIssues/LinkedData.html
1. Available on the web (whatever format) but with an open licence, to be Open Data
2. Available as machine-readable structured data
3. As (2), plus non-proprietary format
4. All of the above, plus: use open standards from W3C (RDF and SPARQL) to identify things
5. All of the above, plus: link your data to other people’s data to provide context
Linked Data
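The cumulative nature of the five levels can be illustrated with a small Python sketch. The class `Dataset`, its field names and the function `star_rating` are hypothetical names chosen for this illustration; they are not part of any W3C tooling. The only assumption taken from the scheme is that each star requires all previous ones.

```python
from dataclasses import dataclass


@dataclass
class Dataset:
    """Hypothetical description of a published dataset (illustration only)."""
    on_web_with_open_licence: bool   # level 1
    machine_readable: bool           # level 2
    non_proprietary_format: bool     # level 3
    uses_rdf_and_uris: bool          # level 4
    linked_to_other_data: bool       # level 5


def star_rating(d: Dataset) -> int:
    """Count stars cumulatively: a level only counts if all lower levels hold."""
    criteria = [
        d.on_web_with_open_licence,
        d.machine_readable,
        d.non_proprietary_format,
        d.uses_rdf_and_uris,
        d.linked_to_other_data,
    ]
    stars = 0
    for fulfilled in criteria:
        if not fulfilled:
            break
        stars += 1
    return stars


# Assumed examples: a scanned PDF, a spreadsheet, a CSV file, linked RDF data.
scanned_pdf = Dataset(True, False, False, False, False)
spreadsheet = Dataset(True, True, False, False, False)
csv_file = Dataset(True, True, True, False, False)
linked_rdf = Dataset(True, True, True, True, True)

for name, ds in [("scanned PDF", scanned_pdf), ("spreadsheet", spreadsheet),
                 ("CSV file", csv_file), ("linked RDF", linked_rdf)]:
    print(name, "*" * star_rating(ds))
```

A dataset without an open licence gets no stars in this sketch, however well structured it is, because the first level is the precondition for being Open Data at all.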
Linked Data
• Enables the creation of typed links between data from different sources (Bizer et al., 2009)
"Linking Open Data cloud diagram 2014, by Max Schmachtenberg, Christian Bizer, Anja Jentzsch and Richard Cyganiak. http://lod-cloud.net/"
Principles of Linked Data
1. Use URIs as names for things (URI = Uniform Resource Identifier: a string of characters that identifies a name or a resource on the Internet)
2. Use HTTP URIs so people can look up these names
3. When someone looks up a URI, provide useful (RDF) information
4. Include RDF statements that link to other URIs so that they can discover related things
Tim Berners-Lee 2007 http://www.w3.org/DesignIssues/LinkedData.html
→ Linked Data allows sharing and use of data, ontologies and various metadata standards
Excursus: Resource Description Framework (RDF)
• Data model to describe “things” and their interrelations
• Subject-predicate-object expressions (triples)
• Data items identified by URIs
• Can be queried with SPARQL
For example:
<Edith> <was born in> <Graz>
<Graz> <is part of> <Austria>
<Michael> <likes> <Iron Maiden>
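The triple model and the idea behind a SPARQL-style query can be sketched in a few lines of plain Python. This is only an illustration under simplifying assumptions: the triples are plain strings as on the slide rather than URIs, and the helper names `match` and `born_in_country` are made up for this example; a real setup would use an RDF library and SPARQL.

```python
from typing import List, Optional, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)

# The three example statements from the slide, as plain strings (real RDF uses URIs).
TRIPLES: List[Triple] = [
    ("Edith", "was born in", "Graz"),
    ("Graz", "is part of", "Austria"),
    ("Michael", "likes", "Iron Maiden"),
]


def match(triples: List[Triple],
          s: Optional[str] = None,
          p: Optional[str] = None,
          o: Optional[str] = None) -> List[Triple]:
    """Return all triples matching the pattern; None acts as a wildcard."""
    return [
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]


def born_in_country(triples: List[Triple], country: str) -> List[str]:
    """Join two patterns: ?person 'was born in' ?city and ?city 'is part of' country."""
    cities = {city for city, _, _ in match(triples, p="is part of", o=country)}
    return [person for person, _, city in match(triples, p="was born in")
            if city in cities]


print(match(TRIPLES, p="likes"))
print(born_in_country(TRIPLES, "Austria"))
```

The second function shows why shared identifiers matter: the two statements about Graz can only be combined because both use the same name for the city.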
Example: Linked Open Science
• LinkedScience.org: scientific data as Linked Data
• Showcase: Deforestation and its related phenomena
  • Linked Brazilian Amazon Rainforest Data, enables exploration of spatial and temporal data
Suvodeep Mazumdar and Tomi Kauppinen. Visualizing and Animating Large-scale Spatiotemporal Data with ELBAR Explorer. Satellite Events Proceedings of the 13th International Semantic Web Conference (ISWC2014), Riva del Garda, Trentino, Italy, October, 2014
Peer Review vs. Open Peer Review
• Traditional review: editor (researcher, journal staff member in the case of journals like Nature) sends paper to experts in the field, who provide comments for the paper
• Single blind: reviewers can see who the authors are, but the authors do not know who reviewed their paper
• Double blind: reviewers don’t see who the authors are; only the editor knows everyone’s identity
http://blog.f1000research.com/2014/05/21/what-is-open-peer-review/
Peer Review vs. Open Peer Review
• Open peer review: everybody sees everybody’s names
• Sometimes: involves publicly naming reviewers and/or editors.
• Some journals publish some or all reviewer comments
Benefits of Open Peer Review
• Benefits for authors and readers
  • Authors can see who reviewed their work
  • Reduces bias among reviewers
  • More constructive reviews
  • Published reports can serve as peer review examples for young researchers
• Benefits for reviewers
  • Shows the reviewer’s informed opinion of the work
  • Demonstrates experience as a reviewer
  • Can take credit for the work involved in conducting the review
Source: http://blog.f1000research.com/2014/05/21/what-is-open-peer-review/
Open Science
Source: A Revolution in Open Science: Open Data and the Role of Libraries (Professor Geoffrey Boulton at LIBER 2013)
http://de.slideshare.net/libereurope/boulton-gsb-presentationlibermunich
Challenges
• Current system of scientific publishing works against maximum dissemination of scientific data underlying publications
  • Inability to access data
  • Restrictions on usage applied by publishers or data providers
  • Publication of data that is difficult to reuse
  • Cultural reluctance to publish data openly, for multiple reasons
Open Access
“Open Access stands for unrestricted access and unrestricted reuse.”
Definition by Public Library of Science (PLoS) https://www.plos.org/about/open-access/
Open Access (OA): Motivations
• Idea: Research funded by the public should be available to the public (ethical)
• OA for a variety of research outputs: e.g. Journals, books, papers, datasets
• OA publications will have more accesses (readers), citations and therefore impact (Research Impact)
• Concern over the hindrance to research caused by the cost of journal subscriptions (cost)
Source: Jeffery, K. Open Access: An Introduction, 2006. http://www.ercim.eu/publication/Ercim_News/enw64/jeffery.html
Open access
• Began with the community
• Driven by organizations (PLOS, F1000, Mendeley etc.), policies and funders; at first not by academic institutions
• Development of OA licences:
  • E.g. Creative Commons Attribution (CC BY) license, developed to facilitate open access → free immediate access to, and unrestricted reuse of, original works of all types. Under this license, authors agree to make articles legally available for reuse, without permission or fees, for virtually any purpose. Anyone may copy, distribute or reuse these articles, as long as the author and original source are properly cited.
Open Access in Science: Open Access Journals
● Green (“self-archiving”): the author can self-archive at the time of submission of the publication, whether the publication is grey literature (usually internal and non-peer-reviewed), a peer-reviewed journal publication, a peer-reviewed conference proceedings paper or a monograph
● Gold (“author pays”): the author or the author’s institution can pay a fee to the publisher at publication time; the publisher then makes the publication available “free” at the point of access.
● Further, little-used “roads”: hybrid forms, for example platinum open access (does not charge author fees) ...
● Both green and gold are compatible and can co-exist
Source: Jeffery, K. Open Access: An Introduction, 2006. http://www.ercim.eu/publication/Ercim_News/enw64/jeffery.html
Example: arXiv.org (1/2)
• Open Access electronic archive and distribution server for research articles, founded in 1991
• Maintained by Cornell University Library
• Areas: physics, mathematics, computer science, nonlinear sciences, quantitative biology, statistics → 127 categories
• Users can retrieve papers from arXiv via the web
• Registered users can submit their papers to arXiv
• Papers can be updated; previous versions remain available
• Highly popular: the number of scientific papers posted to arXiv.org grows steadily, as does the number of readers
Example: arXiv.org (2/2)
• Approx. 1 million publications
• No peer review
• Submission: validated either through endorsement by an established author or if from a trusted IP
Example: Analyzing arXiv with rOpenSci
• rOpenSci: has an interface to the arXiv API, package aRxiv (install via CRAN)
https://github.com/ropensci/aRxiv, http://ropensci.org/tutorials/arxiv_tutorial.html
• E.g. “How many articles for a discipline in arXiv?”
kcmac-elex:~ elex$ r
> library(aRxiv)
> arxiv_cats
> arxiv_count('cat:"cs.CY"') # count papers in category
[1] 1764
> arxiv_count('cat:"quant-ph"')
[1] 63257
Karthik Ram and Karl Broman (2014). aRxiv: Interface to the arXiv API. R package version 0.5.2. https://github.com/ropensci/aRxiv
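For readers who do not use R, the same kind of count can be sketched in Python against the arXiv API directly. This is a minimal sketch, not the aRxiv implementation: the function names are made up for this example, it assumes the public query endpoint at export.arxiv.org returns an Atom feed containing an `opensearch:totalResults` element, and it assumes that asking for zero results is accepted. The sample feed below is hand-written for illustration and only reuses the number shown on the slide.

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

ARXIV_API = "http://export.arxiv.org/api/query"
# XML namespace of the OpenSearch elements in the Atom feed
OPENSEARCH_NS = "{http://a9.com/-/spec/opensearch/1.1/}"


def build_count_url(query: str) -> str:
    """Build a query URL that requests no entries, only the feed metadata."""
    params = urllib.parse.urlencode({"search_query": query, "max_results": 0})
    return f"{ARXIV_API}?{params}"


def parse_total_results(atom_xml: bytes) -> int:
    """Extract the total number of matching articles from an Atom feed."""
    root = ET.fromstring(atom_xml)
    node = root.find(f"{OPENSEARCH_NS}totalResults")
    if node is None or node.text is None:
        raise ValueError("feed contains no opensearch:totalResults element")
    return int(node.text)


def arxiv_count(query: str) -> int:
    """Fetch the feed for a query and return the article count (needs network)."""
    with urllib.request.urlopen(build_count_url(query), timeout=30) as response:
        return parse_total_results(response.read())


# Hand-written sample feed (an assumption for offline illustration).
SAMPLE_FEED = (
    b'<feed xmlns="http://www.w3.org/2005/Atom" '
    b'xmlns:opensearch="http://a9.com/-/spec/opensearch/1.1/">'
    b"<opensearch:totalResults>1764</opensearch:totalResults>"
    b"</feed>"
)

print(build_count_url("cat:cs.CY"))
print(parse_total_results(SAMPLE_FEED))
```

Calling `arxiv_count("cat:cs.CY")` would perform the live request; the counts returned today will differ from the 2014 numbers on the slide.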
Example: PLOS ONE
• OA journal (CC license)
• Peer-reviewed
• Online, maintained by the Public Library of Science (PLOS)
• Post-publication tools to measure quality and impact
Example: rplos
• Interface to PLoS Journals search API
• API key needed
• http://ropensci.org/tutorials/rplos_tutorial.html
• E.g. plot results through time for search results from PLoS Journals
plot_throughtime(list("science 2.0"), 500)
Scott Chamberlain, Carl Boettiger and Karthik Ram (2014). rplos: Interface to PLoS Journals search API. R package version 0.4.1. https://github.com/ropensci/rplos
Benefits of Open Access
• Accelerated discovery: researchers can read and build on the findings of others without restriction
• Public enrichment: scientific research is often paid for with public funding; taxpayers are enabled to see the results
• Improved education: teachers and students have access to the latest research findings
http://www.plos.org/open-access/
Summary
• Open Science
  • Ideas, concepts, benefits and pitfalls
  • E.g. enhancing collaboration and community-building, increasing efficiency of research vs. no reward system yet
• Open Data
  • Sharing your data influences how often you get cited (Piwowar et al., 2007 and Piwowar et al., 2013)
• Different models for Open Access
  • Green vs. Gold vs. Hybrid
Questions?