Biodiversity Data vs. the Web 2.0, OR How I learned to stop worrying and love the “systems”. Ana Dal Molin, J. B. Woolley, Texas A&M University

Da molin databases_ecn_2012

Nov 22, 2014


Technology

ECNOfficer

 
Transcript
Page 1: Da molin databases_ecn_2012

Biodiversity Data vs. the Web 2.0

OR

How I learned to stop worrying and love the “systems”

Ana Dal Molin
J. B. Woolley

Texas A&M University

Page 2: Da molin databases_ecn_2012

Source: Opte.org, Jan 2005

Page 3: Da molin databases_ecn_2012

[ Why this talk ]

Page 4: Da molin databases_ecn_2012

• Data providers
• Aggregators
• Tools
• etc.

“growth in bioinformatics data exceeded Moore’s Law, the well-known observation that the number of transistors on a chip doubles every 18 months.” (Butte, 2001, TRENDS in Biotechnology 19(5))

• Johnson, N. 2007. Annual Rev. Entomology
• http://www.ala.org.au/about-the-atlas/downloadable-tools/tools-review/
• iDigBio

47*

Page 5: Da molin databases_ecn_2012

[ what do I use? ]

Page 6: Da molin databases_ecn_2012

• Museums often have already decided on a model/database system

• Individual researchers, on the other hand, often have not, which raises questions:
– Content management systems (CMS)?
– Which output?
– Stability?
– Best practices?

Page 7: Da molin databases_ecn_2012

‘Systems’ available
• First Generation: desktop-based (MS Access, FileMaker)
• Second Generation: desktop-based with web output
• Third Generation: content management systems (PHP, Ruby, MySQL, etc.)

Page 8: Da molin databases_ecn_2012

Data Accessibility

Page 9: Da molin databases_ecn_2012

Your data on the ‘net

• Reach
• Model

GBIF species distribution data coverage (2010)

Page 10: Da molin databases_ecn_2012

[ ? ]

[Diagram; labels: Metadata, Data, Metadata repository, Name Index, Occurrence Index, Yellow Pages, Regional Atlas, Annotation Tools, Biosecurity Portal, Analysis Tools, Products]

LaSalle, 2008. Atlas of Living Australia, ICE2008 presentation

Page 11: Da molin databases_ecn_2012

[ where do I stand? ]

Page 12: Da molin databases_ecn_2012

• Taxonomy as a two-natured science
• Shifts in media format

Page 13: Da molin databases_ecn_2012

Web 1.0 -> Web 3.0
• 1.0: Static HTML, e-mail, forums, chat
• 2.0: Dynamic HTML, wikis, blogging, commenting, social networking
• 3.0: …

*You and your work are not invisible before publication*

Page 14: Da molin databases_ecn_2012

• Web 3.0:
– “Social”
– Tags
– Cloud computing
– Ubiquitous connectivity
– Open technologies, open data formats (and open identity too)
– Publishing in languages specifically designed for data (databases, markup)
– Semantic web
– Marketing

Page 15: Da molin databases_ecn_2012

http://www.tdwg.org

Page 16: Da molin databases_ecn_2012

• What the user wants
• What you have to deal with

*

*not done!

Page 17: Da molin databases_ecn_2012

Think it through

Page 18: Da molin databases_ecn_2012

Books: Gutenberg, Project Gutenberg, WorldCat, HathiTrust

Page 19: Da molin databases_ecn_2012

The way we collect information is different
The way we accumulate information is different
The way we understand information is different

Page 20: Da molin databases_ecn_2012

… or not

Jan/2012: 33% USA, 20% Brazil, 26% Europe (Germany, Sweden, Spain, Greece, UK)

Page 21: Da molin databases_ecn_2012
Page 22: Da molin databases_ecn_2012

1.0 2.0

Page 23: Da molin databases_ecn_2012

• Web 3.0
1. People lie
2. People are lazy
3. People are stupid
4. Mission: impossible – know thyself
5. Schemas aren’t neutral
6. Metrics influence results
7. There’s more than one way to describe something

C. Doctorow, Metacrap, 2001

Page 24: Da molin databases_ecn_2012

Issues
• “Unification”* is not going to happen: curators and researchers will always have their own (although often largely overlapping) sets of crucial information fields, which can be cross-linked
• These days, it is imperative that databases communicate with each other
• A ‘unitary taxonomy’ is also not possible, and any big database needs to allow the system to display conflicting ideas

* Thomas, C. “Biodiversity databases spread, prompting unification call”, Science v. 325 (2009)

** http://hymao.org
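
One way to read the point about displaying conflicting ideas is that a database stores each source's classification as its own record instead of overwriting a single "correct" placement. The sketch below is only an illustration under that assumption: the `Opinion` class, its field names, and the sample records are hypothetical and are not taken from any system named in the talk.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Opinion:
    """One source's placement of a name (all field names are hypothetical)."""
    name: str      # the taxon name being classified
    parent: str    # where this source places it
    source: str    # who holds this opinion


def placements_by_name(opinions):
    """Group every recorded placement under the name it refers to."""
    grouped = defaultdict(list)
    for op in opinions:
        grouped[op.name].append(op)
    return dict(grouped)


def conflicting_names(opinions):
    """Return names that different sources place under different parents."""
    return sorted(
        name
        for name, ops in placements_by_name(opinions).items()
        if len({op.parent for op in ops}) > 1
    )


# Placeholder records, not real taxonomic data.
opinions = [
    Opinion("Genus A", "Family X", "Author 1"),
    Opinion("Genus A", "Family Y", "Author 2"),
    Opinion("Genus B", "Family X", "Author 1"),
]
```

Because every opinion keeps its source, a display layer can show both placements of "Genus A" side by side instead of silently choosing one.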

Page 25: Da molin databases_ecn_2012

Data ephemerality

• Local vs. Web data

?!

Source: Wikipedia, “Science 2.0”

Page 26: Da molin databases_ecn_2012

Data ephemerality
• Digital data preservation: Internet Archive, IIPC
• Library of Congress discussions and recommendations
– Disclosure, Adoption, Transparency, External dependency, Technical protection

• http://www.digitalpreservation.gov/formats
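
As a rough illustration of the transparency and disclosure factors, records can be exported to plain CSV, a format that is openly documented and readable without special software. The sketch below assumes column headers taken from Darwin Core, the TDWG-maintained vocabulary; choosing those terms here is an assumption, not something the slides specify, and the record is a placeholder.

```python
import csv
import io

# Darwin Core term names used as column headers.
FIELDS = ["catalogNumber", "scientificName", "eventDate",
          "decimalLatitude", "decimalLongitude"]


def to_csv(records):
    """Serialize specimen records (a list of dicts) to CSV text."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


# Placeholder record, not real specimen data.
records = [{
    "catalogNumber": "EXAMPLE-0001",
    "scientificName": "Genus species",
    "eventDate": "2012-01-15",
    "decimalLatitude": "0.0",
    "decimalLongitude": "0.0",
}]
csv_text = to_csv(records)
```

A text export like this does not replace the working database; it is a copy that should remain readable even if the original software stops being maintained.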

Page 27: Da molin databases_ecn_2012
Page 28: Da molin databases_ecn_2012
Page 29: Da molin databases_ecn_2012

User perspective
“Incomplete” sites
Dynamic information

Selective information?

Page 30: Da molin databases_ecn_2012

Why I am not a luddite:

Page 31: Da molin databases_ecn_2012

Online databases are a taxonomic product and marketing for your work

Online biodiversity databases complement your work

But it is up to you to make the user understand that your work is more than that

The user of online databases is probably not the same as the person who will get your paper

Page 32: Da molin databases_ecn_2012

Summing up
• Choose the system based on the reports you want/need to deliver

Page 33: Da molin databases_ecn_2012

… or work with a journal/team that can help you
• Make sure the system is flexible enough in your hands
• Decide who will do the maintenance of your data
– How big is your team?
– Fluidity (positive and negative)
• Think about stability and backup strategies
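
A backup is only useful if the copy is known to match the original. The following is a minimal sketch of one possible approach, copying a file and comparing SHA-256 checksums; the function names and the file name are hypothetical, and the talk does not prescribe any particular backup method.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def sha256_of(path):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def backup_and_verify(source, backup_dir):
    """Copy source into backup_dir and confirm the copy is byte-identical."""
    backup_dir = Path(backup_dir)
    backup_dir.mkdir(parents=True, exist_ok=True)
    target = backup_dir / Path(source).name
    shutil.copy2(source, target)
    if sha256_of(source) != sha256_of(target):
        raise IOError(f"backup of {source} does not match the original")
    return target


# Demonstration on a throwaway file in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    original = Path(tmp) / "specimens.csv"   # hypothetical file name
    original.write_text("catalogNumber\nEXAMPLE-0001\n")
    copy = backup_and_verify(original, Path(tmp) / "backup")
    backup_ok = copy.exists() and sha256_of(copy) == sha256_of(original)
```

In practice the backup directory would sit on a different disk or machine; the temporary directory here only keeps the example self-contained.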

Page 34: Da molin databases_ecn_2012

Thanks!!