FIND AND UNDERSTAND DATA October, 2012 Hjalmar Gislason, founder & CEO - [email protected] Best Practices for Publishing Data
Strata NY: Best Practices for Publishing Data

Jan 14, 2015

A presentation by Hjalmar Gislason, founder and CEO of DataMarket at the Strata Conference in New York, October 2012
Transcript
Page 1: Strata NY: Best Practices for Publishing Data

FIND AND UNDERSTAND DATA

October 2012
Hjalmar Gislason, founder & CEO - [email protected]

Best Practices for Publishing Data

Page 2: Strata NY: Best Practices for Publishing Data

Hjalmar Gislason
Founder and CEO

Twitter: @datamarket
Slides: http://blog.datamarket.com/

Page 4: Strata NY: Best Practices for Publishing Data
Page 5: Strata NY: Best Practices for Publishing Data

Heavy Data Consumers

Providers of Data Delivery Technology

Page 6: Strata NY: Best Practices for Publishing Data

| BEST PRACTICES for PUBLISHING DATA | Hjalmar Gislason, [email protected] | October 2012

Computers Humans

Page 7: Strata NY: Best Practices for Publishing Data

Computers

• Structure

Humans

• Understand and use

Page 9: Strata NY: Best Practices for Publishing Data

Publishing for Computers

1. Simple formats
2. Indexes, unique IDs and meta-data
3. FAQs and feedback channels

Page 10: Strata NY: Best Practices for Publishing Data

"Don't anthropomorphize computers - they hate it."

- Unknown

Simple Formats

Page 11: Strata NY: Best Practices for Publishing Data

Simple Formats

Page 12: Strata NY: Best Practices for Publishing Data

Simple Formats: Tim Berners-Lee’s Five Stars

Page 13: Strata NY: Best Practices for Publishing Data

Simple Formats: You lost me at “Semantics”

Page 16: Strata NY: Best Practices for Publishing Data

Indexes, unique IDs and meta-data

Page 18: Strata NY: Best Practices for Publishing Data

Indexes, unique IDs and meta-data

• Must: Unique ID, Title, Last updated
• Should: Meta-data

• Why?
  • No need for scraping
  • Less load on your end
  • Ensures full coverage
  • Ensures content removal and updates
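
As a rough illustration of the "Why?" points above, here is a minimal sketch of how a consumer might sync against a published index instead of scraping. The index layout (entries with `id`, `title` and `last_updated`), the sample entries and the function name `plan_sync` are assumptions made for this example; the slides do not prescribe a format.

```python
from datetime import datetime

def plan_sync(index, local):
    """Compare a publisher's index with a local copy.

    index: list of dicts with 'id', 'title', 'last_updated' (ISO 8601 dates).
    local: dict mapping dataset id -> last_updated of the copy we hold.
    Returns (to_fetch, to_remove) as sorted lists of dataset ids.
    """
    published = {entry["id"]: entry["last_updated"] for entry in index}
    # Fetch anything we have never seen, or that changed since our copy.
    to_fetch = sorted(
        ds_id
        for ds_id, updated in published.items()
        if ds_id not in local
        or datetime.fromisoformat(updated) > datetime.fromisoformat(local[ds_id])
    )
    # Anything we hold that is no longer listed was removed by the publisher.
    to_remove = sorted(ds_id for ds_id in local if ds_id not in published)
    return to_fetch, to_remove


# Hypothetical index and local state, for illustration only.
index = [
    {"id": "ds-001", "title": "Population by year", "last_updated": "2012-10-01"},
    {"id": "ds-002", "title": "GDP by quarter", "last_updated": "2012-09-15"},
    {"id": "ds-003", "title": "Unemployment rate", "last_updated": "2012-10-20"},
]
local = {"ds-001": "2012-10-01", "ds-002": "2012-06-30", "ds-004": "2011-01-01"}

to_fetch, to_remove = plan_sync(index, local)
print(to_fetch)   # ['ds-002', 'ds-003']
print(to_remove)  # ['ds-004']
```

One small index request replaces a crawl of every page, which is where the reduced load, full coverage and reliable removals come from.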

Page 19: Strata NY: Best Practices for Publishing Data

Indexes, unique IDs and meta-data

• Hard to emphasize enough!

• Unique IDs for everything: Datasets, columns, entities, ...

• Why?
  • Continuity: A small change for a man = giant leap for a computer
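
The continuity point can be made concrete with a small sketch. The column IDs and titles below are hypothetical; the idea is only that a reworded title is trivial for a person to follow but breaks a program that looks columns up by title.

```python
# Two releases of the same dataset's column list. The title of one column
# was reworded between releases; its (hypothetical) ID stayed the same.
release_1 = [
    {"id": "col-pop", "title": "Population"},
    {"id": "col-gdp", "title": "GDP (USD)"},
]
release_2 = [
    {"id": "col-pop", "title": "Population, total"},
    {"id": "col-gdp", "title": "GDP (USD)"},
]

def find_by_title(columns, title):
    """Return the column with this exact title, or None."""
    return next((c for c in columns if c["title"] == title), None)

def find_by_id(columns, col_id):
    """Return the column with this ID, or None."""
    return next((c for c in columns if c["id"] == col_id), None)

# A consumer keyed on the title loses the column after the rewording...
by_title = find_by_title(release_2, "Population")
# ...while a consumer keyed on the ID still finds it.
by_id = find_by_id(release_2, "col-pop")

print(by_title)        # None
print(by_id["title"])  # Population, total
```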

Page 20: Strata NY: Best Practices for Publishing Data

Indexes, unique IDs and meta-data

• Any relevant contextual information
  • URL(s), descriptions, methodology, next updated, authors, keywords, units, license information, ...
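
A meta-data record covering the fields listed above might look like the following sketch. The class name `DatasetMetadata`, the field names and every value are placeholders chosen for this example, not a schema taken from the talk.

```python
import json
from dataclasses import dataclass, asdict, field

@dataclass
class DatasetMetadata:
    # "Must" fields from the index slide.
    id: str
    title: str
    last_updated: str
    # "Should" fields: contextual information.
    next_updated: str
    url: str
    description: str
    methodology: str
    units: str
    license: str
    authors: list = field(default_factory=list)
    keywords: list = field(default_factory=list)

# Placeholder values for illustration only.
record = DatasetMetadata(
    id="ds-001",
    title="Population by year",
    last_updated="2012-10-01",
    next_updated="2013-10-01",
    url="https://example.org/datasets/ds-001",
    description="Annual population figures.",
    methodology="Placeholder: describe how the figures were collected.",
    units="persons",
    license="Placeholder: name the license here.",
    authors=["Example Statistics Office"],
    keywords=["population", "annual"],
)

# Serialize to JSON so the record can sit next to the data in an index.
as_json = json.dumps(asdict(record), indent=2, sort_keys=True)
print(as_json)
```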

Page 21: Strata NY: Best Practices for Publishing Data

FAQs and feedback channels

#1 reason for not publishing data:

“There are errors in the data and I don't want others to discover them”

Page 22: Strata NY: Best Practices for Publishing Data

FAQs and feedback channels

#1 reason for not publishing data:

“There are errors in the data and I do want others to discover them”

Page 24: Strata NY: Best Practices for Publishing Data

FAQs and feedback channels

Page 25: Strata NY: Best Practices for Publishing Data

Publishing for Computers

1. Simple formats
2. Indexes, unique IDs and meta-data
3. FAQs and feedback channels

Page 26: Strata NY: Best Practices for Publishing Data

Computers

• Structure

Humans

• Understand and use

Page 27: Strata NY: Best Practices for Publishing Data

Publishing for Humans

1. Search / Discovery
2. Visualization
3. Download

Page 28: Strata NY: Best Practices for Publishing Data

Search / Discovery

• Requirements differ from web/text search
  • A lot less textual content to base search on
  • Synonyms, dictionaries, autocomplete
  • But (hopefully) good meta-data = facets and filtering

• Give people ways to browse
  • Categories vs. tags vs. search
  • Serendipity: Random, related, interesting...
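
The "good meta-data = facets and filtering" point can be sketched in a few lines. The catalog entries, the field names and the helpers `facet_counts` and `filter_by` are assumptions for illustration; a real search service would normally delegate this to its search engine.

```python
from collections import Counter

# Hypothetical catalog entries; field names are assumptions for illustration.
datasets = [
    {"id": "ds-001", "category": "Demographics", "keywords": ["population", "annual"]},
    {"id": "ds-002", "category": "Economy", "keywords": ["gdp", "quarterly"]},
    {"id": "ds-003", "category": "Economy", "keywords": ["unemployment", "quarterly"]},
]

def _values(item, field):
    """Return a field's value as a list, whether it is single- or multi-valued."""
    value = item[field]
    return value if isinstance(value, list) else [value]

def facet_counts(items, field):
    """Count how many items carry each value of a meta-data field."""
    counts = Counter()
    for item in items:
        counts.update(_values(item, field))
    return counts

def filter_by(items, field, wanted):
    """Keep the items whose field equals, or contains, the wanted value."""
    return [item for item in items if wanted in _values(item, field)]

category_facet = facet_counts(datasets, "category")
quarterly_ids = [d["id"] for d in filter_by(datasets, "keywords", "quarterly")]

print(dict(category_facet))  # {'Demographics': 1, 'Economy': 2}
print(quarterly_ids)         # ['ds-002', 'ds-003']
```

The facet counts give people something to browse when they do not yet know what to type, and the filter narrows the list once they pick a value.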

Page 30: Strata NY: Best Practices for Publishing Data

Visualize

Page 32: Strata NY: Best Practices for Publishing Data

109 columns x 340 lines = 37,060 cells

Page 34: Strata NY: Best Practices for Publishing Data
Page 36: Strata NY: Best Practices for Publishing Data

Visualize

• What you should offer depends on the data

• Statistical data
  • Focus on the most common charts and get them right
  • Do NOT invent new visualizations or chart types

• Use standards-compatible technologies
  • No Flash!
  • Charting and visualization libraries

Page 39: Strata NY: Best Practices for Publishing Data

Download

• Make it easy to use your data outside your tools
  • Play nicely with those providing functionality beyond what you can offer: Tableau, R, SAS, MATLAB, Mathematica, SPSS, ...

• Provide downloads in the formats most commonly used by your users:
  • Raw data: Excel, CSV, feeds (R, Excel live feeds, APIs)
  • Charts and visualizations: Bitmap, vector, PPT, embeds?
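
For the raw-data case, a CSV download needs very little code. The sketch below uses Python's standard `csv` module; the function name `to_csv`, the column names and the values are made up for illustration.

```python
import csv
import io

def to_csv(rows, columns):
    """Render a list of dict rows as CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=columns, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()

# Made-up rows, for illustration only.
rows = [
    {"year": 2010, "value": 1.5},
    {"year": 2011, "value": 1.7},
]

csv_text = to_csv(rows, ["year", "value"])
print(csv_text, end="")
# year,value
# 2010,1.5
# 2011,1.7
```

A plain header line plus one record per line opens directly in Excel, R and most of the other tools named above.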

Page 40: Strata NY: Best Practices for Publishing Data

Computers

• Structure
  • Simple formats
  • Indexes, unique IDs and meta-data
  • FAQs and feedback channels

Humans

• Understand and use
  • Search / Discovery
  • Visualization
  • Download

Page 41: Strata NY: Best Practices for Publishing Data

FIND AND UNDERSTAND DATA

Twitter: @datamarket · Facebook: DataMarket · E-mail: [email protected]

Hjalmar Gislason, founder & CEO