Data. Dr. Michele C. Weigle, CS 725/825 Information Visualization, Spring 2017, http://www.cs.odu.edu/~mweigle/CS725-S17/. Note: We will not cover these slides in class, but they are required reading for the week. There are a few supplemental images for the textbook reading and a few slides on data formats, data cleaning, and data sources.

Jul 01, 2020


Data

Dr. Michele C. Weigle

CS 725/825 Information Visualization, Spring 2017

http://www.cs.odu.edu/~mweigle/CS725-S17/

Note

- We will not cover these slides in class, but they are required reading for the week.
- There are a few supplemental images for the textbook reading and a few slides on data formats, data cleaning, and data sources.

CS 725/825 - Spring 2016 - Weigle

Data

Ward, Grinstein, Keim, Fig 1.37

Tables

Figure labels: n items, m attributes, value

Ward, Grinstein, Keim, Fig 1.37
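The table structure labeled in the figure (n items, m attributes, one value per cell) can be sketched in Python as a list of dictionaries. This is only an illustration: the attribute names, the values, and the helper functions `shape` and `value` are made up and do not come from the slides or the textbook.

```python
# Hypothetical table: each dict is one item, each key is one attribute.
table = [
    {"name": "item-1", "height_cm": 170, "city": "Norfolk"},
    {"name": "item-2", "height_cm": 182, "city": "Richmond"},
    {"name": "item-3", "height_cm": 165, "city": "Norfolk"},
]


def shape(rows):
    """Return (n items, m attributes) for a list of dict rows."""
    n_items = len(rows)
    attributes = list(rows[0].keys()) if rows else []
    return n_items, len(attributes)


def value(rows, item_index, attribute):
    """Look up the single value stored for one item and one attribute."""
    return rows[item_index][attribute]


n, m = shape(table)
print(f"{n} items x {m} attributes")
print(value(table, 1, "city"))
```

Each cell is addressed by an item (which row) and an attribute (which column), which is the same pair of coordinates the figure labels.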

Networks and Trees

http://en.wikipedia.org/wiki/File:Wikipedia_multilingual_network_graph_July_2013.svg
http://commons.wikimedia.org/wiki/File:Binary_tree.svg

Field Dataset

http://en.wikipedia.org/wiki/File:PET-image.jpg

Geometry

http://en.wikipedia.org/wiki/File:Cntr-map-1.jpg

Multidimensional Table

http://en.wikipedia.org/wiki/Table_(information)

Temporal Semantics

http://commons.wikimedia.org/wiki/File:Evidence_of_global_warming_-_time_series_of_seasonal_(red_dots)_and_annual_average_(black_line)_of_global_upper_ocean_heat_content_for_the_0-700m_layer_between_1955-2008.gif

Data Formats

- Delimited Text
  - tab delimited
  - comma delimited (CSV)
- Extensible Markup Language (XML)
  - looks a bit like HTML
  - user-defined tags to identify data
- JavaScript Object Notation (JSON)
  - collection of name/value pairs
  - smaller than XML
  - easier to parse

Example of Data Formats

CSV

    20090101,26
    20090102,34
    20090103,27

XML

    <weather_data>
      <observation>
        <date>20090101</date>
        <max_temp>26</max_temp>
      </observation>
      <observation>
        <date>20090102</date>
        <max_temp>34</max_temp>
      </observation>
      <observation>
        <date>20090103</date>
        <max_temp>27</max_temp>
      </observation>
    </weather_data>

JSON

    {"observations": [
      {"date": "20090101", "max_temp": 26},
      {"date": "20090102", "max_temp": 34},
      {"date": "20090103", "max_temp": 27}
    ]}

Yau, Visualize This, Ch 2
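To make the comparison concrete, here is a minimal Python sketch that reads the same three observations from each format using only the standard library (`csv`, `xml.etree.ElementTree`, `json`). The function names `from_csv`, `from_xml`, and `from_json` are illustrative, and the CSV column order (date, then max temperature) is assumed from the example, since the CSV has no header row.

```python
import csv
import io
import json
import xml.etree.ElementTree as ET

CSV_TEXT = "20090101,26\n20090102,34\n20090103,27\n"

XML_TEXT = """<weather_data>
  <observation><date>20090101</date><max_temp>26</max_temp></observation>
  <observation><date>20090102</date><max_temp>34</max_temp></observation>
  <observation><date>20090103</date><max_temp>27</max_temp></observation>
</weather_data>"""

JSON_TEXT = """{"observations": [
  {"date": "20090101", "max_temp": 26},
  {"date": "20090102", "max_temp": 34},
  {"date": "20090103", "max_temp": 27}
]}"""


def from_csv(text):
    # CSV carries no names or types, so both are supplied here by assumption.
    return [{"date": d, "max_temp": int(t)} for d, t in csv.reader(io.StringIO(text))]


def from_xml(text):
    # The tags name each field, but every value still arrives as text.
    root = ET.fromstring(text)
    return [
        {"date": obs.findtext("date"), "max_temp": int(obs.findtext("max_temp"))}
        for obs in root.findall("observation")
    ]


def from_json(text):
    # JSON carries both the names and the basic types (string vs. number).
    return json.loads(text)["observations"]


records = from_csv(CSV_TEXT)
print(records == from_xml(XML_TEXT) == from_json(JSON_TEXT))
```

The amount of work each parser needs mirrors the trade-off on the slide: CSV is the most compact but says nothing about what the columns mean, while XML and JSON label every value.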

How to convert between data formats?

- Write a program to convert from one format to another
  - awk (my favorite, but I'm old school), Python, Perl, PHP, ...
- Other tools
  - search Google for "csv to json", "csv to xml", "xml to json"
- Mr. Data Converter
  - http://shancarter.github.io/mr-data-converter/
  - developed by a graphics editor at The New York Times
  - input: CSV or tab-delimited data
  - output: HTML table, JSON, MySQL, Python, PHP, Ruby, XML, ...
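As one example of the "write a program" option, the following Python sketch converts headerless CSV text to JSON and to XML. It is a rough illustration, not a general-purpose converter: the function names, the default tag names, and the `root_key` parameter are assumptions chosen to match the weather example, and the caller must supply the field names because the CSV has no header row.

```python
import csv
import io
import json
import xml.etree.ElementTree as ET


def csv_to_json(csv_text, fieldnames, root_key="observations"):
    """Convert headerless CSV text to a JSON string; values stay strings."""
    reader = csv.DictReader(io.StringIO(csv_text), fieldnames=fieldnames)
    rows = [dict(row) for row in reader]
    return json.dumps({root_key: rows}, indent=2)


def csv_to_xml(csv_text, fieldnames, root_tag="weather_data", row_tag="observation"):
    """Convert headerless CSV text to an XML string with one element per row."""
    root = ET.Element(root_tag)
    for row in csv.DictReader(io.StringIO(csv_text), fieldnames=fieldnames):
        item = ET.SubElement(root, row_tag)
        for name in fieldnames:
            ET.SubElement(item, name).text = row[name]
    return ET.tostring(root, encoding="unicode")


CSV_TEXT = "20090101,26\n20090102,34\n20090103,27\n"
FIELDS = ["date", "max_temp"]

print(csv_to_json(CSV_TEXT, FIELDS))
print(csv_to_xml(CSV_TEXT, FIELDS))
```

Note that the temperatures come out as strings in the JSON. CSV has no type information, so converting them to numbers is a separate, deliberate step.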


Data in the Real World

- Data can be missing, have typos, be inconsistent, be spread over multiple tables, etc.
- Two big issues:
  - format
  - accuracy
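A first pass over a dataset can at least locate missing values, typos, and inconsistent formats before any charting starts. The Python sketch below is one possible approach, not a standard tool: the rows are invented to show typical problems, and the rule that a date must be eight digits (YYYYMMDD) is an assumption borrowed from the weather example.

```python
import re

# Assumed convention: dates are written as eight digits, YYYYMMDD.
DATE_PATTERN = re.compile(r"\d{8}")

# Hypothetical rows, made up to show typical problems.
rows = [
    {"date": "20090101", "max_temp": "26"},
    {"date": "2009-01-02", "max_temp": "34"},  # inconsistent date format
    {"date": "20090103", "max_temp": ""},      # missing value
    {"date": "20090104", "max_temp": "2y"},    # typo
]


def is_number(text):
    """Return True if the text parses as a number."""
    try:
        float(text)
    except ValueError:
        return False
    return True


def find_problems(rows):
    """Return a list of (row index, field, problem) tuples."""
    problems = []
    for i, row in enumerate(rows):
        for field, raw in row.items():
            text = (raw or "").strip()
            if not text:
                problems.append((i, field, "missing"))
            elif field == "date" and not DATE_PATTERN.fullmatch(text):
                problems.append((i, field, "inconsistent format"))
            elif field == "max_temp" and not is_number(text):
                problems.append((i, field, "not a number"))
    return problems


for index, field, problem in find_problems(rows):
    print(f"row {index}: {field} is {problem}")
```

Checks like these only address the format issue. They cannot tell whether a well-formed value is actually accurate.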

World Disasters - Inconsistent

http://www.infochimps.com/datasets/disasters-worldwide-from-1900-2008

World Disasters - Missing

http://www.infochimps.com/datasets/disasters-worldwide-from-1900-2008

What to do with dirty data?

Data Cleaning Tools

Quick Tools

- Data Science Toolkit
  - http://www.datasciencetoolkit.org/
  - lots of quick conversion tools
- Mr. People
  - http://people.ericson.net/
  - formats lists of names
- Mr. Data Converter
  - http://shancarter.github.io/mr-data-converter/

Full Apps

- Trifacta Wrangler
  - https://www.trifacta.com/products/wrangler/
  - video: https://vimeo.com/175859872
- Open Refine (was Google Refine)
  - http://openrefine.org/
  - video: http://www.youtube.com/watch?v=B70J_H_zAWM
  - more info: http://www.propublica.org/nerds/item/using-google-refine-for-data-cleaning

What about accuracy?

- Nathan Yau (Visualize This, Data Points) was an intern at The New York Times
- One day, his entire goal was to verify 3 numbers in a dataset
- You must have accurate data before you can trust the visualization

Yau, Visualize This, Ch 1

Recall the marriage rate chart

Graphing the raw data

New Hampshire?!?

Let's look at the Excel file

http://www2.census.gov/library/publications/2011/compendia/statab/131ed/tables/12s0133.xls

Now, let's look at the PDF

http://www2.census.gov/library/publications/2011/compendia/statab/131ed/tables/vitstat.pdf

But wait, there's more funny stuff

Figure panels: PDF, Excel

Bottom Line

- If you see something weird in your graph that you can't explain, go back and double-check your data
- Even if you didn't make an error, maybe someone else did
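One way to act on this advice before plotting is to flag values that sit far from the rest and then check them against the source by hand. The Python sketch below uses the median absolute deviation (MAD) for that. It is only an assumption about how such a check might look: the rates are invented, not taken from the Census tables, and the threshold of 5 is arbitrary.

```python
from statistics import median


def flag_suspicious(values, threshold=5.0):
    """Return the keys whose value lies more than `threshold` MADs from the median."""
    center = median(values.values())
    mad = median(abs(v - center) for v in values.values())
    if mad == 0:
        # All typical values are identical, so anything different stands out.
        return [k for k, v in values.items() if v != center]
    return [k for k, v in values.items() if abs(v - center) / mad > threshold]


# Hypothetical rates per 1,000 people; "State E" mimics a misplaced decimal point.
rates = {
    "State A": 6.8,
    "State B": 7.1,
    "State C": 6.5,
    "State D": 7.4,
    "State E": 68.0,
}

print(flag_suspicious(rates))
```

A flagged value is not necessarily wrong. It is a prompt to go back to the original file, and to a second copy of the data if one exists, before trusting the chart.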


Data Sources

- Some notable ones
  - Data.gov
  - Google Public Data Explorer
    - http://www.google.com/publicdata/directory
  - Census Bureau
    - Census data visualization gallery - http://www.census.gov/dataviz/
  - Federal Reserve
    - FRASER - http://fraser.stlouisfed.org/
    - FRED - http://research.stlouisfed.org/fred2/

Even more on the Links page