
Data Formats and APIs

Michael Carey/Padhraic Smyth, UC Irvine: Stats 170A/B, Winter 2018 0

Mike Carey


Announcements

• Keep watching the course wiki page (especially its attachments):
  • https://grape.ics.uci.edu/wiki/asterix/wiki/stats170ab-2018

• Ditto for the Piazza page (for Q&A):
  • http://piazza.com/uci/winter2018/stats170a/home

• Note: HW#3 is due tonight (11:45pm)
  • HW#4 should be available by then as well

• Today:
  • More PostgreSQL techniques and tips
  • Twitter APIs and Python’s Tweepy package
  • Beyond tables: XML and JSON


XML

• Stands for eXtensible Markup Language
• XML 1.0 – a recommendation from W3C, 1998
• Roots: SGML (a complex document markup language)
• After the roots: a format for sharing data as well


Why XML is of Interest

• XML is just syntax for data
  • (Note: we have no syntax for relational data!)
• XML is not relational: it’s semistructured

• XML’s data syntax is exciting because:
  • Can translate any data to XML
  • Can ship XML over the Web (HTTP)
  • Can input XML into any application
  • Thus: Data sharing and exchange on the Web!

(Note: JSON is another similar technology today.)


HTML (a descendant of SGML)

<h1> Bibliography </h1>
<p> <i> Foundations of Databases </i>
    Abiteboul, Hull, Vianu
    <br> Addison Wesley, 1995
<p> <i> Data on the Web </i>
    Abiteboul, Buneman, Suciu
    <br> Morgan Kaufmann, 1999

HTML describes the presentation


XML

<bibliography>
  <book>
    <title> Foundations of Databases </title>
    <author> Abiteboul </author>
    <author> Hull </author>
    <author> Vianu </author>
    <publisher> Addison Wesley </publisher>
    <year> 1995 </year>
  </book>
  . . . .
</bibliography>

XML describes the content
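
Because the tags name the content, a program can ask for data by meaning and not by layout. A minimal sketch using Python's standard xml.etree.ElementTree module (the variable names are illustrative, and the elided books from the slide are simply left out):

```python
import xml.etree.ElementTree as ET

# The bibliography document from the slide, without the elided books.
BIBLIOGRAPHY_XML = """
<bibliography>
  <book>
    <title> Foundations of Databases </title>
    <author> Abiteboul </author>
    <author> Hull </author>
    <author> Vianu </author>
    <publisher> Addison Wesley </publisher>
    <year> 1995 </year>
  </book>
</bibliography>
"""

root = ET.fromstring(BIBLIOGRAPHY_XML)

# Pull out each book by tag name; strip the padding spaces around the text.
books = []
for book in root.findall("book"):
    books.append({
        "title": book.findtext("title").strip(),
        "authors": [author.text.strip() for author in book.findall("author")],
        "year": int(book.findtext("year")),
    })

print(books)
```

The same extraction from the HTML version would have to guess which italic run is a title and which line holds the authors.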


XML Terminology: Elements & Tags

• Tags: book, title, author, …
• Start tag: <book>, end tag: </book>
• Elements: <book>…</book>, <author>…</author>
• Elements can be nested
• Empty element: <red></red> (abbreviated <red/>)
• XML document: single root element

Well formed XML document: matching/nested tags
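
One way to see what "well formed" means in practice is to hand a few strings to a parser. A small sketch, assuming Python's standard xml.etree.ElementTree; the helper is_well_formed and the sample strings are made up for illustration:

```python
import xml.etree.ElementTree as ET

def is_well_formed(text):
    """Return True if `text` parses as a well-formed XML document."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

samples = {
    "nested": "<book><title>T</title></book>",
    "empty element": "<book><red/></book>",
    "mismatched tags": "<book><title>T</book></title>",
    "two roots": "<book/><book/>",
}

# Only the first two satisfy "matching/nested tags, single root element".
results = {name: is_well_formed(text) for name, text in samples.items()}
print(results)
```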


More XML: Attributes

<book price="55" currency="USD">
  <title> Foundations of Databases </title>
  <author> Abiteboul </author>
  …
  <year> 1995 </year>
</book>

Attributes are alternative ways to represent data


More XML: Attributes Revisited

<book>
  <title> Foundations of Databases </title>
  <author> Abiteboul </author>
  …
  <year> 1995 </year>
  <price currency="USD"> 55 </price>
</book>

Attributes are best used to represent “metadata”!
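
The data/metadata split shows up in how a program reads the two. A short sketch with Python's xml.etree.ElementTree, using a trimmed copy of the book above (the elided parts are left out; the "tax" attribute is a hypothetical name used only to show a lookup that fails):

```python
import xml.etree.ElementTree as ET

book = ET.fromstring(
    "<book>"
    "<title> Foundations of Databases </title>"
    '<price currency="USD"> 55 </price>'
    "</book>"
)

price = book.find("price")
amount = float(price.text)            # element content: the data itself
currency = price.get("currency")      # attribute: metadata about that data
tax = price.get("tax")                # absent attribute: get() returns None

print(amount, currency, tax)
```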


XML Semantics: Tree of Data

<data>
  <person id="o555">
    <name> Mary </name>
    <address>
      <street> Maple </street> <no> 345 </no> <city> Seattle </city>
    </address>
  </person>
  <person>
    <name> John </name>
    <address> Thailand </address>
    <phone> 23456 </phone>
  </person>
</data>

[Figure: the same document drawn as a tree. The root element node data has two person children. The first person has an attribute node id (o555), a name (text node Mary), and an address whose children are street (Maple), no (345), and city (Seattle). The second person has a name (John), an address (Thailand), and a phone (23456). Legend: element node, text node, attribute node.]

Also: Order matters! (Or at least it can…)
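
The tree view can be reproduced by walking the parsed document. A sketch assuming Python's xml.etree.ElementTree; walk is a hypothetical helper, and it ignores whitespace-only text between elements:

```python
import xml.etree.ElementTree as ET

DATA_XML = """<data>
  <person id="o555">
    <name> Mary </name>
    <address>
      <street> Maple </street> <no> 345 </no> <city> Seattle </city>
    </address>
  </person>
  <person>
    <name> John </name>
    <address> Thailand </address>
    <phone> 23456 </phone>
  </person>
</data>"""

def walk(element, depth=0):
    """Yield one indented line per element, attribute, and text node."""
    indent = "  " * depth
    yield f"{indent}element {element.tag}"
    for name, value in element.attrib.items():
        yield f"{indent}  attribute {name} = {value}"
    text = (element.text or "").strip()
    if text:
        yield f"{indent}  text {text!r}"
    for child in element:  # children are visited in document order
        yield from walk(child, depth + 1)

lines = list(walk(ET.fromstring(DATA_XML)))
print("\n".join(lines))
```

Iterating over an element returns its children in document order, which is the sense in which order can matter.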


XML Data

• XML is self-describing
  • Schema information is part of the data
    • Consider a relational schema: person(name, phone)
    • In XML <person>, <name>, <phone> are part of the data (and are repeated for each person)
  • Consequence: XML is much more flexible
    • Can have variations from instance to instance
    • Supports “schema later” (or “schema never”) methodology

• XML = semistructured data


Ex: Relational Data as XML

person relation:

name   phone
John   3634
Sue    6343
Dick   6363

XML:

<person>
  <row> <name>John</name> <phone> 3634</phone> </row>
  <row> <name>Sue</name> <phone> 6343</phone> </row>
  <row> <name>Dick</name> <phone> 6363</phone> </row>
</person>

[Figure: the XML drawn as a tree. The root person has three row children; each row has a name child ("John", "Sue", "Dick") and a phone child (3634, 6343, 6363).]
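
The mapping from rows to XML is mechanical. A minimal sketch with Python's xml.etree.ElementTree, where the rows and columns variables stand in for the result of a query against the person table:

```python
import xml.etree.ElementTree as ET

# Stand-in for the person(name, phone) relation from the slide.
columns = ("name", "phone")
rows = [("John", 3634), ("Sue", 6343), ("Dick", 6363)]

person = ET.Element("person")
for row in rows:
    row_element = ET.SubElement(person, "row")
    # The column names become tags, so they are repeated in every row.
    for column, value in zip(columns, row):
        ET.SubElement(row_element, column).text = str(value)

xml_text = ET.tostring(person, encoding="unicode")
print(xml_text)
```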


XML is Semi-structured Data

• Missing elements and/or attributes:

<person>
  <name>John</name>
  <phone>1234</phone>
</person>
<person>
  <name>Joe</name>
</person>    ← No phone!

• Could represent in a table with nulls:

name   phone
John   1234
Joe    -
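
Going from such XML to a table with nulls might look like the following sketch. It assumes Python's xml.etree.ElementTree and wraps the two person elements in a hypothetical <people> root, since a document needs a single root:

```python
import xml.etree.ElementTree as ET

PEOPLE_XML = """<people>
  <person><name>John</name><phone>1234</phone></person>
  <person><name>Joe</name></person>
</people>"""

root = ET.fromstring(PEOPLE_XML)

# findtext() returns None when the child element is missing,
# which plays the role of a NULL in the resulting rows.
rows = [(p.findtext("name"), p.findtext("phone")) for p in root.findall("person")]
print(rows)
```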


XML is Semi-structured Data

• Repeated attributes

<person>
  <name> Mary</name>
  <phone>2345</phone>
  <phone>3456</phone>
</person>    ← Two phones!

• Impossible in tables (w/o normalization – due to 1NF)

name   phone
Mary   2345 3456   ???
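
One possible way to normalize the repeated element is to emit one (name, phone) row per phone. A sketch assuming Python's xml.etree.ElementTree; the variable names are illustrative:

```python
import xml.etree.ElementTree as ET

person = ET.fromstring(
    "<person><name> Mary</name><phone>2345</phone><phone>3456</phone></person>"
)

name = person.findtext("name").strip()
phones = [phone.text for phone in person.findall("phone")]  # all repeats, in order

# 1NF-friendly shape: the repeated value moves into its own rows.
person_phone_rows = [(name, phone) for phone in phones]
print(person_phone_rows)
```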


XML is Semi-structured Data

• Attributes with different types in different objects

<person>
  <name>
    <first> John </first>
    <last> Smith </last>
  </name>
  <phone>1234</phone>
</person>    ← Structured name!

• Nested collections (not 1NF)
• Heterogeneous collections:
  • <db> containing both <book>’s and <publisher>’s


XML: So What Is It Again…?

• A standard, flexible, self-describing syntax used to represent and exchange data of all shapes and sizes
  • Regular, structured data (think records)
    • E.g., a purchase order (customer info and line items)
    • Record-like, typed, nested data values
  • Irregular, unstructured data (think documents)
    • E.g., a book (title, author, chapters, and text)
    • Text-like, untyped, variant, marked-up data values

• Uses include document storage, data exchange, Web service calls, B2B messaging, information integration, even configuration metadata…


XML: One Final Example

<?xml version="1.0" encoding="ISO-8859-1" ?>
<catalog>
  <book isbn="ISBN 1565114302">
    <title>No Such Thing as a Bad Day</title>
    <author>Hamilton Jordan</author>
    <publisher>Longstreet Press, Inc.</publisher>
    <price currency="USD">17.60</price>
    <review>
      <reviewer>Publisher</reviewer>: This book is the moving account
      of one man's successful battles against three cancers ...
      <title>No Such Thing as a Bad Day</title> is warmly recommended.
    </review>
  </book>
  <!-- more books and specifications -->
</catalog>

(Mixed content)
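
Mixed content is where XML stops looking like records: the review interleaves text with child elements. A sketch of how Python's xml.etree.ElementTree exposes this, using only the review element from the example (joined onto one line here for brevity):

```python
import xml.etree.ElementTree as ET

review = ET.fromstring(
    "<review><reviewer>Publisher</reviewer>: This book is the moving account "
    "of one man's successful battles against three cancers ..."
    "<title>No Such Thing as a Bad Day</title> is warmly recommended. </review>"
)

reviewer = review.find("reviewer")
title = review.find("title")

# ElementTree keeps text that follows a child element in that child's .tail.
print(repr(review.text))    # no text precedes the first child
print(repr(reviewer.tail))  # the prose between </reviewer> and <title>
print(repr(title.tail))     # the prose after </title>

# itertext() stitches all text and tails back together in document order.
full_text = "".join(review.itertext())
print(full_text)
```

There is no natural record or table shape for this; the order of text and elements is itself the data.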


JSON

• JavaScript Object Notation
  • Born from JavaScript, now language-independent

• Minimal
  • Much (much!) simpler than XML

• Textual
  • Machine- and human-readable format

• Subset of JavaScript
  • But similar to many languages’ types (including Python)


Values

• Primitive values
  • Strings
  • Numbers
  • Booleans

• Structured values
  • Objects
  • Arrays

• A special “missing” value
  • null
  • (or a field can be altogether missing)
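
In Python, the standard json module maps each kind of JSON value onto a built-in type. A small sketch (the field names in the sample document are made up):

```python
import json

doc = json.loads("""
{
  "string": "hello",
  "number": 42,
  "boolean": true,
  "object": {"nested": 1.5},
  "array": [1, "two", null],
  "missing": null
}
""")

# Which Python type did each JSON value become?
types = {key: type(value).__name__ for key, value in doc.items()}
print(types)

# A null field and an absent field are distinguishable to a program.
has_null_field = "missing" in doc and doc["missing"] is None
has_absent_field = "not there" in doc
print(has_null_field, has_absent_field)
```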


Numbers

• Integer
• Real
• Scientific

• No octal or hex
• No NaN or Infinity
  • Use null instead
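
These rules can be checked against Python's json module, with one caveat: by default its writer is more lenient than the JSON format and emits NaN unless told otherwise. A sketch (the sample numbers are arbitrary):

```python
import json

# Integer, real, and scientific notation all parse.
parsed = json.loads("[42, -3.14, 6.02e23]")

# Hex literals are not JSON, so the parser rejects them.
try:
    json.loads("0x1F")
    hex_accepted = True
except json.JSONDecodeError:
    hex_accepted = False

# Python's writer emits the non-standard token NaN by default...
lenient = json.dumps(float("nan"))

# ...and refuses only when asked to be strict.
try:
    json.dumps(float("nan"), allow_nan=False)
    strict_accepted = True
except ValueError:
    strict_accepted = False

# The portable choice suggested on the slide: write null instead.
portable = json.dumps(None)

print(parsed, hex_accepted, lenient, strict_accepted, portable)
```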


Booleans

• true
• false

null

• A value that isn't anything


Object

• Objects are unordered containers of key/value pairs
• Objects are wrapped in { }
• , separates key/value pairs
• : separates keys and values
• Keys are strings
• Values are any JSON values

• Similar to struct, record, hashtable, object, dict, …


Object Example

{
  "name": "Jack B. Nimble",
  "at large": true,
  "grade": "A",
  "format": {
    "type": "rect",
    "width": 1920,
    "height": 1080,
    "interlace": false,
    "framerate": 24
  }
}
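
Loaded into Python with the standard json module, this object becomes nested dicts. A brief sketch (the variable names are illustrative):

```python
import json

record = json.loads("""
{
  "name": "Jack B. Nimble",
  "at large": true,
  "grade": "A",
  "format": {
    "type": "rect",
    "width": 1920,
    "height": 1080,
    "interlace": false,
    "framerate": 24
  }
}
""")

# Keys are plain strings, so a key containing a space is fine.
at_large = record["at large"]

# Nested objects are reached by chaining lookups.
pixels = record["format"]["width"] * record["format"]["height"]

print(at_large, pixels)
```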


Array

• Arrays are ordered sequences of values
• Arrays are wrapped in []
• , separates values
• JSON does not talk about indexing
  • JSON is just a data format (not a language)
  • An implementation can start array indexing at 0 or 1


Array Examples

["Sunday", "Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday"]

[
  [0, -1, 0],
  [1, 0, 0],
  [0, 0, 1]
]
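
Python's json module turns both examples into lists, and Python happens to be an implementation that indexes from 0. A short sketch:

```python
import json

days = json.loads(
    '["Sunday", "Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday"]'
)
matrix = json.loads("[[0, -1, 0], [1, 0, 0], [0, 0, 1]]")

first_day = days[0]      # indexing from 0 is Python's choice, not JSON's
entry = matrix[0][1]     # row 0, column 1 of the nested array

print(first_day, entry, len(days))
```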


Arrays vs. Objects

• Use objects when the key names are arbitrary strings – i.e., for record-like data
  • Similar to a dict in Python (slightly more restrictive)

• Use arrays when the key names are sequential integers – i.e., for indexed sequences
  • Similar to a tuple or an array in Python
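
The "slightly more restrictive" part can be seen when writing Python values out with the json module: object keys must be strings, and tuples have no JSON counterpart of their own. A sketch with made-up values:

```python
import json

# A dict may use an int key; on the way to JSON it is turned into a string.
as_object = json.dumps({"name": "Mary", 7: "lucky"})

# A tuple is written as an array...
as_array = json.dumps(("Mary", 2345))

# ...and comes back as a list, since JSON has only one sequence type.
round_trip = json.loads(as_array)

print(as_object, as_array, round_trip)
```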


JSON vs. Relational (and CSV)

                        Relational (and CSV)                 JSON
Structure               Flat (Tables)                        Nested (Complex Objects)
Schema                  Per collection (and static)          Per object
Query Support           SQL standard                         Varies (no standard)
Ordering                None (sets/bags)                     Includes arrays
Native System Support   DB2, Oracle, SQL Server, SQLite,     MongoDB, Couchbase Server,
                        PostgreSQL, MySQL, …                 AsterixDB, …


JSON vs. XML

                    XML                        JSON
Verbosity           Higher                     Lower
Complexity          Higher                     Lower
Use of Validation   Common (DTD, xsd)          Rare (JSON schema)
PL Friendliness     Low (impedance mismatch)   High
Query Support       XSLT, XPath, XQuery        JAQL, AQL, JSONiq, SQL++
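
The verbosity and PL-friendliness rows can be illustrated by writing one small record both ways. This is only a sketch on a single made-up record using Python's json and xml.etree.ElementTree modules, not a benchmark:

```python
import json
import xml.etree.ElementTree as ET

person = {"name": "Mary", "phones": ["2345", "3456"]}

# JSON: the Python value serializes directly.
json_text = json.dumps(person)

# XML: the element tree has to be built by hand.
element = ET.Element("person")
ET.SubElement(element, "name").text = person["name"]
for phone in person["phones"]:
    ET.SubElement(element, "phone").text = phone
xml_text = ET.tostring(element, encoding="unicode")

# Reading back: JSON yields dicts and lists, XML yields a tree to navigate.
phones_from_json = json.loads(json_text)["phones"]
phones_from_xml = [p.text for p in ET.fromstring(xml_text).findall("phone")]

print(len(json_text), json_text)
print(len(xml_text), xml_text)
```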


Questions?

• Next time we’ll talk about data management technologies (databases and query languages) for “modern data”
