APIs, Freebase, and the Collaborative Semantic Web
Diana Tamabayeva & Dan Delany
970‐309‐8598
Object‐Oriented Representation
String of characters ‐ useful to people:
“John Smith is a five‐foot‐tall male who weighs one‐hundred eighty pounds. He’s forty‐two years old and he loves dogs.”
Structured data in an object ‐ useful to computers:
sortByHeight(users);
isADude("JohnSmith");
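The contrast above can be sketched in Ruby. This is only an illustration: the `User` struct, the `USERS` list (including the invented second record), and the helpers `sort_by_height` and `dude?` are hypothetical stand-ins for the slide's `sortByHeight` and `isADude`, not code from any real library.

```ruby
# Hypothetical record type: the same facts as the sentence, but as named fields.
User = Struct.new(:name, :height_in, :weight_lb, :age, :sex, :likes)

USERS = [
  User.new("John Smith", 60, 180, 42, :male, ["dogs"]),
  User.new("Jane Doe", 66, 140, 35, :female, ["cats"]) # invented sample record
].freeze

# Stand-in for sortByHeight(users): trivial once height is a numeric field.
def sort_by_height(users)
  users.sort_by(&:height_in)
end

# Stand-in for isADude("JohnSmith"): find the user by name, then read a field.
def dude?(users, name)
  user = users.find { |u| u.name == name }
  !user.nil? && user.sex == :male
end

puts sort_by_height(USERS).map(&:name).inspect
puts dude?(USERS, "John Smith")
```

Answering either question from the raw sentence would require natural-language parsing first; with fields, each is a one-line lookup.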
Object‐Oriented Representation
class Dog < Mammal
  attr_accessor :home, :age, :human
  def goHome
    go(@home)
  end
end
class Human < Mammal
  attr_accessor :home, :age, :pets, :spouse, :friends
  def divorce
    @spouse = nil
  end
end
class Place
  attr_accessor :latlong, :elevation, :address, :name
  def isBelowSeaLevel
    @elevation < 0
  end
end
[Diagram labels: Human, Pet, Home, Spouse, Friends]
O‐O instance variables and methods represent structured semantic relationships between pieces of information.
Structured Semantic Relationships Among Objects
We visualize this with a semantic graph.
The (hypertext) Web Today
[Diagram labels: User Computer, WWW, Web Servers (LAMP), Servers]
The (semantic) Web Tomorrow?
[Diagram labels: User Computer, Servers, Data Aggregator/Visualizer]
[Legend: Structured Data (OWL/RDF), Document Data (HTML/CSS), Hyperlinks (old WWW links)]
A Cloud that Talks To Itself.
vs.
Why? Searchability.
With semantic graphs, you can perform semantic searches by traversing the graph:
“Where does the woman who lives at 2408 Walnut work?”
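A minimal sketch of that traversal, assuming the graph is held as subject/predicate/object triples. The people, workplaces, and predicate names (`is_a`, `lives_at`, `works_at`) are invented for illustration; only the address comes from the slide.

```ruby
# Hypothetical semantic graph stored as [subject, predicate, object] triples.
TRIPLES = [
  ["Alice", "is_a", "Woman"],
  ["Alice", "lives_at", "2408 Walnut"],
  ["Alice", "works_at", "Acme Labs"],
  ["Bob", "is_a", "Man"],
  ["Bob", "lives_at", "2408 Walnut"],
  ["Bob", "works_at", "City Library"]
].freeze

# Follow edges backwards: which subjects point at `object` via `predicate`?
def subjects(predicate, object)
  TRIPLES.select { |_, pred, obj| pred == predicate && obj == object }.map(&:first)
end

# Follow edges forwards: what does `subject` point at via `predicate`?
def objects(subject, predicate)
  TRIPLES.select { |subj, pred, _| subj == subject && pred == predicate }.map(&:last)
end

# "Where does the woman who lives at 2408 Walnut work?"
def workplace_of_woman_at(address)
  residents = subjects("lives_at", address)
  women = residents & subjects("is_a", "Woman")
  women.flat_map { |woman| objects(woman, "works_at") }
end

puts workplace_of_woman_at("2408 Walnut").inspect
```

Each clause of the question becomes one hop along a typed edge, which is what keyword search over plain text cannot do.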
Why? So You Can Mash it Up.
Mashups create meaning from data.
mapmash.googlepages.com/gaza.html
housingmaps.com
twittermap.com
Example: Image Analysis
Why? Context‐Aware AI
Today’s AI is limited by the domain‐specificity of its input data
Context‐Unaware: Blob Tracking
[Image labels: Blob 1, Blob 2]
Useful in its domain (in this case, touch detection), but ultimately has no ‘intelligence.’
Example: Image Analysis
Why? Context‐Aware AI
Today’s AI is limited by the domain‐specificity of its input data
Context‐Aware: Mine the semantic graph for heuristics and clues
[Image labels: Dan, Rachel, Taifur, Trees, Sky]
You’re in the mountains near Aspen, CO. It is the fall. Your friends Taifur and Rachel are at the same location. Etc.
Why? Context‐Aware AI
I have a dream for the Web [in which computers] become capable of analyzing all the data on the Web ‐ the content, links, and transactions between people and computers. A ‘Semantic Web’, which should make this possible, has yet to emerge, but when it does, the day‐to‐day mechanisms of trade, bureaucracy and our daily lives will be handled by machines talking to machines.
The ‘intelligent agents’ people have touted for ages will finally materialize.
Tim Berners‐Lee, 1999
Challenges ‐ Motivation/Critical Mass
Value comes from ubiquity, and ubiquity comes from value. How do we encourage adoption of technology that does not yet provide value?
More importantly, how will it provide value to content producers? How is giving away your data’s meaning (your secret sauce) instead of presenting it alongside ads valuable?
Should semantic feeds be monetized? “Information As A Service”
Challenges ‐ Privacy/Security
• Reduced anonymity on the Web
• Increased invasion of privacy
• Solution: access privileges must be controlled by infrastructure‐level security
The State of the Semantic Web
Top‐Down: Content Structuring
• Semantic search engines generate and leverage an internal semantic database
• Data still returned as HTML, no API ‐ not helping create the semantic web!
Top‐Down: Dapper
[Diagram labels: WWW, RSS, CSV, OWL, RDF, to semantic web]
• Dapper ‐ UI for setting up rules to scrape semi‐structured data (CSV, RSS, XML) from any set of HTML documents!
Top‐Down: Yahoo! Pipes
[Diagram labels: RSS, JSON, CSV, OWL, RDF, to semantic web]
• Yahoo! Pipes ‐ UI for transforming, reformatting, and combining data feeds into more useful data feeds.
Bottom‐Up: Publishing Standards
RDF and OWL standards
• W3C‐sponsored standards for defining semantic relationships and resources
• Powerful, but complex and hard for humans to read/create
• No motivation for developers to create
• No W3C‐sponsored universal ontologies
[Diagram: RDF data node “Dan’s Car” (make, year 1999, color) isA Car. OWL ontology: Car isA Vehicle, Car hasA Color, Car HasMultiple(4).]
RDF Data Node:
<rdf:Description rdf:about="http://.../DansCar">
  <car:color>Red</car:color>
  <car:make>Honda</car:make>
  <car:year>1999</car:year>
</rdf:Description>
OWL Ontology:
<#Car> a owl:Class ;
  rdfs:subClassOf <#Vehicle> ;
  rdfs:subClassOf [ a owl:Restriction ;
                    owl:onProperty <#Wheel> ;
                    owl:cardinality "4"^^xsd:nonNegativeInteger ] .
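To show what the data node and ontology buy you together, here is a rough in-memory sketch in Ruby. It does not use any RDF or OWL library; the `FACTS`, `SUBCLASS_OF`, and `CARDINALITY` tables and both helper methods are hypothetical names that simply mirror the slide's example.

```ruby
# Hypothetical in-memory stand-in for the RDF data node on the slide.
FACTS = {
  "DansCar" => { "isA" => "Car", "color" => "Red", "make" => "Honda", "year" => 1999 }
}.freeze

# Stand-ins for the ontology: class hierarchy and the cardinality restriction.
SUBCLASS_OF = { "Car" => "Vehicle" }.freeze
CARDINALITY = { "Car" => { "Wheel" => 4 } }.freeze

# Walk up the subclass chain to list every class a resource belongs to.
def classes_of(resource)
  klass = FACTS.fetch(resource)["isA"]
  chain = []
  while klass
    chain << klass
    klass = SUBCLASS_OF[klass]
  end
  chain
end

# How many of `property` the ontology requires for a resource, or nil if unstated.
def required_count(resource, property)
  classes_of(resource).each do |klass|
    count = CARDINALITY.dig(klass, property)
    return count if count
  end
  nil
end

puts classes_of("DansCar").inspect
puts required_count("DansCar", "Wheel")
```

The data node never says Dan's car is a vehicle or has four wheels; both follow from the ontology, which is the point of keeping the two layers separate.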
APIs: I dream of a RESTful tomorrow
Thousands of content creators are already sharing structured data with APIs!
Bottom‐Up: Knowledge Bases
• Collaborative attempt to build factual knowledge base
• “Object‐Oriented Wikipedia”
[Diagram label: Domain Experts]
Bottom‐Up: Knowledge Bases
• Content contributed by experts manually, or by dataset owners automatically.
[Diagram labels: Domain Experts, Semantic Analysis]
Freebase: The Everything Graph
Metaweb Freebase Statistics
• Freebase Launch, March 2007
• LLC Founded, July 2005
• 25,379 users today
• Growing by 600‐800/month
• 5.3 million topics today
• Growing by ~15,000/month
• Pulled from public data
• By comparison, Wikipedia has 2.64 million English articles
The Freebase Approach
Creative Commons Attribution License
[Diagram labels: Casual Collaborators, Node Editor GUI, App Developers, Data Modelers, Expert Users & Dataset Owners, Open API (MQL), RDF API, WWW, Existing Datasets, Content Publishers]
Freebase Data Policy
Creative Commons Attribution License
• Requires Attribution of Source
[Diagram labels: Open API (MQL), RDF API]
MQL Query:
[{
  "album" : { "artist" : [], "name" : null, "release_date" : null },
  "limit" : 25,
  "name" : null,
  "name~=" : "Love*",
  "type" : "/music/track"
}]
Returns:
[{
  "album" : { "artist" : ["Massive Attack"], "name" : "Blue Lines", "release_date" : "1991-04-08" },
  "name" : "One Love",
  "type" : "/music/track"
},
{
  "album" : { "artist" : ["Squirrel Nut Zippers"], "name" : "The Inevitable", "release_date" : "1995-03-17" },
  "name" : "Anything But Love",
  "type" : "/music/track"
},
{
  "album" : { "artist" : ["PJ Harvey"], "name" : "Dry", "release_date" : "1992-02-11" },
  "name" : "Oh My Lover",
  "type" : "/music/track"
},
...
Data Freedom!
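The query-by-example shape of MQL can be sketched in Ruby without touching the network. This builds the slide's query as a plain data structure and runs it against a few local rows. The matching rule for `"name~="` (any word starting with the text before `*`, case-insensitive) is an assumption made for illustration, not Freebase's documented behavior, and `run_locally`, `wildcard_to_regexp`, and the "Glove Box" row are hypothetical.

```ruby
require "json"

# The slide's query as a Ruby structure; nil and [] mark fields to be filled in.
QUERY = [{
  "album" => { "artist" => [], "name" => nil, "release_date" => nil },
  "limit" => 25,
  "name" => nil,
  "name~=" => "Love*",
  "type" => "/music/track"
}].freeze

# Track names copied from the slide's result, plus one invented non-match.
TRACKS = [
  { "name" => "One Love", "type" => "/music/track" },
  { "name" => "Anything But Love", "type" => "/music/track" },
  { "name" => "Oh My Lover", "type" => "/music/track" },
  { "name" => "Glove Box", "type" => "/music/track" } # invented
].freeze

# Assumption: "Love*" matches any word that starts with "Love", ignoring case.
def wildcard_to_regexp(pattern)
  prefix = Regexp.escape(pattern.delete_suffix("*"))
  tail = pattern.end_with?("*") ? "\\w*" : ""
  /\b#{prefix}#{tail}\b/i
end

# Local stand-in for query execution: filter by type and name, then apply limit.
def run_locally(query, tracks)
  q = query.first
  matcher = wildcard_to_regexp(q["name~="])
  tracks.select { |t| t["type"] == q["type"] && t["name"] =~ matcher }
        .first(q["limit"])
end

puts JSON.pretty_generate(QUERY)
puts run_locally(QUERY, TRACKS).map { |t| t["name"] }.inspect
```

The query has the same shape as the answer: fields with values act as filters, and empty fields are the ones you are asking for.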
Freebase Community
[Diagram labels: Casual Collaborators, App Developers, Data Modelers, Expert Users & Dataset Owners]
App Developers List
Data Modelers List
IRC Chat (open to all)
Discussion Threads on Individual Topics
• No Central Forum
• No Backlog of Mailing List
• No Friends
• No Private Messages
“As a programmer, I feel that I'm most effective when I'm contributing large datasets… facilities built into Freebase that let users upload lists of topics are limited to specific situations.” ‐ Shawn Simister
Freebase Community Tools
[Diagram labels: Casual Collaborators, App Developers, Data Modelers, Expert Users & Dataset Owners]
“Acre, the Freebase application development platform, lets anyone mash up Freebase data using Javascript and have it hosted for free.” ‐ Shawn Simister, developer
Employees
“We have been working hard recently to provide bulk import tools for Freebase. While such tools exist internally, the reconciliation process has thus far been too complicated for public release.” ‐ Brian Culbertson, Metaweb Engineer
“[Data Modeling is hard because new schemas are a slow process, and they can break users’ code. Let’s have a “Sloppy Freebase” that allows users to enter unstructured data until new schemas are defined.]” ‐ Jack Alves, former Metaweb Director of Engineering
“Sloppy” Data Modeling
Currently, Freebase users cannot submit data if there is not already a data structure + ontology built for that data type.
“Sloppy” data creation allows users to create their own data types, which will later be cleaned and standardized.
[Diagram: Casual Collaborators enter “sloppy” data, the Automated Data Cleaner‐Upper standardizes it, and the result goes out as OWL/RDF to the semantic web]
User‐Generated, Semi‐structured “sloppy” data:
Bob Jones ‐ major: English
Eric Bradley ‐ studying: CS
John Green ‐ focus: Sociology
Automated Data Cleaner‐Upper
Clean, structured data:
Bob Jones ‐ area of study: English
Eric Bradley ‐ area of study: CS
John Green ‐ area of study: Sociology
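One very simple form the cleaner‐upper could take is a synonym table that maps user‐invented property names onto a canonical one. The sketch below assumes such a table already exists; `CANONICAL`, `SLOPPY`, and `clean` are hypothetical names, and a real system would need to discover or curate the mapping.

```ruby
# Hypothetical synonym table: user-invented property names -> canonical name.
CANONICAL = {
  "major" => "area of study",
  "studying" => "area of study",
  "focus" => "area of study"
}.freeze

# The three sloppy records from the slide.
SLOPPY = [
  { "name" => "Bob Jones", "major" => "English" },
  { "name" => "Eric Bradley", "studying" => "CS" },
  { "name" => "John Green", "focus" => "Sociology" }
].freeze

# Rename known synonyms; leave unrecognized properties untouched for later review.
def clean(record)
  record.each_with_object({}) do |(key, value), out|
    out[CANONICAL.fetch(key, key)] = value
  end
end

CLEAN = SLOPPY.map { |record| clean(record) }
CLEAN.each { |r| puts "#{r['name']} - area of study: #{r['area of study']}" }
```

Users keep entering data under whatever label occurs to them, and the schema catches up afterwards instead of blocking them up front.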
Conclusion
We are here.
• Progress
  • RDF/OWL
  • Freebase ‐ 5.3 mil. topics!
  • Dapper, Yahoo! Pipes, other data abstractors
  • APIs out the wazoo
• Issues
  • Adoption
  • Privacy/Security
  • Intellectual Property
  • We’re good at making content, but we suck at mining it and describing it.