Jan 21, 2015
Open Government Data & MongoDB
Luigi Montanez
Question? @LuigiMontanez
Open Data + Open Source = Open Government
MongoDB enables open data
Opening Up Data
✴ Gather data from disparate sources
  ✴ Data dumps (SQL, fixed-width columns)
  ✴ Web scraping
  ✴ Text/PDF parsing
✴ Serving RESTful JSON APIs
JSON
✴ Tree structure, not tabular
✴ Still relational
✴ JSON for data, XML for documents
✴ Closely resembles native data structures
✴ No manual parsing needed
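To illustrate the last two bullets, here is a minimal sketch using JavaScript's built-in `JSON.parse`. The payload is a trimmed, hand-written sample that borrows a few field names and values from the legislator example later in this talk; it is not an actual API response.

```javascript
// Hand-written sample payload (assumption: trimmed for illustration).
const payload =
  '{"legislator": {"first_name": "Barbara", "last_name": "Lee", "earmarks": {"total_number": 28}}}';

// One built-in call turns the text into native objects; no hand-written parser.
const data = JSON.parse(payload);

// Nested values are reached with ordinary property access.
const fullName = `${data.legislator.first_name} ${data.legislator.last_name}`;
const earmarkCount = data.legislator.earmarks.total_number;

console.log(fullName, earmarkCount);
```

The tree shape of the JSON text carries over directly into nested objects, which is why no mapping layer is needed on the consuming side.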
Three Projects
✴ Poligraft
✴ Real Time Congress API
✴ Open State Project
App design drives schema design
{ "title": "President Obama's climate 'Plan B' in hot water - Darren Samuelsohn - POLITICO.com"}
{
  "title": "President Obama's climate 'Plan B' in hot water - Darren Samuelsohn - POLITICO.com",
  "slug": "EOsc",
  "source_url": "http://www.politico.com/news/stories/0810/40534.html",
  "content": "................."
}
{
  "title": "President Obama's climate 'Plan B' in hot water - Darren Samuelsohn - POLITICO.com",
  "slug": "EOsc",
  "source_url": "http://www.politico.com/news/stories/0810/40534.html",
  "content": ".................",
  "entities": [...]
}
{
  "title": "President Obama's climate 'Plan B' in hot water - Darren Samuelsohn - POLITICO.com",
  "slug": "EOsc",
  "source_url": "http://www.politico.com/news/stories/0810/40534.html",
  "content": ".................",
  "entities": [
    {"name": "Barack Obama", "type": "politician"},
    ...
  ]
}
{
  "title": "President Obama's climate 'Plan B' in hot water - Darren Samuelsohn - POLITICO.com",
  "slug": "EOsc",
  "source_url": "http://www.politico.com/news/stories/0810/40534.html",
  "content": ".................",
  "entities": [
    {
      "name": "Barack Obama",
      "type": "politician",
      "breakdown": {"indiv": "33", "pac": "67"},
      "top_industries": ["Lawyers/Lobbyists", "Finance/Insurance/Real Estate", "Misc. Business"]
    },
    ...
  ]
}
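Because the document is shaped like the page that displays it, rendering can be a single read followed by a loop over the embedded entities. The sketch below is one way that might look. `summarizeEntities` is a hypothetical helper, and it assumes the `breakdown` values are percentages stored as strings, which the slide does not state.

```javascript
// Document shaped like the example above, with the content omitted.
const report = {
  title: "President Obama's climate 'Plan B' in hot water - Darren Samuelsohn - POLITICO.com",
  slug: "EOsc",
  entities: [
    {
      name: "Barack Obama",
      type: "politician",
      breakdown: { indiv: "33", pac: "67" },
      top_industries: ["Lawyers/Lobbyists", "Finance/Insurance/Real Estate", "Misc. Business"],
    },
  ],
};

// Hypothetical helper: build one summary line per embedded entity.
// Assumption: breakdown.indiv and breakdown.pac are percentages.
function summarizeEntities(doc) {
  return doc.entities.map((entity) => {
    const { indiv, pac } = entity.breakdown;
    const topIndustry = entity.top_industries[0];
    return `${entity.name} (${entity.type}): ${indiv}% individuals, ${pac}% PACs; top industry: ${topIndustry}`;
  });
}

console.log(summarizeEntities(report).join("\n"));
```

No joins or follow-up queries are needed here, since everything the page shows travels inside the one document.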
Natural Schemas
Three Projects
✴ Poligraft
✴ Real Time Congress API
✴ Open State Project
Real-Time Congress API
Credit: vgm8383 on Flickr
Android App: “Congress”
Politiwidgets
Requirements
✴ Aggregate lots of data
Biographical, Bills, Votes, Earmarks, Video Clips, Floor Updates, Legislative Documents, Committee Schedules, Contributions, Interest Group Ratings
✴ Lightweight responses
{
  legislator: {
    in_office: true,
    title: "Rep",
    nickname: "",
    district: "9",
    bioguide_id: "L000551",
    govtrack_id: "400237",
    phone: "202-225-2661",
    website: "http://lee.house.gov/index.html",
    twitter_id: "",
    last_name: "Lee",
    name_suffix: "",
    last_updated: "2010/04/13 00:00:14 +0000",
    party: "D",
    chamber: "house",
    state: "CA",
    youtube_url: "http://www.youtube.com/RepLee",
    first_name: "Barbara",
    gender: "F",
    congress_office: "2444 Rayburn House Office Building",
    earmarks: {
      average_number: 20,
      total_amount: 10000000,
      average_amount: 22994535,
      total_number: 28,
      last_updated: "2010-03-18",
      fiscal_year: 2010
    },
    ...
  }
}
// limit selection to a subset of fields
db.people.find( { 'first_name' : 'john' }, { 'last_name' : 1, 'address' : 1 } );
// use dot-notation to dig into an object
db.people.find( { 'state': 'CA' }, { 'address.zip_code': 1 } );
{
  legislator: {
    last_name: "Lee",
    first_name: "Barbara",
    state: "CA",
    earmarks: {
      average_number: 20,
      total_amount: 10000000,
      average_amount: 22994535,
      total_number: 28,
      last_updated: "2010-03-18",
      fiscal_year: 2010
    }
  }
}
?sections=last_name,first_name,state,earmarks
{
  legislator: {
    last_name: "Lee",
    first_name: "Barbara",
    state: "CA",
    earmarks: {
      total_amount: 10000000,
      total_number: 28
    }
  }
}
?sections=last_name,first_name,state,earmarks.total_amount,earmarks.total_number
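The slides do not show how the API turns `?sections=` into a query, so the following is only a sketch of one plausible approach: split the parameter on commas and build a MongoDB-style inclusion projection. Both `sectionsToProjection` and `project` are hypothetical helpers. `project` imitates dot-notation projection on a plain object so the example runs without a database.

```javascript
// Hypothetical helper: "a,b.c" -> { a: 1, "b.c": 1 }
function sectionsToProjection(sections) {
  const projection = {};
  for (const field of sections.split(",")) {
    const name = field.trim();
    if (name) projection[name] = 1;
  }
  return projection;
}

// Hypothetical helper: apply an inclusion projection to a plain object,
// following dot-notation paths one key at a time.
function project(doc, projection) {
  const result = {};
  for (const path of Object.keys(projection)) {
    const keys = path.split(".");
    let source = doc;
    let target = result;
    for (let i = 0; i < keys.length; i++) {
      const key = keys[i];
      if (source === null || typeof source !== "object" || !(key in source)) break;
      if (i === keys.length - 1) {
        target[key] = source[key]; // leaf: copy the value
      } else {
        if (!(key in target)) target[key] = {}; // create the nested container once
        target = target[key];
        source = source[key];
      }
    }
  }
  return result;
}

// Trimmed copy of the legislator record from the slide.
const legislator = {
  last_name: "Lee",
  first_name: "Barbara",
  state: "CA",
  party: "D",
  earmarks: { total_amount: 10000000, total_number: 28, fiscal_year: 2010 },
};

const projection = sectionsToProjection(
  "last_name,first_name,state,earmarks.total_amount,earmarks.total_number"
);
const partial = project(legislator, projection);
console.log(JSON.stringify(partial, null, 2));
```

With a real collection, the same projection object could be passed as the second argument to `find`, as in the `db.people.find` examples above.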
Partial responses make payloads smaller
Three Projects
✴ Poligraft
✴ Real Time Congress API
✴ Open State Project
50 States = 50 Formats
Schemalessness allows for granular control
Custom Fields
✴ Traditional RDBMS
  ✴ Update the schema for new fields, run a migration, feel icky
  ✴ Create a custom_fields table
✴ MongoDB
  ✴ Just store it
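A rough sketch of what "just store it" can mean in practice: documents headed for the same collection keep a shared core and carry whatever extra fields their state provides. The bill records, the field names, and the `customFields` helper below are all invented for illustration and are not taken from the Open State Project.

```javascript
// Invented bill records from two states. Only state, bill_id, and title are
// shared; each record also carries a field the other does not have.
const bills = [
  { state: "CA", bill_id: "AB 1", title: "Example bill", urgency_clause: true },
  { state: "TX", bill_id: "HB 2", title: "Another example", companion_bill: "SB 3" },
];

// Assumed set of fields common to every state.
const CORE_FIELDS = new Set(["state", "bill_id", "title"]);

// Hypothetical helper: list the non-core fields a given document carries.
function customFields(doc) {
  return Object.keys(doc).filter((key) => !CORE_FIELDS.has(key));
}

for (const bill of bills) {
  console.log(bill.state, customFields(bill));
}
```

Both objects could be saved as they are, with no migration and no side table, and code that only needs the core fields can ignore the rest.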
Speaking JSON natively
Source → Scraped JSON → Python Transform → PostgreSQL
Source → Scraped JSON → MongoDB
Three Projects
✴ Poligraft
✴ Real Time Congress API
✴ Open State Project
Developer Happiness
Thanks!
sunlightlabs.com
@LuigiMontanez