PATENTLY INNOVATIVE
Finding where innovation lives!
David E Drummond
Insight Data Engineer
Aug 05, 2015
INTRO
Which state has the most innovation? Use the number of patents to see which regions are “patently innovative”.
DATA PIPELINE
Ingestion, Batch Processing, Real-Time Queries
XML, TSV
Data Cleansing
JSON Hive SerDe
HappyBase
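The cleansing step between the raw TSV files and the JSON that Hive reads could look roughly like the sketch below. It is only an illustration: the column layout (`patent_id`, `state`, `grant_date`), the validation rules, and the function names are assumptions, since the slides do not describe the raw files. It emits one JSON object per line, which is the layout JSON SerDes for Hive generally expect.

```python
import csv
import io
import json
from datetime import datetime

# Hypothetical column layout; the real patent files are not described in the slides.
FIELDS = ["patent_id", "state", "grant_date"]


def clean_row(row):
    """Return a normalized record, or None if the row is unusable."""
    if len(row) != len(FIELDS):
        return None
    record = {name: value.strip() for name, value in zip(FIELDS, row)}
    record["state"] = record["state"].upper()
    # Keep only rows with a two-letter state code.
    if len(record["state"]) != 2 or not record["state"].isalpha():
        return None
    # Keep only rows whose date parses as YYYY-MM-DD.
    try:
        datetime.strptime(record["grant_date"], "%Y-%m-%d")
    except ValueError:
        return None
    return record


def tsv_to_json_lines(tsv_text):
    """Convert raw TSV text into a list of JSON strings, one per clean row."""
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t")
    lines = []
    for row in reader:
        record = clean_row(row)
        if record is not None:
            lines.append(json.dumps(record, sort_keys=True))
    return lines
```

Rows that fail validation are dropped silently here; a real pipeline would probably count or log them.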
HBASE SCHEMA
Yearly
State   2005   2006   2007   …   2011   2012   2013
CA      8530   7411   7120   …   7849   7799   9185
TX      2167   1961   1806   …   2050   2121   2500

Monthly
State   200501   200502   …   201408   201409
CA      512      538      …   1380     1194
TX      102      217      …   350      263

Denormalized schema for faster queries
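One way to picture the denormalized layout is as one wide row per state, with a column per year and per month, so that a query for a state is a single row lookup. The sketch below builds such rows in plain Python and reads them back through any object that offers `put(row, data)` and `row(row, columns=...)`, which is the shape of a HappyBase table. The column family names `yearly` and `monthly`, the function names, and the `YYYY-MM-DD` date format are assumptions for illustration; the slides may equally describe two separate tables.

```python
from collections import defaultdict


def build_rows(records):
    """Aggregate (state, 'YYYY-MM-DD') pairs into one wide row per state."""
    counts = defaultdict(lambda: defaultdict(int))
    for state, grant_date in records:
        year, month = grant_date[:4], grant_date[5:7]
        counts[state]["yearly:" + year] += 1
        counts[state]["monthly:" + year + month] += 1
    # HBase stores raw bytes, so encode row keys, column names, and values.
    return {
        state.encode(): {col.encode(): str(n).encode() for col, n in cols.items()}
        for state, cols in counts.items()
    }


def store_rows(table, rows):
    """Write rows through a table-like object exposing put(row, data)."""
    for row_key, data in rows.items():
        table.put(row_key, data)


def yearly_counts(table, state):
    """Read one state's yearly counts with a single row lookup."""
    row = table.row(state.encode(), columns=[b"yearly"])
    return {
        col.split(b":", 1)[1].decode(): int(val.decode())
        for col, val in row.items()
    }
```

With HappyBase, `table` would come from something like `happybase.Connection(host).table(name)`, where the host and table name are placeholders that depend on the deployment.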
DAVID DRUMMOND
Earned a Ph.D. in Physics from UC Riverside, simulating fault-tolerant parallel quantum computing systems.
Love to travel and learn about everything!