Basis Technology – Human Language Technology Conference 2012 1 Clouds, Search or HLT The 'forecast'? Benson Margulies Executive Vice President and Chief Technology Officer
Jun 21, 2015
Meteorology - or - Why Clouds
• Lie on the grass and look up at the clouds
• Everyone sees something different
• Computerized Clouds are no different
• Applications Always Available
• Data Always Available
• Tools for Processing Big Data
Big Data and Clouds =~ Hadoop
• It's not just a matter of size
• Hadoop ...
o Takes in structured data sets
o Optimizes stateless, batch processes
o Moves computation to data
• All of which is great if that's what you have
• The world is more complicated than that
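A minimal sketch of the "stateless, batch" shape described above, in plain Python rather than the Hadoop API. The function names and the word-count task are assumptions chosen for illustration. Each partition is processed using only its own lines, which is what lets a framework run the work wherever the partition is stored; the partial results are merged afterwards.

```python
from collections import Counter
from functools import reduce


def map_partition(lines):
    """Stateless step: the result depends only on the lines in this partition."""
    counts = Counter()
    for line in lines:
        counts.update(line.lower().split())
    return counts


def run_batch(partitions):
    """Process every partition independently, then merge the partial counts."""
    # In a real cluster each call would run on the node holding that partition.
    partials = [map_partition(p) for p in partitions]
    return reduce(lambda a, b: a + b, partials, Counter())


# Hypothetical input: two partitions of a larger data set.
partitions = [["big data", "data clouds"], ["clouds clouds"]]
totals = run_batch(partitions)
```

Nothing in `map_partition` needs to know about any other partition, which is exactly the property the next slide says search and clustering lack.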
What it Doesn't Do So Easily
• On-the-fly (non-batch) processing
• Stateful, non-local processing
• For example, consider a search engine
o All about online: a document arrives, users want to find it.
o All about global state: relevancy involves global data across the whole index.
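To make the global-state point concrete, here is a minimal sketch that assumes a smoothed TF-IDF style weight; the slides do not name a scoring formula, and the function and variable names are hypothetical. It shows that a term weight computed from one partition of the index can differ from the weight computed over the whole index.

```python
import math


def idf(term, docs):
    """Smoothed inverse document frequency of `term` over `docs` (lists of tokens)."""
    df = sum(1 for doc in docs if term in doc)
    return math.log((1 + len(docs)) / (1 + df)) + 1.0


# Hypothetical toy index split into two partitions.
part_a = [["cloud", "search"], ["cloud", "hadoop"]]
part_b = [["search", "index"], ["entity", "graph"]]

local_idf = idf("cloud", part_a)            # sees only partition A
global_idf = idf("cloud", part_a + part_b)  # sees the whole index
```

Within partition A the term "cloud" appears in every document and so looks uninformative; across the whole index it is rarer and gets a higher weight. Ranking on local statistics alone would therefore order results differently.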
More on Search-in-a-Cloud
• Good News: 'conventional' technologies scale to very large indices.
o Solr
o SolrCloud
o Elastic Search
o ...
• How? Shards.
o 'hash' to split docs
o queries go everywhere
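The two sub-points under "Shards" can be sketched as follows. This is an illustration in plain Python, not the SolrCloud or Elastic Search implementation; the class name, the MD5-based routing and the in-memory dictionaries are all assumptions. A document is routed to exactly one shard by hashing its id, while a query has to be sent to every shard and the results merged.

```python
import hashlib


def shard_for(doc_id, num_shards):
    """Pick a shard by hashing the document id (stable across processes)."""
    digest = hashlib.md5(doc_id.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_shards


class ShardedIndex:
    """Toy in-memory index: one dict of doc_id -> tokens per shard."""

    def __init__(self, num_shards):
        self.shards = [dict() for _ in range(num_shards)]

    def add(self, doc_id, text):
        # 'hash' to split docs: each document lands on exactly one shard.
        shard = self.shards[shard_for(doc_id, len(self.shards))]
        shard[doc_id] = text.lower().split()

    def search(self, term):
        # Queries go everywhere: ask every shard, then merge the hits.
        term = term.lower()
        hits = []
        for shard in self.shards:
            hits.extend(d for d, tokens in shard.items() if term in tokens)
        return sorted(hits)


index = ShardedIndex(num_shards=3)
index.add("d1", "Cloud search")
index.add("d2", "Hadoop batch")
index.add("d3", "search in a cloud")
cloud_hits = index.search("cloud")
```

Writes touch one shard, but every query costs one request per shard, so query fan-out grows with the number of shards.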
Search-in-a-Cloud less good news
• Alternatives are still:
o Limited
o Research
o or both
• Solandra
o Scaling via Cassandra
o 'just another sharded solution'
o Just the thing if you like Cassandra
• or Accumulo
o So far, very basic inverted index
o better things coming
Other HLT tasks ...
• 'Extraction' is 'straightforward'
• Text comes in, entities or relationships come out.
• Results end up in graph DB or bigtable or ...
• Scale via Hadoop or whatever
• The Challenge of Mixing and Matching
• But ... what if you want a feedback loop?
Interoperation
• Lots of focus on applications
o e.g. Ozone Widgets
• Not so much on backend processes
• What good is 'data everywhere' if:
o you can't deploy processing to exploit it?
o you can't fit together pieces of the puzzle?
• A stovepipe in a cloud is still ...
• A stovepipe
Harder Unstructured Problems
• Imagine you wanted to cluster ...
• New items show up
• Need to find 'best' existing cluster
o It could be 'anywhere'
• Need to update to reflect each new item
• (If you're wondering what we're clustering ...)
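The loop above can be sketched on a single machine as follows. This is one simple way to do incremental clustering, not a description of any product; the class name, the use of cosine similarity against running-mean centroids, and the 0.8 threshold are all assumptions. Each new item is compared against every existing cluster, which is the "it could be anywhere" problem, and the winning cluster is then mutated, which is the stateful part.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


class OnlineClusterer:
    def __init__(self, threshold=0.8):
        self.threshold = threshold  # assumed cut-off for joining a cluster
        self.centroids = []         # running mean vector per cluster
        self.counts = []            # number of items per cluster

    def add(self, vector):
        """Assign `vector` to the best existing cluster or start a new one."""
        best, best_sim = None, -1.0
        for i, centroid in enumerate(self.centroids):  # must look at every cluster
            sim = cosine(vector, centroid)
            if sim > best_sim:
                best, best_sim = i, sim
        if best is None or best_sim < self.threshold:
            self.centroids.append(list(vector))
            self.counts.append(1)
            return len(self.centroids) - 1
        # Update the chosen cluster to reflect the new item (running mean).
        n = self.counts[best]
        self.centroids[best] = [
            (c * n + v) / (n + 1) for c, v in zip(self.centroids[best], vector)
        ]
        self.counts[best] = n + 1
        return best


clusterer = OnlineClusterer(threshold=0.8)
labels = [clusterer.add(v) for v in ([1.0, 0.0], [0.9, 0.1], [0.0, 1.0])]
```

Both steps cut against the stateless batch model: the search for the best cluster reads global state, and the update writes to it before the next item arrives.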
Rosette Concrete Examples
• Straight Search
o Rosette Solr Plugins work all the same
o SolrCloud hashes/shards
o Rosette runs on the target node
• Extraction and similar processes
o Same story, using Update Request Processor
Rosette and Hadoop
• Stateless APIs lead to simple implementation
• Non-code resources lead to some issues
• Stateful processes (e.g. RNI) ... back to Solr