Natural Language Search ...using Neo4j
Jan 27, 2015
We’ll be covering...
What is natural language search?
What do brains and graphs have in common?
How do you model time as a graph?
How do you model time-based events on a graph?
How do you anticipate natural language queries and map those to results?
How do you transform answers into questions?
What is Natural Language Search?
Natural language search is like querying a database using your own natural language.
In a way, it is kind of like programming a person with words (Teaching, Evangelism, Sales Pitches, Planning, etc.)
What do brains and graphs have in common?
Networks condense a lot of information into small points.
Each of these points gives us a different vantage from which to explore and interpret that information.
Graphs, like brains, help us explore a lot of information from relative points.
But what is a network?
A network is a representation or model of the interconnectedness of information.
A graph is the de facto mathematical component that defines the level of interconnectivity in a network.
A graph database merges these two concepts into a persistent storage medium.
Networks (Information) + Graph (Mathematics) = Neo4j
Graph of people meeting people
Anne met Pam
Pam met Sally
Sally met Anne
John met Sally
Path Finding = Searching
Traversals are the key component when using a graph database.
Traversals model the pathways in a network by enumerating all possibilities.
Possibilities that meet a criterion are returned by a query.
(Neo4j’s Cypher Query Language)
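To make "path finding = searching" concrete, here is a minimal sketch in plain Python, not Cypher and not how Neo4j works internally. It enumerates every path between two people in the "met" graph above and then keeps the ones that meet a criterion. Treating "met" as a two-way relationship, and the names `neighbors` and `find_paths`, are assumptions made for illustration.

```python
# The "met" relationships from the slide, stored as pairs.
MET = [("Anne", "Pam"), ("Pam", "Sally"), ("Sally", "Anne"), ("John", "Sally")]

def neighbors(person):
    """People linked to `person` by a "met" edge (assumed to work both ways)."""
    out = set()
    for a, b in MET:
        if a == person:
            out.add(b)
        elif b == person:
            out.add(a)
    return sorted(out)

def find_paths(start, goal, path=None):
    """Enumerate every path from start to goal that never repeats a person."""
    path = (path or []) + [start]
    if start == goal:
        return [path]
    paths = []
    for nxt in neighbors(start):
        if nxt not in path:
            paths.extend(find_paths(nxt, goal, path))
    return paths

# Enumerate all possibilities, then apply a criterion (here: fewest hops).
paths = find_paths("John", "Pam")
shortest = min(paths, key=len)
```

Here `paths` holds both routes from John to Pam (through Sally, and through Sally and Anne), and `shortest` is the one the criterion selects.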
Time-based traversals
Time is a hierarchical method of categorizing the linearity of global events.
Hours, minutes, seconds...
“Neo4j Meetup is at 6:00 PM on October 29th”
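One way to picture time as a hierarchy is to break a timestamp into a coarse-to-fine path, one level per unit. The sketch below does that for the meetup example. The level names are illustrative and may not match the labels used in the GraphGist, and the slide gives no year, so the year in the code is an arbitrary placeholder.

```python
from datetime import datetime

def time_path(moment):
    """Break a timestamp into a coarse-to-fine path, one entry per level."""
    return [
        ("Year", moment.year),
        ("Month", moment.month),
        ("Day", moment.day),
        ("Hour", moment.hour),
        ("Minute", moment.minute),
    ]

# The slide gives no year; 2014 is an arbitrary placeholder for illustration.
meetup = datetime(2014, 10, 29, 18, 0)
path = time_path(meetup)

# The same path written like a folder address.
as_text = "/".join(f"{label}:{value}" for label, value in path)
```

Two events that share a day share the first three levels of this path, which is what makes traversing from one event to its neighbors in time cheap.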
Time Scale Event Meta Model
Modeling events over time is easy in Neo4j
Let’s go over the GraphGist for the Time Scale Event Meta Model
http://gist.neo4j.org/?github-kbastani%2Fgists%2F%2Fmeta%2FTimeScaleEventMetaModel.adoc
How do you anticipate natural language queries and map those to results?
Neo4j allows you to store information as a series of paths, and that is really valuable for giving a user options when it comes to search.
It starts with something I call
“Search Cache”
Search Cache
A search cache is a repository of all relevant paths condensed into a hierarchical data store.
A hierarchical data store works like folder paths: every item in a storage collection is addressed by a single linear path. (Dimensionality Reduction)
An address is a hierarchy, revealing a path.
ex. http://www.neo4j.com/download
ex. > root\neo4j-community\bin\neo4j.sh
Natural language path:
> w\h\a\t\ \i\s\ \t\h\e\ \m\a\t\r\i\x\?
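The natural language path above can be produced mechanically: one path segment per character of the query. A minimal sketch, where the function names and the lower-casing step are assumptions:

```python
def to_char_path(query, sep="\\"):
    """Turn a query into a path with one segment per character."""
    return sep.join(query.lower())

def prefixes(query):
    """Every prefix of the query, i.e. every 'folder' along the path."""
    q = query.lower()
    return [q[:i] for i in range(1, len(q) + 1)]

matrix_path = to_char_path("What is the matrix?")
```

Each prefix is a folder on the way to the full question, so everything a user could be typing at a given moment sits under one folder.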
Type Ahead / Autocomplete
For search it comes down to enumerating over all possibilities and then mapping those paths to an action.
http://kbastani.github.io/predictive-autocomplete
Never do real-time processing for natural language search. It is a hard problem, which means it will take time.
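A sketch of what precomputing buys you: if every anticipated query is stored ahead of time as a character path that ends in an action, type-ahead becomes a lookup instead of a computation. The trie below is one common way to hold such paths and is not taken from the linked project. The example queries and the `show:...` action strings are hypothetical.

```python
class TrieNode:
    def __init__(self):
        self.children = {}   # next character -> TrieNode
        self.action = None   # set only where a complete query ends

def build_trie(query_actions):
    """Precompute the cache: one trie path per known query."""
    root = TrieNode()
    for query, action in query_actions.items():
        node = root
        for ch in query.lower():
            node = node.children.setdefault(ch, TrieNode())
        node.action = action
    return root

def suggest(root, typed):
    """Return (query, action) pairs for every cached query starting with `typed`."""
    node = root
    prefix = typed.lower()
    for ch in prefix:
        node = node.children.get(ch)
        if node is None:
            return []  # nothing cached under this prefix
    results = []
    stack = [(node, prefix)]
    while stack:
        current, text = stack.pop()
        if current.action is not None:
            results.append((text, current.action))
        for ch, child in current.children.items():
            stack.append((child, text + ch))
    return sorted(results)

# Hypothetical queries and actions, built once ahead of time.
cache = build_trie({
    "what is neo4j?": "show:neo4j",
    "what is the matrix?": "show:matrix",
    "who is anne?": "show:anne",
})
matches = suggest(cache, "What is")
```

At query time the only work left is walking the typed prefix and listing what hangs below it.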
Distributed Caching Frameworks
Take a distributed approach to building out your search cache.
Use Neo4j to model your network and then enumerate over all possibilities as a query and add each possibility to a search cache.
Distribute the load to a network of compute instances like MapReduce.
In C# at http://kbastani.github.io/predictive-autocomplete
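The MapReduce idea can be sketched on one machine: a map step turns each enumerated phrase into (prefix, phrase) pairs, and a reduce step groups them into cache entries. This is a Python illustration, not the C# project linked above. Threads stand in for separate compute instances, and the function names are assumptions.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

def map_phrase(phrase):
    """Map step: emit one (prefix, phrase) pair per prefix of the phrase."""
    p = phrase.lower()
    return [(p[:i], p) for i in range(1, len(p) + 1)]

def reduce_pairs(mapped):
    """Reduce step: group phrases under each prefix to form cache entries."""
    grouped = defaultdict(set)
    for pairs in mapped:
        for prefix, phrase in pairs:
            grouped[prefix].add(phrase)
    return {prefix: sorted(phrases) for prefix, phrases in grouped.items()}

def build_cache(phrases, workers=4):
    # Threads stand in for separate compute instances in this sketch.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        mapped = list(pool.map(map_phrase, phrases))
    return reduce_pairs(mapped)

# Phrases enumerated from the "met" graph shown earlier in the deck.
search_cache = build_cache(
    ["Anne met Pam", "Pam met Sally", "Sally met Anne", "John met Sally"]
)
```

Because the map step looks at one phrase at a time, the phrases can be split across as many workers as you have.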
How do I build a search cache?
The best way to do this is using blob storage.
I use Windows Azure, but you can use any data storage as long as each path maps to a JSON file that can be fetched with an HTTP GET request.
ex. HTTP GET
../natural/language/search/is/cool
I am working on an open source project for this using C#.
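The HTTP GET example above suggests one path segment per word. A minimal sketch of that mapping in Python follows. The function names are assumptions, and so is the base URL, which the slide leaves out, so the caller has to supply it.

```python
from urllib.parse import quote

def cache_path(query):
    """Map a query to a relative path, one URL-encoded segment per word."""
    words = query.lower().split()
    return "/".join(quote(word, safe="") for word in words)

def cache_url(base_url, query):
    """Join a caller-supplied base URL and the query's path."""
    return base_url.rstrip("/") + "/" + cache_path(query)

relative = cache_path("Natural language search is cool")
```

A client would then issue a GET for that address and read the JSON stored there. Punctuation such as a question mark is percent-encoded so it cannot be mistaken for the start of a URL query string.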
How do you transform answers into questions?
You have a bunch of answers already in natural language.
Each language has a specific template that allows you to transform an answer into a question.
“X is Y” -> “What is X?”
Is X a Person? Then “Who is X?”
Add “What is X?” to the search cache.
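The template above can be sketched with a regular expression. This covers only the English "X is Y" case. How you decide that X is a person is not covered in the slides, so the `people` set below is an assumption, as are the example statements.

```python
import re

# Matches "X is Y", with an optional trailing period.
PATTERN = re.compile(r"^(?P<subject>.+?)\s+is\s+(?P<rest>.+?)\.?$")

def to_question(answer, people=frozenset()):
    """Turn an "X is Y" statement into "What is X?" or "Who is X?"."""
    match = PATTERN.match(answer.strip())
    if match is None:
        return None  # statement does not fit the "X is Y" template
    subject = match.group("subject")
    word = "Who" if subject in people else "What"
    return f"{word} is {subject}?"

# Hypothetical answers; each generated question becomes a cache key.
search_cache = {}
for answer in ["Neo4j is a graph database.", "Anne is a developer."]:
    question = to_question(answer, people={"Anne"})
    if question is not None:
        search_cache[question] = answer
```

After this runs, `search_cache` maps "What is Neo4j?" and "Who is Anne?" back to the statements they came from.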
Example: http://www.arktera.com/
Questions?
MATCH (questions)-[:without]->(answers)
RETURN *
0 Results Found
Neo4j Events
http://www.graphconnect.com
New York: November 5-6
London: November 18-19
http://www.graphconnect.com/videos
Watch the videos! Very valuable insights from our community
Neo4j Trainings
Interested in Neo4j training?
Talk to me after!
Thanks!
Follow me on Twitter!
@kennybastani
Connect with me on LinkedIn
/in/kennybastani