Frameworks for Querying Databases Using Natural Language: A
Literature Review
Hafsa Shareef Dar
Dept. of Software Engineering, University of Gujrat, 50700 Punjab, Pakistan
Email: [email protected]
M. Ikramullah Lali, Moin Ul Din
Dept. of Computer Science, University of Gujrat, 50700 Punjab, Pakistan
Email: [email protected] ; [email protected]
Khalid Mahmood Malik
Computer Science and Engineering Department, Oakland University, 2200 N. Squirrel Rd,
Rochester, MI 48309, USA
Email: [email protected]
Syed Ahmad Chan Bukhari*
Division of Computer Science, Mathematics and Science, College of Professional Studies, St. John’s University,
New York, USA
Correspondence should be addressed to Syed Ahmad Chan Bukhari [email protected] *
Abstract: A Natural Language Interface (NLI) allows users to pose queries and retrieve information from
a database without using an artificial language such as the Structured Query Language (SQL). Several
applications in various domains, including healthcare, customer support and search engines, require
retrieving structured data from information given as text. However, issues such as configuration
complexity, computationally intensive algorithms, and the popularity of relational databases made
translating natural language into database queries a secondary area of investigation. The emerging trend
of querying systems and speech-enabled interfaces has revived research on natural language to database
queries. The last survey on this topic was published six years ago, in 2013. To the best of our knowledge,
no recent study discusses the current state-of-the-art frameworks for translating natural language into
structured and non-structured query languages. In this paper, we have reviewed 47 frameworks from 2008
to 2018. Out of 47, 35 were closely relevant to our work. SQL-based frameworks have been categorized
into statistical, symbolic and connectionist approaches, whereas NoSQL-based frameworks have been
categorized into semantic matching and pattern matching. These frameworks are then reviewed based on
their supported language, heuristic rule scheme, interoperability support, dataset scope and overall
performance score. The findings show that 70% of the work in natural language to database querying has
been carried out for SQL, while the NoSQL languages SPARQL, Cypher and Gremlin account for 15%, 10%
and 5% respectively. It has also been observed that most of the frameworks support the English language
only.
Keywords: NL2DB, Database, NLP, SQL, NoSQL, Cypher, SPARQL
1. Introduction
Natural language to database querying frameworks translate natural language questions into valid database
queries. This translation helps to bridge the communication gap between non-technical users and
database systems, as users are not required to understand database schemas and query language syntax
(Reis, 1997; Christian, 2010). Therefore, it is always desirable for non-technical users to have a natural
language interface for database querying. The history of natural language interfaces to databases
dates back to the 1970s, when the LUNAR and LADDER systems were developed for non-technical users to
pose natural language questions about moon rock samples and US naval ships respectively (Woods,
1972). The rapid evolution of computer hardware and software in the last five decades has influenced
databases in such a way that the database systems developed in the 1970s are not even compatible
with the current definition of a database (Bercich 2003; Frank 2018). Since then, several natural language
to database querying frameworks have been developed to fulfill industry needs. By studying the
development timeline of such systems, we have identified interesting research trends in the domain of
translating natural language to database queries. CHAT-80 was the leading natural language to database
query system, developed in 1980 (Warren and Pereira, 1982). Early systems had poor retrieval times,
little support for language portability, and complex configuration processes. These factors contributed
to the low adoption of such systems for commercial purposes.
Translating a natural language question into database query languages such as SQL and the SPARQL
Protocol and RDF Query Language (SPARQL) is not a trivial task, as current databases are diverse,
gigantic in size and follow sophisticated data storage mechanisms (Nadkarni, 2011). Storage engines often
store data in a variety of ways, such as in a structured (tabular) format, in a NoSQL or graph (text)
format, or in a hybrid format. Therefore, the underlying storage engines require different query languages
to retrieve the stored data. This heterogeneity of data storage mechanisms increases the complexity of
natural language to database query translation. With the advancement of machine learning techniques,
various frameworks have been developed that are able to efficiently translate natural language questions
(from simple to complex) into database-specific queries (SQL, NoSQL) (Yossi Shani, 2016; Elías Andrawos,
2013).
The last review paper on natural language to database frameworks was published in 2013 (Sripad and
n.d. 2013) and classified natural language querying frameworks for SQL only. Another available review
paper on this topic (Androutsopoulos, Ritchie and Thanisch, 1995) mainly covered natural language
to SQL databases and highlighted the usage of the systems developed so far. In this survey paper, we have
reviewed natural language to database querying frameworks developed for both structured (SQL) and
non-structured (NoSQL, GraphDB) database query languages. Using Google Scholar, we have found thirty-five
relevant frameworks published from 2008 to 2018. This review excludes papers which describe
proposed approaches without a corresponding evaluation, i.e. precision and accuracy, on any benchmark. We
have sub-divided the developed frameworks into two main categories (SQL and NoSQL) and provided a
comprehensive review of each (Figure 1). Moreover, for each category, a feature comparison among
the developed frameworks, documenting their salient features and highlighting their shortcomings, has also
been provided. The comparison has been conducted on different factors, including the language and approach
supported, performance evaluation and others.
Figure 1 Classification of natural language to database querying frameworks.
The SQL and NoSQL categories can be further divided into rule based and syntax analysis, syntactic pattern,
machine learning and knowledge based/external resources. Furthermore, these sub-categories have been
reviewed for different approaches, including semantic matching, pattern matching, supervised and
unsupervised learning and the statistical approach. Statistical approaches use large text corpora and
perform analysis based on text characteristics without considering significant linguistic knowledge.
Similarly, the symbolic approach is widely used as a learning measure in different machine learning
techniques. The connectionist approach proves to be an efficient model for learning tasks; therefore, the
combination of the connectionist approach with the statistical or symbolic approach is an important area
in natural language processing (Stefan, Ellen and Gabriele, 1996). The next section covers the materials
and methods used in conducting this study.
1.1 Material and Methods
The most crucial part of this study was the availability of relevant material. The articles were searched
using reputable scientific databases including SpringerLink, IEEE, ACM Digital Library, Google Scholar,
Emerald, Science Direct and Elsevier. Some other databases were also explored, but due to
accessibility restrictions they were not included. The search strategy was designed around different
keywords such as ‘querying databases’, ‘natural language databases’, ‘frameworks for NLDB’, ‘natural
language interfaces’, ‘SQL-based frameworks’, and ‘NoSQL frameworks’. Figure 2 further explains the
selection procedure and keyword searching designed for this study.
[Figure 1 diagram labels: NL2DB; NoSQL; Rule Based & Syntax Analysis; Semantic Matching; Pattern Matching; Syntactic Pattern; SQL; Machine Learning; Statistical Approach; Symbolic Approach; Connectionist Approach; Knowledge Based/External Resources]
Figure 2 Selection procedure and keyword search
Figure 2 shows the article selection procedure and keyword searching applied in this study. In step one,
the scientific databases were selected, which helped to design a search strategy for extracting more
relevant material. In step two, an initial screening was performed on the results of step one and
1247 articles were gathered. The articles were then filtered based on their titles; after this first
filter, 749 articles seemed to be relevant. In step three, the abstracts of the selected articles were
studied and filtered, and 425 articles were selected. After reviewing the full-text articles, 70 were
selected; these articles discuss 47 frameworks, 35 of which are relevant to our work.
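The screening counts above can be turned into per-stage retention rates. The following is a minimal sketch in Python; the stage labels and the function name `retention_rates` are illustrative choices, and only the article counts come from the text.

```python
# Article counts reported for each screening stage in the selection procedure.
stages = [
    ("initial screening", 1247),
    ("after title filter", 749),
    ("after abstract filter", 425),
    ("after full-text review", 70),
]

def retention_rates(stages):
    """Return (stage name, share of the previous stage kept) for each filter."""
    rates = []
    for (_, before), (name, after) in zip(stages, stages[1:]):
        rates.append((name, after / before))
    return rates

rates = retention_rates(stages)
for name, rate in rates:
    print(f"{name}: {rate:.1%} of the previous stage retained")

# Share of the initially gathered articles that survived all filters.
overall = stages[-1][1] / stages[0][1]
print(f"overall: {overall:.1%} of the initial articles retained")
```

Read this way, the full-text review is by far the most selective step of the procedure.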
2. Background
This section presents a comprehensive review of the frameworks, shown in Figure 1, that have been developed
for natural language querying of structured (SQL) and unstructured (NoSQL) databases. A brief overview
of these frameworks, along with a comparison of their features, is presented in this section.
2.1 Translation of the Natural Language to the Structured Query Language (SQL) Frameworks
These frameworks have been categorized into machine learning, which covers three different approaches
(statistical, symbolic and connectionist), and knowledge based/external resources.
2.1.1 Machine Learning
Several studies have proposed approaches based on supervised and unsupervised learning. In (Bunschus et
al. 2008), ontology generation approaches based on supervised learning were discussed, whereas Codo et
al. (2007) trained a classifier for the top 50 ambiguities on a clinical corpus from the Mayo Clinic.
One of the drawbacks of supervised learning methods is the requirement for a large amount of manually
labeled training data, which ultimately increases time, cost and labor (Poesio et al., 2008).
Unsupervised learning creates clusters to construct different hierarchies (Del et al., 2016). A study
(Missisikoff et al., 2002) presented an unsupervised approach that combined linguistic and statistical
methods to perform ontology generation tasks for text; its drawback is the dependency on statistical
data without knowing the significance of the context.
2.1.1.1 SQL Based Frameworks using Statistical Approach
The frameworks discussed here are SQL-based and use a statistical approach. Different factors have been
considered while comparing these frameworks, including their testing and performance measures.
Wolfram Alpha, a well-known search engine, was developed by a team of researchers at Wolfram Research in
2009. It takes queries and requests submitted by users through a text field and then performs
computations and visualizations on structured data from a knowledge base drawn from different books and
sites. Finally, it displays the results and an interpretation of the input (Jonas, 2017).
Researchers have developed a framework which transforms English language input into SQL in order to
retrieve relevant information from relational databases (Rao et al. 2010). The proposed framework
provides an infrastructure for translating natural language to database queries. However, the translation
scope is limited to a user-defined data dictionary containing most of the words to be used by the system.
The framework allows users to extend it by adding new translation grammar rules and data dictionary
entries. For example, it employs linguistic understanding with a parse tree and further maps patterns,
by proximity, to certain database concepts. The shortcoming of this system is that it does not support
dialogue-style querying.
Ganti et al. presented a framework, “Keyword++”, to improve on existing tools for translating a keyword
query into an SQL statement (Ganti, 2010). The proposed framework maps each query keyword to a predicate
by generating differential query pairs (DQPs) for the keyword and then measuring the correlation between
the DQPs and the predicates. The keyword-to-predicate mapping is further improved by aggregating the
correlations measured on multiple query pairs extracted from a query log. A materialized mapping is built
from the DQPs generated from the query log to translate query keywords into equivalent SQL statements.
The proposed system has been tested on an entity table comprising 8,000 laptops. Overall, 0.1 million web
search queries were extracted and trimmed to 500 queries as a sample test set. Approximately 2,000
keywords were extracted, where each keyword had 41 DQPs and took 1.61 seconds to compute the mapping.
Experiments conducted on the Keyword++ framework show that the effectiveness of the system is more than
80% compared to existing approaches.
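The idea of scoring keyword-to-predicate mappings from differential query pairs can be illustrated with a small Python sketch. It is a simplification under stated assumptions, not the Keyword++ implementation: the entity table `LAPTOPS`, the pairs in `DQPS`, the keyword "ultraportable" and the scoring rule (the shift in the share of matching results, averaged over pairs) are all hypothetical.

```python
from collections import defaultdict

# Hypothetical entity table: laptop id -> attribute values.
LAPTOPS = {
    1: {"brand": "Lenovo", "weight_class": "light"},
    2: {"brand": "Lenovo", "weight_class": "heavy"},
    3: {"brand": "Dell", "weight_class": "light"},
    4: {"brand": "Dell", "weight_class": "heavy"},
}

# Hypothetical differential query pairs for the keyword "ultraportable":
# (results of the query with the keyword, results of the same query without it).
DQPS = [
    ({1, 3}, {1, 2, 3, 4}),
    ({1}, {1, 2}),
]

def share(results, attr, value):
    """Fraction of result entities whose attribute equals the value."""
    if not results:
        return 0.0
    hits = sum(1 for e in results if LAPTOPS[e][attr] == value)
    return hits / len(results)

def score_predicates(dqps):
    """Average, over all pairs, how far the keyword shifts results toward each predicate."""
    totals = defaultdict(float)
    candidates = {(a, v) for row in LAPTOPS.values() for a, v in row.items()}
    for with_kw, without_kw in dqps:
        for attr, value in candidates:
            totals[(attr, value)] += share(with_kw, attr, value) - share(without_kw, attr, value)
    return {pred: total / len(dqps) for pred, total in totals.items()}

scores = score_predicates(DQPS)
best = max(scores, key=scores.get)
print(best, scores[best])
```

In this toy data the keyword consistently shifts results toward `weight_class = 'light'`, so that predicate would be used in the WHERE clause of the generated SQL statement.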
Data sharing among various organizations could facilitate evidence-based treatment by
incorporating evidence from heterogeneous hospital datasets. Healthcare researchers and clinicians
require tools to extract relevant information from the data of clinical information systems (Malik, 2018).
These distributed databases use different data design models, e.g., Entity Relationship and Entity
Attribute Value. Safari et al. proposed an algorithm to translate a Restricted Natural Language Query
(RNLQ) to SQL (Safari, 2014). Generic algorithms have been used for mapping and translation. In the first
step, query terms are mapped to RNLQ via the CliniDAL (Clinical Data Analytic Language) interface. Next,
the temporal expression of the query is interpreted via a 2-layer rule-based technique. Translation from
RNLQ to SQL is performed via a Top-K algorithm on the basis of similarity, which is further utilized by
CliniDAL for the mapping process. The implemented prototype was tested on four categories of queries and
achieved 84% accuracy.
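A similarity-based Top-K mapping of query terms to database terms can be sketched as follows. This is only an illustration of the general technique: the schema terms are hypothetical, and the character-level similarity from Python's `difflib` stands in for whatever similarity measure the original system uses.

```python
import difflib

# Hypothetical schema terms of a clinical database (not taken from the paper).
SCHEMA_TERMS = ["patient_age", "patient_gender", "admission_date",
                "heart_rate", "blood_pressure", "diagnosis"]

def top_k_matches(query_term, candidates, k=3):
    """Rank candidate schema terms by string similarity to the query term."""
    scored = [
        (difflib.SequenceMatcher(None, query_term.lower(), c.lower()).ratio(), c)
        for c in candidates
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]

matches = top_k_matches("heart rate", SCHEMA_TERMS, k=3)
for score, term in matches:
    print(f"{term}: {score:.2f}")
```

Keeping the k best candidates rather than only the single best one leaves room for a later step to disambiguate using context.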
Li et al. presented an approach that translates complex input queries from multiple domains into SQL
queries in a generic way (Li and Jagadish, 2014). The resulting SQL statements can include
nesting, joins, and aggregation. A system named NaLIR (Natural Language Interface for Relational
databases) has been developed based on the proposed approach and incorporates these
characteristics. The system reuses previous SQL statements from the query log to save query computation
time.
TiQi, a natural language interface, allows users to pose speech- and text-based queries in natural
language (Lin, 2015). It is a web-based tool specially designed to access project data. TiQi accepts a
user query and generates a Traceability Information Model (TIM), which displays the underlying object
classes and attributes. The TIM is stored in a centralized location to map unique nodes in order to access
and specify the data demanded by the input query. To produce up-to-date SQL output, H2, a Java SQL
database, is used. This database engine provides support for data sources ranging from Jira to Excel
spreadsheets.
Palakurthi et al. presented a framework which classifies the explicitly defined attributes present in a
natural language query in order to convert them into various SQL clauses (Palakurthi, 2015). A statistical
classifier, CRF (Conditional Random Field), is used to classify these attributes. The system has been
tested on three domains (Academic, Restaurant, and Geo-query) and achieved an accuracy of 70% and an
F-measure of 85%.
Sujatha et al. developed a system using the EFFCN algorithm, which uses both semantic and syntactic
knowledge to accurately match an input query to the corresponding SQL query (Sujatha and Raju, 2016).
It was tested on CPVBase with precision and recall of about 84%. Ontology merging and an enhanced
parsing process help the system to prune the query to obtain the desired information. The authors
suggested that the future growth of NLIDB systems will be achieved via neural networks, machine learning
and statistical parsing techniques, by tackling abbreviated queries and by dealing with complex natural
language queries based on temporal logic.
Mvumbi et al. proposed a system, “NALI”, to translate NL queries into SQL queries. It has been specially
designed to address the portability of an NLIDB (Natural Language Interface to Database) from one
domain to another by automatically generating the configuration model for the new domain instead of
customizing the tool manually (Mvumbi, 2016). The proposed approach reduces the manual workload of
customizing an NLIDB. The authors introduced two authoring schemes for customization (top-down and
bottom-up) and evaluated which performs best. The top-down approach pre-harvests key lexical terms from
un-annotated sample NL queries. Furthermore, it includes semantics for the negative form of verbs and the
comparative and superlative forms of adjectives to reduce the configuration workload. The bottom-up
approach uses the database schema and data dictionary to generate the configuration model automatically.
The proposed system has been tested on the Geoquery corpus, and the results revealed that the top-down
authoring approach performs much better than the bottom-up approach for customizing an NLIDB system.
Sukthankar et al. presented a system, “nQuery”, to deal with simple as well as complex queries
(Sukthankar et al., 2017). The proposed work focuses on aggregate functions, WHERE clause conditions and
advanced clauses such as HAVING and ORDER BY. The proposed system works well for single-sentence input
queries. The authors suggested enhancing the system to accept multiple-sentence queries and translate
them into one resulting SQL query.
The inefficiencies of the Seq2SQL framework (generalization to unseen schemas and serialization) and of
its sequence-to-sequence model have been addressed by a new sequence-to-set based model proposed by
Xu et al. (Xu, 2017). The model has been implemented in the “SQLnet” tool, which uses Seq2SQL as the
baseline framework but eliminates reinforcement learning. Similar to SQLizer, a sketch-based scheme has
been implemented to parse the NL query, but the sketch has a dependency graph so that each prediction is
made using the previous predictions it depends on. This new model improves on the Seq2SQL results by 9%
to 13% on various metrics.
Table 1 shows feature-based comparison of frameworks using statistical approach.
Table 1 Features Comparison of SQL-based Frameworks Using Statistical Approach
| System Name | Language Support | Heuristic Rule Support | Interoperability | Usability Reported | Corrected Reported | Support Complex Querying | Performance Evaluation |
|---|---|---|---|---|---|---|---|
| Wolfram Alpha, 2009 | English to Wolfram Query (similar to SQL) | Yes | No | Good | Good | Yes | Symbolic computation, knowledge base, ontology |
| Keyword++, 2010 | English to SQL | Yes | No | Good | Good | Yes | Tested on 500 web queries and achieved >80% precision and recall |
| RNLQ-SQL, 2014 | CliniDAL queries to SQL | Yes | Yes | Good | Good | Yes | Tested on RPAH-ICU corpus, accuracy 84% |
| NaLIR, 2014 | English to SQL | Yes | Yes | Good | Good | Yes | Microsoft Academic Search (MAS) dataset, good recall and precision |
| TiQi, 2015 | NL trace query to SQL | Yes | Yes | Good | Good | Yes | Tested on Isolate and Easy Clinic datasets; accuracy on supported and unsupported queries 92.6% and 82.9% respectively |
| Ashish, 2015 | English to SQL | No | Yes | Fair | Fair | No | Statistical classifier CRF trained on manually prepared data and compared on Academic, Restaurant and Geo query; 70% |
| EFFCN, 2016 | English to SQL | No | No | Good | Good | Yes | Tested on CPVBase, precision and recall 84% |
| NALI, 2016 | English to SQL | Yes | Yes | Good | Good | Yes | Tested on Geo query corpus; customization, auto-generation of configuration model |
| nQuery, 2017 | English to SQL | Yes | No | Fair | Fair | Yes | Intelligent table and attribute mapping, clause tagging, aggregate function, GROUP BY and HAVING clauses |
| SQLnet, 2017 | English to SQL | No | Yes | Good | Good | Yes | Improves Seq2SQL results by 9% to 13% on various metrics |
| SQLizer, 2017 | English to SQL | Yes | No | Good | Good | Yes | Tested on MAS, IMDB and YELP databases, accuracy 90% |
2.1.1.2 SQL Based Frameworks using Symbolic Approach
Alessandra and Alessandro (2012) proposed a framework and tested it on a dataset of 800 questions from
the GeoQuery corpus. They combined rules with a weighting scheme that provides a ranked list of all
selected candidate SQL queries. Natural Language Web Interface for Database (NLWIDB) is another commercial
framework available for exploring different databases (Rukshan et al., 2013).
A hybrid approach has been proposed by (Axita, 2013) to build a framework, “NLKBIDB”, using the
methodologies of NLIDB (Natural Language Interface to Databases) and KBIDB (Keyword Based Interfaces to
Databases). The author describes various system agents: the first accepts the NL query and passes it
to lexical, syntax, and semantic analyzers. If the query syntax is valid, the analyzed input is
passed to the next agent in the form of a tree structure. Tokens are mapped to the generated knowledge
base; if the tokens are found in the knowledge base, a pointer is sent to the SQL generator, otherwise the
user is asked to reformulate the query. This framework uses the logical and conceptual schema of a database
as the knowledge base. It is derived from the metadata of the database, and knowledge experts help to
update it. It has been tested on an agriculture survey database and reports 53% accuracy on syntactically
incorrect queries. Another commercial framework, NQL, was developed in 2014 and is widely used in
organizations, for example for university databases. The purpose of NQL is to query databases in natural
language (Hessa and Emad, 2016). It parses English language queries and converts them into SQL using the
NQL algorithm.
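The token lookup step described for NLKBIDB, where known tokens are handed to the SQL generator and unknown tokens lead to a request to reformulate, can be sketched in Python. This is a minimal sketch under assumptions: the knowledge base entries, the stop-word list and the function `translate` are hypothetical, and the lexical, syntax and semantic analyzers are reduced to simple whitespace tokenization.

```python
# Hypothetical knowledge base derived from database metadata:
# natural language token -> (kind, schema name).
KNOWLEDGE_BASE = {
    "farmers": ("table", "farmer"),
    "farmer": ("table", "farmer"),
    "name": ("column", "name"),
    "names": ("column", "name"),
    "village": ("column", "village"),
}
STOP_WORDS = {"show", "the", "of", "all", "and", "list", "me"}

def translate(question):
    """Return an SQL string, or None when a token is missing from the knowledge base."""
    tokens = [t for t in question.lower().replace("?", "").split() if t not in STOP_WORDS]
    tables, columns = [], []
    for token in tokens:
        if token not in KNOWLEDGE_BASE:
            return None  # the caller should ask the user to reformulate the query
        kind, name = KNOWLEDGE_BASE[token]
        target = tables if kind == "table" else columns
        if name not in target:
            target.append(name)
    if not tables:
        return None
    return f"SELECT {', '.join(columns) or '*'} FROM {tables[0]}"

print(translate("Show the name and village of all farmers"))
print(translate("Show the salary of all farmers"))
```

The second call returns `None` because "salary" is not in the knowledge base, which corresponds to the case where the user is notified to reformulate the query.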
In 2015, another commercial framework for natural language to SQL, named TR Discover, was developed
(Dezhao et al., 2015). TR Discover is mainly used in domains such as life sciences and law. It provides
suggestions for constructing natural language questions. TR Discover inherits SPARQL
and SQL characteristics, which is why it uses a feature-based grammar along with parsing and translation.
Table 2 displays the feature-based comparison of frameworks using the symbolic approach.
Table 2 Features Comparison of SQL-based Frameworks Using Symbolic Approach
| System Name | Language Support | Heuristic Rule Support | Interoperability | Usability Reported | Corrected Reported | Support Complex Querying | Performance Evaluation |
|---|---|---|---|---|---|---|---|
| Alessandra, 2012 | English to SQL | Yes | No | Good | Good | Yes | Tested on GEOQuery corpus, 800 questions; recall 88%, accuracy 81% |
| NLKBIDB, 2013 | English to SQL | Yes | No | Fair | Fair | No | Tested on Railway and college domain datasets; solved 53% of syntactically incorrect queries |
| NLWIDB, 2013 | Language processing | Yes | No | Fair | Good | Yes | Language processing |
| NQL, 2014 | NQL algorithm | Yes | No | Good | Fair | Yes | NQL algorithm |
| TR Discover, 2015 | Feature-based grammar, parsing, FOL translation | Yes | No | Good | Fair | Yes | Feature-based grammar, parsing, FOL translation |
2.1.1.3 SQL Based Frameworks using Connectionist Approach
Gulwani presented a program synthesis system, “NLyze”, to extract information from
spreadsheet data without requiring the user to write spreadsheet programs (Gulwani, 2014). The proposed
system includes a Domain Specific Language (DSL) that covers an algebra of map, filter, join and similar
operations. The compositional and typed nature of the DSL makes translation of the NL query effective and
provides an appropriate abstraction for the unskilled user. Translation of the NL query into a spreadsheet
program is performed by a dynamic programming based algorithm, which converts the NL query into a ranked
set of likely programs. The proposed algorithm combines two ideas: keyword programming and semantic
parsing. The keyword programming approach has high recall but low precision, while semantic parsing has
low recall and high precision. NLyze is domain specific, purely based on typed synthesis, and targets only
spreadsheets. In contrast, SQLizer, a tool similar to NLyze, provides domain independence by
auto-generating the configuration model for a new domain and targets relational databases.
An intelligent agent based framework for databases, which transforms simple text into an equivalent SQL
query, has been developed and presented in (Hessa et al., 2016). The authors focused on parsing the
extracted keywords via syntactic and semantic parsers. Rules have also been designed for the parser to
learn the knowledge hidden in the natural language query. They used tools including Sphinx for speech
recognition, MySQL as the RDBMS, the Stanford part-of-speech (POS) tagger as the syntax parser, the
Stanford named entity recognizer (NER) as the semantic parser, and ClearNLP for parsing complex queries.
These tools are integrated in an Intelligent Agent (IA) system which is implemented in Java. The system
reported 80% accuracy on its test suite.
The inherent ambiguity of NL queries limits the ability of the underlying synthesizer of an NLIDB system
to automatically generate an SQL representation. The scope of NLIDB systems is also limited because they
are not database agnostic (configuration is required for every new database). A novel technique has been
introduced by Yaghmazadeh et al. to
automatically synthesize SQL queries and auto-generate a configuration model for new databases
(Yaghmazadeh, 2017). Two ideas (type-directed synthesis from NLP and a repair technique from
programming languages) have been merged in this technique. In the first stage, a semantic parser is used
to translate an NL query into a skeleton (a query sketch) which represents only the shape of the query
instead of its full content. The semantic parser does not need to be trained on a specific database to
generate the sketch. The initial sketch may need to be refined because it does not always capture the
desired structure of the input query; it is therefore repaired via fault localization and a database of
repair tactics. The skeleton contains holes, which are filled by a type-directed approach to obtain a
complete SQL query, and each completion is assigned a confidence score on the basis of the schema and
contents of the database.
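The step of filling the holes of a query sketch and ranking the completions by confidence can be illustrated with a short Python sketch. It does not reproduce SQLizer: the schema, the single sketch shape `SELECT ?col FROM ?table WHERE ?col = value`, and the confidence measure (a product of `difflib` string similarities) are assumptions made for illustration, and sketch repair is not modeled.

```python
import difflib
from itertools import product

# Hypothetical schema: table name -> column names.
SCHEMA = {
    "papers": ["title", "year", "venue"],
    "authors": ["name", "affiliation"],
}

def similarity(hint, name):
    """String similarity in [0, 1] between a natural language hint and a schema name."""
    return difflib.SequenceMatcher(None, hint.lower(), name.lower()).ratio()

def complete_sketch(table_hint, column_hint, filter_hint, value):
    """Fill the holes of SELECT ?col FROM ?table WHERE ?col = value and rank the results."""
    completions = []
    for table, columns in SCHEMA.items():
        # Column holes may only be filled with columns of the chosen table.
        for column, filter_column in product(columns, repeat=2):
            confidence = (similarity(table_hint, table)
                          * similarity(column_hint, column)
                          * similarity(filter_hint, filter_column))
            sql = f"SELECT {column} FROM {table} WHERE {filter_column} = {value!r}"
            completions.append((confidence, sql))
    completions.sort(key=lambda pair: pair[0], reverse=True)
    return completions

ranked = complete_sketch("paper", "titles", "published year", 2014)
print(ranked[0])
```

Restricting each hole to names that are valid for the chosen table is what keeps the search space small; a low top confidence would be the signal that the sketch itself needs repair.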
An intelligent user interface minimizes the communication gap between the user and the system. Adding an
intelligent layer to the system eases the process of transforming the NL query into SQL statements. Singh
et al. presented an intelligent NLIDB system named “NLTSQLC”, which is purely based on metadata
and semantic sets for attributes and tables (Singh, 2016). This system takes the NL query as input, which
is further processed by lower case conversion, tokenization, escape word removal, and part-of-speech
tagging. The system then classifies the tagged tokens into relations, attributes and clauses. Finally, the
system removes ambiguous attributes with the same name to generate the final SQL representation. Another
framework, the NLP Interchange Format (NIF), presented by Sebastian et al. (2012), works in the domain of
the semantic web. One of the major advantages of NIF is that it provides global interoperability
between different NLP tools. Kueri (2013) constructs queries faster. It lets users enter their queries
by simply typing in a search box and submitting them with a single click (Yossi, 2016).
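The preprocessing and classification steps described for NLTSQLC (lower case conversion, tokenization, escape word removal, and classification of tokens into relations, attributes and clauses) can be sketched in plain Python. The metadata, semantic sets and escape words below are hypothetical, and part-of-speech tagging is omitted to keep the sketch free of external dependencies.

```python
import re

# Hypothetical metadata and semantic sets (synonyms) for one table.
RELATIONS = {"student": {"student", "students", "pupil", "pupils"}}
ATTRIBUTES = {"name": {"name", "names"}, "marks": {"marks", "score", "scores"}}
CLAUSES = {"where": {"whose", "where", "with"}, "order by": {"sorted", "ordered"}}
ESCAPE_WORDS = {"show", "the", "of", "all", "are", "is", "a", "an", "me", "by"}

def lookup(token, semantic_sets):
    """Return the canonical name whose semantic set contains the token, if any."""
    for canonical, synonyms in semantic_sets.items():
        if token in synonyms:
            return canonical
    return None

def classify(question):
    """Lowercase, tokenize, drop escape words, then bucket tokens by metadata match."""
    tokens = re.findall(r"[a-z0-9]+", question.lower())
    tokens = [t for t in tokens if t not in ESCAPE_WORDS]
    result = {"relations": [], "attributes": [], "clauses": [], "unknown": []}
    for token in tokens:
        for bucket, sets in (("relations", RELATIONS),
                             ("attributes", ATTRIBUTES),
                             ("clauses", CLAUSES)):
            canonical = lookup(token, sets)
            if canonical is not None:
                if canonical not in result[bucket]:
                    result[bucket].append(canonical)
                break
        else:
            # No semantic set matched this token.
            result["unknown"].append(token)
    return result

parsed = classify("Show the names of all pupils sorted by score")
print(parsed)
```

The classified tokens carry enough information to assemble a statement such as `SELECT name FROM student ORDER BY marks`; that final generation step is left out here.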
Zhong et al. proposed “Seq2SQL”, a neural network-based framework for translating
natural language queries into the corresponding SQL (Structured Query Language) representation (Zhong,
Xiong, and Socher 2017). The proposed system reduces the space of generated queries and improves the
execution accuracy of the system. This framework uses Reinforcement Learning (RL) rewards from query
execution over the database, together with cross entropy loss, to learn a policy for generating the
unordered parts of the query, which are less suitable for optimization via cross entropy loss. They
released WikiSQL, a dataset of 87673 hand-annotated examples of questions and SQL queries distributed over
26521 tables from Wikipedia. By applying policy-based reinforcement learning (RL) with a query execution
environment to WikiSQL, Seq2SQL outperforms the state-of-the-art semantic parser by Dong and Lapata (2016).
Utilizing the structure of SQL queries allows Seq2SQL to further reduce the output space of SQL queries,
which leads to higher performance than Seq2Seq and the pointer model. Limiting the output space leads to
more accurate conditions. The augmented pointer model generates higher quality WHERE clause conditions.
Incorporating structure reduces invalid queries from 7.9% to 4.8%.
Arimo was founded in 2012 with the purpose of supporting the business intelligence and data science
domains. Its user interface is oriented towards natural language processing and learns behavior patterns
from big data. Similarly, Quepy (2012) allows the implementation of NLIDB systems in the Python language.
It deals with complex queries and generates an accurate representation of data because it is based on the
NLTK framework (Jonas, 2017). In 2017, a natural language framework named ln2sql was developed to deal
with any natural language (Jeremy and Shashank, 2017). Similarly, Easy Query Builder (EQB), 2017, is
freely available and lets users pose queries in a friendly way. All user requests can be described
visually in natural language (EasyQuery, 2017).
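The difference between the two metrics reported for Seq2SQL, execution accuracy and logical form accuracy, explains why the order of WHERE conditions is hard to learn with cross entropy loss alone. The following Python sketch uses the standard `sqlite3` module; the table `players` and both queries are hypothetical, and logical form matching is simplified to an exact string comparison.

```python
import sqlite3

# Hypothetical table standing in for a WikiSQL-style table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE players (name TEXT, team TEXT, points INTEGER)")
conn.executemany(
    "INSERT INTO players VALUES (?, ?, ?)",
    [("Ann", "Red", 30), ("Bob", "Red", 12), ("Cy", "Blue", 30)],
)

def run(sql):
    """Execute a query and return its rows as a sorted list for comparison."""
    return sorted(conn.execute(sql).fetchall())

gold = "SELECT name FROM players WHERE team = 'Red' AND points = 30"
predicted = "SELECT name FROM players WHERE points = 30 AND team = 'Red'"

logical_form_match = predicted == gold          # exact string comparison
execution_match = run(predicted) == run(gold)   # comparison of query results

print(logical_form_match, execution_match)
```

The predicted query differs from the gold query only in the order of its conditions, so it fails the logical form comparison while returning the same rows; a reward based on query execution does not penalize such a prediction.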
To address the shortcomings of the NaLIR system, such as handling paraphrasing and various linguistic
variations, another framework, “DBPal”, was presented in (Basik, 2018). To translate queries, it uses
a novel translation model along with feedback-based learning and an auto-completion model that assists
users in paraphrasing the partial query while formulating the database query. This system has shown a
significant accuracy improvement in building complex queries. Many commercial frameworks have also been
developed using the connectionist approach; some of them are listed here. In 2011, the Needle framework
was introduced. It was developed as a website API for ecommerce websites and mainly supports shallow text
analysis. It is also recognized as one of the fast search ecommerce APIs. Thoughtspot (2012) is available
for the data science and business domains. It has several significant features, including ease of use,
fast execution, minimal backlog and maintenance, and others (Ryan, 2018).
Table 3 summarizes the feature comparison of the frameworks that use the connectionist approach.
Table 3 Features Comparison of SQL-based Frameworks Using Connectionist Approach
| System Name | Language Support | Heuristic Rule Support | Interoperability | Usability Reported | Corrected Reported | Support Complex Querying | Performance Evaluation |
|---|---|---|---|---|---|---|---|
| The NEEDLE, 2011 | English to filter for SQL | Yes | Yes | Good | Good | Yes | API for shallow text analysis, fast search ecommerce API |
| Thoughtspot, 2012 | English to SQL | Yes | Yes | Good | Good | Yes | Easy to use, faster execution, reduced backlog and maintenance, zero optimization to tune performance |
| Arimo, 2012 | English to SQL | Yes | Yes | Fair | Good | Yes | Learns behavior patterns from big data |
| Quepy, 2012 | English to SQL, MQL, SPARQL | Yes | Yes | Good | Fair | Yes | Deals with complex queries and generates accurate representation, open source, better for domain-specific databases |
| NLP Interchange Format (NIF), 2012 | NLP tools output to RDF | Yes | Yes | Fair | Good | Yes | Global interoperability between different natural language processing tools |
| Kueri, 2013 | English to SQL, and JSON for NoSQL | Yes | Yes | Good | Fair | Yes | Faster construction of query, efficient spelling checker, efficient data ambiguity handler, auto-completion of query |
| NLyze, 2014 | English to SQL | Yes | Yes | Good | Good | Yes | Tested on four different spreadsheets and achieved 98.2% accuracy and precision |
| INLIDB, 2016 | English to SQL | Yes | No | Good | Good | Yes | Accuracy 80% |
| NLQTSQLC, 2016 | English to SQL | Yes | No | Good | Good | No | Tested on manually prepared dataset of questions; accuracy 86%, recall 80%, precision 89% |
| Seq2SQL, 2017 | English to SQL | No | No | Fair | Fair | Yes | Tested on WikiSQL dataset; execution accuracy 59.4% and logical form accuracy 48.3% |
| ln2sql, 2017 | SQL | No | No | Fair | Fair | No | It can deal with any natural language |
| Easy Query Builder (EQB), 2017 | English to SQL | No | No | Good | Fair | No | Visual representation of output |
| DBpal, 2018 | English to SQL | No | No | Good | Fair | No | Tested on geographical data sets of United States |
2.1.2 Knowledge Based/ External Resources
Recent studies have proposed knowledge-based approaches to automate ontology construction. Doing-Harris
et al. (2015) combine NLP and a knowledge base to build ontologies from raw text. The approach relies on
a predefined dictionary of disorder-type concepts that are expected to occur in the text. Its drawbacks are
the increased labor cost of dictionary construction, domain dependency, and the limitations of the patterns.
Cahyani et al. (2017) also utilize a knowledge base, built on a controlled vocabulary and data linked with a
corpus. The Text2Onto tool was used in this study for filtering with dictionary methods, and pattern
mapping of the data was used to identify the final concepts and candidate relations. The limitations of this
approach are that it requires predefined relations for the domain and does not consider semantic meaning.
Qawasmeh et al. (2018) proposed a semi-automated bootstrapping approach that involves preprocessing of
manual text and extraction of concepts. Its drawbacks are the dependency on domain experts and the labor
involved in the development process. Bhatia et al. (2018) focused on automating ontology generation for
retrieved web pages. An et al. (2018) developed an approach that transforms a database schema into an
automatically generated ontology with the help of hand-crafted rules. Like the other approaches, it has a
serious limitation: it requires a predefined database schema.
2.2 Translating Natural Language to Non Structured Query Language (NoSQL) Frameworks
Section 2.2 presents natural language to NoSQL frameworks based on two matching approaches, namely
pattern matching and semantic matching, in the context of rule-based and syntax analysis and of syntactic
patterns.
2.2.1 Rule Based & Syntax Analysis
Rule-based and syntax analysis is a manual approach in which a set of rules is formed to represent
knowledge. This representation encodes the decisions to be concluded in various scenarios. Ben Abacha and
Zweigenbaum (2011) show that rule-based syntactic patterns for medical entities and their relationships in
medical text can be built semi-automatically from a corpus according to semantic criteria. Similarly, Ono et
al. (2001) defined a dictionary-driven approach, combining rule-based and syntax-based analysis, for
extracting protein-protein interaction information from the literature.
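The rule-based idea can be illustrated with a minimal sketch. It assumes a tiny hypothetical protein dictionary and a single hand-written rule of the form "protein, interaction verb, protein"; neither the dictionary entries nor the verbs are taken from Ono et al. (2001), and real systems use far richer rules and syntactic analysis.

```python
import re

# Hypothetical dictionary of protein names (illustrative only).
PROTEINS = ["Cdc28", "Cln2", "Sic1", "Rad53"]

# Hypothetical interaction verbs recognized by the hand-written rule.
VERBS = ["interacts with", "binds to", "associates with"]

_protein = "|".join(re.escape(p) for p in PROTEINS)
_verb = "|".join(re.escape(v) for v in VERBS)

# One rule: <protein> <interaction verb> <protein>
RULE = re.compile(rf"\b({_protein})\b\s+({_verb})\s+\b({_protein})\b")


def extract_interactions(text):
    """Return (protein, verb, protein) triples matched by the rule."""
    return [m.groups() for m in RULE.finditer(text)]


if __name__ == "__main__":
    sentence = "Our results show that Cln2 binds to Cdc28 in vivo."
    print(extract_interactions(sentence))
```

Because both the dictionary and the rule are written by hand, coverage depends entirely on how many names and verb patterns have been enumerated, which is the labor and domain-dependency cost noted for such approaches.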
2.2.1.1 NoSQL frameworks Using Semantic Matching
The Semantic Web contains a large number of linked open data repositories. Because SPARQL queries are
complex, they are difficult to formulate for a naive user and even for an expert. Kim and Cohen (2013)
developed a prototype system, Linked Open Data Question Answering (LODQA), to transform plain-text
queries into corresponding SPARQL statements. The prototype parses the input with Enju, applies
pattern-based matching over base noun chunks, identifies the target via pattern matching, finds shortest
paths with Dijkstra's algorithm, searches ontologies with OntoFinder, and selects a default predicate. Some
modules of the system do not yet perform as desired; reconfiguring them and performing integration testing
of all modules is left to future work.
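The shortest-path step mentioned for LODQA relies on Dijkstra's algorithm. The following is a generic sketch of that algorithm over a small, made-up concept graph; the node names and edge weights are hypothetical, and how LODQA builds its graph or assigns weights is not described in the source.

```python
import heapq


def shortest_path(graph, source, target):
    """Dijkstra's algorithm; graph maps node -> {neighbor: non-negative weight}.

    Returns the list of nodes on a cheapest path, or None if target is unreachable.
    """
    dist = {source: 0}
    prev = {}
    heap = [(0, source)]
    visited = set()
    while heap:
        d, node = heapq.heappop(heap)
        if node in visited:
            continue  # stale heap entry
        visited.add(node)
        if node == target:
            break
        for nbr, weight in graph.get(node, {}).items():
            nd = d + weight
            if nd < dist.get(nbr, float("inf")):
                dist[nbr] = nd
                prev[nbr] = node
                heapq.heappush(heap, (nd, nbr))
    if target not in dist:
        return None
    # Walk the predecessor links back from target to source.
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    return path[::-1]


# Hypothetical concept graph (illustrative only).
GRAPH = {
    "Disease": {"Finding": 1, "Procedure": 4},
    "Finding": {"BodyStructure": 1},
    "Procedure": {"BodyStructure": 1},
    "BodyStructure": {},
}

if __name__ == "__main__":
    print(shortest_path(GRAPH, "Disease", "BodyStructure"))
```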
To explore data from such repositories, Karim et al. (2013) developed a tool named “Sem-QAS”. Its main
function is to convert natural language questions into corresponding SPARQL queries by identifying the
unique atomic constraints and their relations present in the input question. The tool generates a triple
pattern for each atomic constraint and combines these patterns to output complex SPARQL queries. Recall
and precision of the system were measured mainly for the processing of association operators and scope
modifiers. The tool was tested on the Mooney Job corpus for correctness and efficiency (Karim et al. 2013).
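The idea of combining triple patterns into one SPARQL query can be sketched as follows. This is only an illustration under stated assumptions: the helper `build_sparql`, the `job:` prefix, its IRI, and the predicates are hypothetical and are not taken from Sem-QAS or the Mooney Job corpus.

```python
def build_sparql(select_var, triples, prefixes=None):
    """Combine (subject, predicate, object) triple patterns into a SELECT query."""
    prefixes = prefixes or {}
    lines = [f"PREFIX {name}: <{iri}>" for name, iri in prefixes.items()]
    lines.append(f"SELECT DISTINCT ?{select_var} WHERE {{")
    for s, p, o in triples:
        lines.append(f"  {s} {p} {o} .")
    lines.append("}")
    return "\n".join(lines)


if __name__ == "__main__":
    # Question: "Which jobs require Java in Austin?"
    # Two atomic constraints, each mapped to one triple pattern (hypothetical predicates).
    constraints = [
        ("?job", "job:language", '"Java"'),
        ("?job", "job:city", '"Austin"'),
    ]
    print(build_sparql("job", constraints, {"job": "http://example.org/job#"}))
```

Each atomic constraint contributes one pattern that shares the `?job` variable, so the conjunction of constraints is expressed simply by listing the patterns in the same `WHERE` block.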
Converting natural language queries for non-SQL databases also requires systems that can interpret
non-English languages. In this regard, a system for the Arabic language named “AR2SPARQL” has been
developed by AlAgha and Abu-Taha (2015) to enable non-technical users to query RDF graphs. Query
ambiguity is resolved through linguistic as well as semantic approaches. The system has been tested on two
corpora and showed good precision and recall. Another article that uses Arabic as a case study of a natural
language interface for relational databases is presented by Hammo, Abu-Salem and Lytinen (2002).
Exploring graph databases via natural language queries also has strong potential. There has been little
work on inter-pattern techniques and on translating them into corresponding SPARQL queries. A framework
for a real-life application related to organic farming is reported as the target of upcoming work. A large
amount of work has been carried out on querying ontologies and RDF data via English language queries. A
natural language to SPARQL querying framework, OptiqueNLQF, has been introduced by Sæbu (2015). The
proposed system does not demand background knowledge to build a query. It analyzes an input information
request and generates a SPARQL query for it to retrieve the required information from databases. Job
searching is a common task for unemployed persons, and the data available in job search domains is
semantically annotated.
Users can express an information request in their own words, without being confined to pointing, clicking,
scrolling, or searching to choose the correct class and features. GraphAware NLP, presented by
Albertodelazzari et al. (2014), is one of the available tools that ease this task. Developed as a plugin for
the Neo4j graph database, GraphAware NLP provides a group of tools in the form of procedures, APIs, and
background processes.
Accurately converting a plain-text question into the corresponding database statement is the main goal of
the NLIDB domain. Oro and Ruffolo (2015) developed a system named MANTRA QA, which transforms a
plain-text query into SPARQL and Cypher statements. It combines grammar-based and logic-based concepts
to accurately identify the concepts and relations in specific knowledge domains. It has primarily been
tested on benchmarks from the tourism and finance domains.
Table 4 shows NoSQL and graph database frameworks that use semantic matching.
Table 4 Features Comparison of NoSQL-based Frameworks Using Semantic Matching
| System Name | Language Support | Datasets Supported | Domain Supported | Performance Evaluation |
|---|---|---|---|---|
| LODQA, 2013 | English to SPARQL | SNOMED CT | Life Sciences | Not Available |
| SemQAS, 2013 | English to SPARQL | Mooney Job | Job Search | Precision 100% and recall 99% |
| AR2SPARQL, 2015 | Arabic to SPARQL | OWL ontology based on United States geography data | Question answering system for Quran | Tested on geography data set: precision 88%, recall 61%, F-measure 0.72; for disease data set: 82%, 62%, and 0.71 |
| OptiqueNLQF, 2015 | English to SPARQL | NPD Ontology | Petroleum companies | Not Available |
| MANTRA QA, 2015 | English to SPARQL & Cypher | Manually prepared questions | Tourism | Not Available |
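The F-measure values reported for AR2SPARQL in Table 4 can be cross-checked against the reported precision and recall. The sketch below assumes that the table uses the balanced F1 score (the harmonic mean of precision and recall), which the source does not state explicitly; the function name `f_measure` is illustrative.

```python
def f_measure(precision, recall):
    """Balanced F-measure (F1): harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Geography data set: precision 88%, recall 61%
    print(round(f_measure(0.88, 0.61), 2))  # 0.72
    # Disease data set: precision 82%, recall 62%
    print(round(f_measure(0.82, 0.62), 2))  # 0.71
```

Under this assumption, both rounded values agree with the figures in the table.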
2.2.1.2 NoSQL frameworks Using Pattern Matching
Semantic Web Interface Using Patterns (SWIP) has been presented by Pradel et al. (2013). The framework
mainly focuses on French ecology and agriculture. SWIP translates English language queries to SPARQL and
is supported by the QALD-3 dataset.
Table 5 presents NoSQL and graph database frameworks that use pattern matching.
Table 5 Features Comparison of NoSQL-based Frameworks Using Pattern Matching
| System Name | Language Support | Datasets Supported | Domain Supported | Performance Evaluation |
|---|---|---|---|---|
| SWIP, 2013 | English to SPARQL | QALD-3 | French ecology and agriculture | Tested on MusicBrainz dataset; achieved 51% precision, recall, and F-measure |
2.2.2 Syntactic Pattern
The syntactic pattern approach is well known in natural language processing, especially in ontology
engineering and extraction from data (Maynard et al., 2009). Unlike other approaches, it relies on a large
number of hand-crafted syntactic patterns; it therefore has high recall and low precision, and it is also
domain dependent (Reiss et al., 2008).
Hearst (1992) extracted hyponymy lexical relations from text using hand-written patterns in which parts of
speech were used as tags. In another study, Downey et al. (2004) used the approach to learn patterns from
text and extract information with them. Text2Onto, as previously discussed, combines machine learning
with basic linguistic approaches and uses POS tagging. The drawbacks of syntactic patterns are the limited
number of available patterns and domain dependency. There are also major issues of scalability, domain
knowledge, and labor.
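A hand-written pattern of the kind introduced by Hearst (1992) can be sketched in a few lines. The example implements only the "X such as A, B and C" pattern and, as a simplifying assumption, treats every noun phrase as a single word matched by a regular expression instead of using part-of-speech tags; the function name `hyponym_pairs` is illustrative.

```python
import re

# Separator between list items: ", ", " and ", " or ", ", and ", ", or ".
_SEP = r"(?:,? (?:and|or) |, )"

# Simplified "X such as A, B and C" pattern over single-word noun phrases.
SUCH_AS = re.compile(rf"(\w+) such as (\w+(?:{_SEP}\w+)*)")


def hyponym_pairs(sentence):
    """Return (hyponym, hypernym) pairs found by the 'such as' pattern."""
    pairs = []
    for match in SUCH_AS.finditer(sentence):
        hypernym = match.group(1)
        items = re.split(_SEP, match.group(2))
        pairs.extend((item, hypernym) for item in items)
    return pairs


if __name__ == "__main__":
    print(hyponym_pairs("He plays instruments such as guitar, piano and violin."))
```

The sketch also shows why such patterns are limited: each additional construction ("including", "and other", and so on) needs its own hand-written rule, and multi-word noun phrases require real syntactic analysis.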
Tiddi et al. (2012) presented an approach whose purpose was to generate web content via syntactic patterns
for the relations that exist in linked open data. It was also used to extract new entities from the web and
to construct ontologies. Its limitations are that it is restricted to RDF Schema and requires a domain of
interest as input.
3. Discussion
The aim of this survey is to explore state-of-the-art natural language querying frameworks for databases.
Natural language to database querying frameworks have a four-decade-long history, and several efforts
have been made to facilitate end users. Initial tools were designed to deal with small-scale databases and
were not scalable for commercial use. With advances in technology, new commercial natural language to
database querying frameworks were developed that support multiple databases. Related studies and surveys
in the literature focus specifically on natural language to database querying frameworks, but most of the
available surveys are outdated.
We included frameworks developed from 2008 to 2018 and found 47 frameworks related to our topic. Out of
these 47, we selected the 35 frameworks most closely relevant to the survey. A feature comparison table
has been compiled for those tools that are available for commercial purposes. The initial translation
frameworks do not meet the requirements of current databases because of the heterogeneous nature of data,
and they were designed to deal with small-scale databases only. The frameworks have been categorized
according to structured and unstructured databases. Multiple query languages are used to retrieve
information from databases, but we focused only on the query languages that are popular today.
Furthermore, we have reviewed frameworks that translate natural language queries to SQL, MQL, SPARQL,
RDF, CYPHER, and GREMLIN for both structured and unstructured databases.
Lastly, we found that 70% of the work on natural language to database querying has been carried out for
SQL. The shares of the NoSQL languages SPARQL, CYPHER, and GREMLIN are 15%, 10%, and 5%,
respectively. With the increasing popularity of NoSQL, especially graph databases, we urge researchers to
focus on developing new natural language to database frameworks for CYPHER and other graph languages.
We also observed that most of the available natural language to database querying frameworks support the
English language only. A few efforts have been reported in which researchers worked on Portuguese and
French. Multi-language support or dedicated systems for other international languages are desirable to
make the overall data-driven process easier.
References
AlAgha, I., & Abu-Taha, A. (2015). AR2SPARQL: an arabic natural language interface for the semantic
web. International Journal of Computer Applications, 125(6).
Alawwad, H., & Khan, E (2016). An Intelligent Database System using Natural Language Processing.
International Journal of Computers, https://www.iaras.org/iaras/journals/ijc
Albertodelazzari, Bachmanm, Ikwattro, and Inserpio. (2014) "GraphAware Knowledge Platform." GitHub.
https://graphaware.com/.
Androutsopoulos, I., Ritchie, G. D., & Thanisch, P. (1995). Natural language interfaces to databases–an
introduction. Natural language engineering, 1(1), 29-81.
Alessandra Giordani, Alessandro Moschitti. (2012). “Translating Questions to SQL Queries with
Generative Parsers Discriminatively Reranked”, Proceedings of the Coling, pp. 401-410
Ben Abacha, P. Zweigenbaum, (2011). Automatic extraction of semantic relations between medical
entities: A rule based approach, J. Biomed. Semantics. doi:10.1186/2041-1480-2-S5-S4.
Coden, G. Savova, J. Buntrock, I. Sominsky, P. Ogren, C. Chute, P. de Groen, (2007). Text Analysis
Integration into a Medical Information Retrieval System: Challenges Related to Word Sense
Disambiguation, Medinfo 2007 Proc. 12th World Congr. Heal. Informatics; Build. Sustain. Heal. Syst
Basik, F., Hättasch, B., Ilkhechi, A., Usta, A., Ramaswamy, S., Utama, P., & Cetintemel, U. (2018, May).
DBPal: A Learned NL-Interface for Databases. In Proceedings of the 2018 International Conference on
Management of Data (pp. 1765-1768). ACM.
Bercich, N. H. (2003). The Evolution of the Computerized Database. arXiv preprint cs/0305038.
Christian B., David A., Yingjie Y., (2010). “Natural Language Processing: A Prolog Perspective”. Artificial
Intelligence Review, Springer, vol. 33, pp. 151-173
Cahyani, Denis Eka, and Ito Wasito. (2017) "Automatic Ontology Construction Using Text Corpora and
Ontology Design Patterns (ODPs) in Alzheimer’s Disease." Jurnal Ilmu Komputer dan Informasi 10, no. 2:
59-66.
Dezhao Song, Frank Schilder, et al., (2015). “TR Discover: A Natural Language Interface for Querying and
Analyzing Interlinked Datasets”, Springer International Publishing, LNCS 9376, pp. 21-37, DOI
10.1007/978-3-319-25010-6_2
D. Maynard, A. Funk, W. Peters, (2009) Using lexico-syntactic ontology design patterns for ontology
creation and population, in: CEUR Workshop Proc.
D. Downey, O. Etzioni, S. Soderland, D.S. Weld, (2004) Learning Text Patterns for Web Information
Extraction and Assessment, Artif. Intell.
Doing-Harris, Kristina, Yarden Livnat, and Stephane Meystre. (2015). "Automated concept and
relationship extraction for the semi-automated ontology management (SEAM) system." Journal of
biomedical semantics 6, no. 1:15.
Elías Andrawos, García Gonzalo Berrotarán, and Rafael Carrascosa. (2013, September). "Quepy" A Python
Framework to Transform Natural Language Questions to Queries. http://quepy.machinalis.com/.
EasyQuery. (2017). “Easy Query Builder: SQL Query Builder that doesn’t make you Learn SQL”, web
accessed easyquerybuilder.com
Ferré, S. (2012, August). Squall: A controlled natural language for querying and updating rdf graphs.
In International Workshop on Controlled Natural Language (pp. 11-25). Springer, Berlin, Heidelberg.
Fluree. (2017). "Blockchain, Meet Database." Fluree Blockchain Database and Decentralized Apps.
https://flur.ee/.
Frank Z., Erik C., Roy E., (2018). “Natural Language Based Financial Forecasting: A Survey”. Artificial
Intelligence Review, Springer, Vol. 50, pp. 49-73
F. Reiss, S. Raghavan, R. Krishnamurthy, H. Zhu, S. Vaithyanathan, (2008) An algebraic approach to rule-
based information extraction, in: Proc. - Int. Conf. Data Eng., doi:10.1109/ICDE.2008.4497502.
Ganti, V., He, Y., & Xin, D. (2010). Keyword++: A framework to improve keyword search over entity
databases. Proceedings of the VLDB Endowment, 3(1-2), 711-722.
Gulwani, S., & Marron, M. (2014, June). Nlyze: Interactive programming by natural language for
spreadsheet data analysis and manipulation. In Proceedings of the 2014 ACM SIGMOD international
conference on Management of data (pp. 803-814). ACM.
Hammo, B., Abu-Salem, H., & Lytinen, S. (2002, July). QARAB: A question answering system to support
the Arabic language. In Proceedings of the ACL-02 workshop on Computational approaches to semitic
languages (pp. 1-11). Association for Computational Linguistics.
Tiddi, N.B. Mustapha, Y. Vanrompay, M.-A. Aufaure, (2012) Ontology learning from open linked data and
web snippets, Confed. Int. Work. Move to Meaningful Internet Syst. OTM 2012 OTM Acad. Ind. Case
Stud. Program, EI2N 2012, INBAST 2012, META4eS 2012, OnToContent 2012, ORM 2012, SeDeS 2012,
SINCOM 2012, SOMOCO 2012. doi:10.1007/978-3-642-33618-8_59.
Jan Steemann, Michael Hackstein, and Max Neunhoffer. (2017, July) "Highly Available Multi-
model NoSQL Database." ArangoDB. https://www.arangodb.com/.
Jeremy Ferrero, Shashank Khare. (2017). “ln2sql”, web accessed https://pypi.org/project/ln2sql/
Jonas Chapuis. (2017). “Natural Language Interfaces to Databases (NLIDB)”, web accessed
https://www.nexthink.com/blog/natural-language-interfaces-to-databases-nlidb/
J. An, Y.B. Park, (2018) Methodology for Automatic Ontology Generation Using Database Schema
Information, Mob. Inf. Syst. (2018) 1–13. doi:10.1155/2018/1359174.
Karim, N., Latif, K., Ahmed, N., Fatima, M., & Mumtaz, A. (2013, September). Mapping natural language
questions to SPARQL queries for job search. In Semantic Computing (ICSC), 2013 IEEE Seventh
International Conference on (pp. 150-153). IEEE.
Kim, J. D., & Cohen, K. B. (2013). Natural language query processing for SPARQL generation: A
prototype system for SNOMED CT. In Proceedings of biolink (pp. 32-38).
Leavitt, N. (2010). Will NoSQL databases live up to their promise. Computer, 43(2).
Li, Fei, and H V Jagadish. 2014. “Constructing an interactive natural language interface for relational
databases.” Proceedings of the VLDB Endowment 8 (1): 73–84.
Li, X., & Boucher, M. (2013). Under the hood: The natural language interface of graph search. URL:
https://www.facebook.com/notes/facebookengineering/under-the-hood-the-natural-languageinterface-of-graph-search/10151432733048920
[Online].
Lin, J., Liu, Y., Guo, J., Cleland-Huang, J., Goss, W., Liu, W., ... & Rasin, A. (2017, October). TiQi: A
natural language interface for querying software project data. In Automated Software Engineering (ASE),
2017 32nd IEEE/ACM International Conference on (pp. 973-977). IEEE.
Malik K., Hisham Kanaan, Vian Sabeeh, Ghaus Malik (2018). “Autonomous, Decentralized and Privacy-
Enabled Data Preparation for Evidence-based Medicine with Brain Aneurysm as a Phenotype”, IEICE
Transactions on Communications, E101-B(8),115-126.
Messina, A., Augello, A., Pilato, G., & Rizzo, R. (2017, July). BioGraphBot: A Conversational Assistant
for Bioinformatics Graph Databases. In International Conference on Innovative Mobile and Internet
Services in Ubiquitous Computing (pp. 135-146). Springer, Cham.
Mvumbi, T. (2016). Natural language interface to relational database: a simplified customization
approach (Doctoral dissertation, University of Cape Town).
M.A. Hearst, (1992) Automatic Acquisition of Hyponyms from Large Text Corpora, in: Proc. 14th Int.
Conf. Comput. Linguist., doi:10.1.1.36.701
M. Bundschus, M. Dejori, M. Stetter, V. Tresp, H.P. Kriegel, (2008). Extraction of semantic biomedical
relations from text using conditional random fields, BMC Bioinformatics. doi:10.1186/1471-2105-9-207
M. Poesio, E. Barbu, C. Giuliano, L. Romano, (2008). Supervised relation extraction for ontology learning
from text based on a cognitively plausible model of relations, ECAI 2008 3rd Work. Ontol. Learn. Popul.
M. del Carmen Legaz-García, J.A. Miñarro-Giménez, M. Menárguez-Tortosa, J.T. Fernández-Breis,
(2016). Generation of open biomedical datasets through ontology-driven transformation and integration
processes, J. Biomed. Semantics. doi:10.1186/s13326-016-0075-z.
M. Missikoff, R. Navigli, P. Velardi, (2002). Integrated approach to Web ontology learning and
engineering, Computer (Long. Beach. Calif). doi:10.1109/MC.2002.1046976.
Manvi, K.K. Bhatia, A. Dixit, (2018) Automatic generation of ontology for extracting hidden web pages,
in: Adv. Intell. Syst. Comput., doi:10.1007/978-981-10-6620-7_14.
Nadkarni, P. M., Ohno-Machado, L., & Chapman, W. W. (2011). Natural language processing: an
introduction. Journal of the American Medical Informatics Association, 18(5), 544-551.
Neo Technology. (2007, April) "The Neo4j Graph Platform – The #1 Platform for Connected Data." Neo4j
Graph Database Platform. https://neo4j.com/.
Oro, E., & Ruffolo, M. (2015, November). A Natural Language Interface for Querying RDF and Graph
Databases. https://intranet.icar.cnr.it/wp-content/uploads/2016/11/RT-ICAR-CS-15-05.pdf
Ontotext. (2018, January) "GraphDB." GraphDB Downloads and Resources. http://graphdb.ontotext.com/.
OrientDB Ltd. (2010, April) "Graph Database | Multi-Model Database | OrientDB." Home · OrientDB
Manual. https://orientdb.com/.
Palakurthi, A., Ruthu, S. M., Akula, A., & Mamidi, R. (2015). Classification of attributes in a natural
language query into different sql clauses. In Proceedings of the International Conference Recent Advances
in Natural Language Processing (pp. 497-506).
Pradel, C., Haemmerlé, O., & Hernandez, N. (2013, October). Natural language query interpretation into
SPARQL using patterns. In Fourth International Workshop on Consuming Linked Data-COLD 2013 (pp.
pp-1).
Qawasmeh, Omar, Maxime Lefrançois, Antoine Zimmermann, and Pierre Maret. (2018) "Improved
Categorization of Computer-assisted Ontology Construction Systems: focus on Bootstrapping capabilities."
In Extended Semantic Web Conference (ESWC2018).
Reis, P., Matias, J., & Mamede, N. (1997). Edite-A Natural Language Interface to Databases A new
dimension for an old approach. In Information and Communication Technologies in Tourism 1997 (pp.
317-326). Springer, Vienna.
Rao, G., Agarwal, C., Chaudhry, S., Kulkarni, N., & Patil, D. S. (2010). Natural language query processing
using semantic grammar. International journal on computer science and engineering, 2(2), 219-223.
Ryan Mattison. (2018). “ThoughtSpot”, web accessed https://www.thoughtspot.com/thoughtspot-
announces-searchiq-new-voice-driven-analytics-enterprise
Rukshan Alexendar, Prashanthi Rukshan, Sinnathamby Mahesan. (2013). “Natural Language Web
Interface for Database (NLWIDB)”, Proceedings of the third International Symposium, IEEE
Sæbu, T. S. (2015). OptiqueNLQF: A natural language query formulation system based on Semantic
Technologies (Master's thesis).
Safari, L., & Patrick, J. D. (2014). Restricted natural language based querying of clinical databases. Journal
of biomedical Informatics, 52, 338-353.
Sherman, Monroe. (2012). Cypher. Retrieved from https://www.w3.org/wiki/Cypher
Singh, G., & Solanki, A. (2016). An algorithm to transform natural language into sql queries for relational
databases. Selforganizology, 3(3), 100-116.
Sripad, Joshi, and Laxmaiah E. n.d. 2013. Survey Of Natural Language Interface To Databases.
Sujatha, B., & Raju, S. V. (2016). Natural Language Query Processing for Relational Database using
EFFCN Algorithm. International Journal of Computer Sciences and Engineering, 4, 49-53.
Sukthankar, N., Maharnawar, S., Deshmukh, P., Haribhakta, Y., & Kamble, V. (2017). nQuery-A Natural
Language Statement to SQL Query Generator. In Proceedings of ACL 2017, Student Research
Workshop (pp. 17-23).
Stefan W., Ellen R., Gabriele S., (1996). Connectionist, Statistical and Symbolic Approaches to Learning
for Natural Language Processing, Springer.
T. Ono, H. Hishigaki, A. Tanigami, T. Takagi, (2001), Automated extraction of information on protein-
protein interactions from the biological literature, Bioinformatics. doi:10.1093/bioinformatics/17.2.155.
Warren, D. H., & Pereira, F. C. (1982). An efficient easily adaptable system for interpreting natural
language queries. Computational Linguistics, 8(3-4), 110-122.
Woods, William A, Ronald M Kaplan, and Bonnie Nash-Webber. (1972) The lunar sciences natural
language information system. Bolt, Beranek and Newman, Incorporated.
Xu, X., Liu, C., & Song, D. (2017). Sqlnet: Generating structured queries from natural language without
reinforcement learning. arXiv preprint arXiv:1711.04436.
Yossi Shani, Tal Cohen, and Yossi Vainshtein. (2016) "Natural Language Interface for Databases."
KUERI.ME. 2016. http://kueri.me/.
Yaghmazadeh, N., Wang, Y., Dillig, I., & Dillig, T. (2017). Sqlizer: Query synthesis from natural
language. Proceedings of the ACM on Programming Languages, 1(OOPSLA), 63.
Zhong, V., Xiong, C., & Socher, R. (2017). Seq2SQL: Generating Structured Queries from Natural
Language using Reinforcement Learning. arXiv preprint arXiv:1709.00103.