ATLS: Automatic Translation of First-order Predicate Logic to SQL
Ana-Maria Carcalete, Sebastian Delekta, Yubin Ng, Balarabe Ogbeha, Arjun Saraf, William Smith
Supervisor: Dr Fariba Sadri

Executive Summary

ATLS (pronounced 'atlas') is a system for automated translation of first-order predicate logic queries into Structured Query Language (SQL). The system consists of two components: the command-line translator and the web user interface. At Imperial College, first-year students are taught logic in the first term and databases, with a focus on the relational model, in the second term. Predicate logic is closer to the relational model than SQL is, so students with no prior experience of SQL will be able to use ATLS. People who prefer logic, such as academics, will also be able to use this system instead of writing SQL directly.

To use the system, users connect to a database of their choice. Using a simple mapping between predicate logic and SQL, they can construct predicate formulas, aided by the database schema displayed on the web interface, then translate and run them on the connected database.

SQL is the most widely used database querying language; however, queries can sometimes be expressed more succinctly and intuitively in first-order (predicate) logic. In practice, most Relational Database Management Systems (RDBMSs) use some variant of the SQL standard, which means that SQL queries written for different RDBMSs will differ subtly. This is a good argument for reasoning about databases in predicate logic: it provides a means for users to reason about a database without subjecting themselves to the nuances of the specific SQL implementation in use in the RDBMS in question. This allows users to express queries in a very general manner, worrying only about the correctness of the logical formula and not about the correctness of the SQL query with regard to a particular RDBMS.
ATLS thus forms a bridge between these two paradigms by allowing users to convert their abstracted predicate formulas into runnable SQL queries.
8/9/2019 Automatic Translation of First-order Predicate Logic to SQL
Predicate logic and databases are some of the fundamental topics in the study of computer science. Some ideas
about how the former can be used to discuss the latter are straightforward—predicate logic, with appropriately
defined predicates, allows us to express set membership, term equality, or truth/false values, all of which are important theoretical ideas underpinning relational databases.
The goal of the project was to implement a tool that allows a user to manage a database using the language of
predicate logic rather than SQL, the de facto standard for relational database management systems. It might be
the case that a database user finds it more intuitive to express a database concept using the language of logic
rather than SQL; e.g., if they are unfamiliar with joins, they may prefer to write queries such as
e ∈ employee ∧ e.name = n ∧ p ∈ project ∧ p.manager = e
rather than ... FROM employee JOIN project ON employee.id = project.manager.
Three main user groups that can benefit from the proposed tool have been identified:
• Academics, who may prefer to query a database using predicate logic simply by reason of their specialization; indeed, our supervisor, a logic lecturer, has suggested that she often forgets the exact SQL rules to retrieve the information she needs.
• Students, who may want to verify their familiarity with either predicate logic or SQL, using the other as a
benchmark. It may also be that they are unfamiliar with SQL but want to use a relational database—this
would be the case for first-year students of Computing at Imperial College, who are familiarized with
predicate logic in their first term but do not see SQL until the second.
• Working professionals with knowledge of predicate logic, who for personal reasons prefer it over SQL
for database management.
1.2 Objectives
The objectives that we were looking to fulfil can be grouped under three main headings.
• Functionality. The tool should provide extensive support for database querying using correspondences
between predicate logic and SQL that are as intuitive as possible. The resulting code should be correct
and free of redundancies, and use as many features of SQL as necessary to make the translation natural.
• Portability. The tool should support users on different platforms or web browsers, and provide translations that can be run on various relational databases.
• Ease of use. The tool should be quick to start with, without the need for extensive explanations or going
through tutorials. If required, the installation should be quick and intuitive.
After considering some of the possible design choices (see Section 3), a prioritized list of features was developed:
1. Basic command-line compiler supporting conjunction, quantification, and predicates
Note that the definition of predicate-expr and var-list forces the arity of the predicate symbol to be at least 2— this requirement will resurface soon.
Two issues arise when shifting from a TRC query to logic. Firstly, a TRC input is composed of two different
expressions. One is the tuple variable T = {e.name, e.address} of attributes in the desired output, which corresponds to the SELECT operator in SQL. The other is F = "e ∈ employee . . .", the condition formula,
corresponding to FROM and WHERE. To preserve the rules of predicate logic as far as we can, we would prefer
the user to worry about one kind of input only. Therefore, we have decided that the output tuple is automatically
instantiated to be the set of types of all free variables in the query, fv(F). Note that this implies that the order
of attributes in the SELECT statement cannot be guaranteed. Also, as the SQL language does not allow empty
SELECT statements, we additionally require that fv(F) ≠ ∅.
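To make the rule concrete, here is a minimal, hypothetical sketch of collecting the free variables that would populate the SELECT list; the AST classes and names are illustrative inventions, not the project's actual implementation:

```python
# Hypothetical, simplified formula AST -- illustrative names only.
class Predicate:
    def __init__(self, name, args):
        self.name, self.args = name, args

class Exists:
    def __init__(self, var, body):
        self.var, self.body = var, body

class And:
    def __init__(self, left, right):
        self.left, self.right = left, right

def free_vars(node, bound=frozenset()):
    """Collect the variables occurring free in the formula."""
    if isinstance(node, Predicate):
        return {a for a in node.args if a not in bound}
    if isinstance(node, Exists):
        return free_vars(node.body, bound | {node.var})
    if isinstance(node, And):
        return free_vars(node.left, bound) | free_vars(node.right, bound)
    raise TypeError(f"unknown node: {node!r}")

# fv(∃x(employee.name(x, n))) = {n}: only n's type goes into SELECT.
w = Exists("x", Predicate("employee.name", ["x", "n"]))
print(free_vars(w))  # {'n'}
```

Dropping the quantifier leaves x free as well, so free_vars(Predicate("employee.name", ["x", "n"])) yields {"x", "n"} and both types would be projected.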
This decision requires that the type of variables can be inferred from the query. In the example above, the
"inference" is by an explicit set operator in "e ∈ employee" (to find out the table from which a tuple comes)
and dot dereference, as in e.salary (so the type of s is employee.salary). However, predicate logic has no well-devised way of dealing with set inclusion or ordering. Hence we will need our own interpretation.
One way of resolving this introduces an explicit predicate _from_relation, which would accept a tuple
variable and relation name and check ownership, e.g. _from_relation(e, employee). However, as we don't
have a "dot dereference" operator in logic, to extract an attribute from a tuple e we will need another predicate,
say _get(e, name, n), that unifies a fresh variable n with the attribute name from tuple e.
However, the translation we constructed is designed to work on datasets that already provide a convenient interpretation
of set membership: keys. Consider a relation r = (k, a1, a2, . . .) with a single-column primary key k and
non-key attributes ai, i = 1, 2, . . . . We express the fact that a variable ϕ is the value of attribute ai by writing:
r.ai(κ, ϕ), (4)
where κ is the key variable. Thus, if we want to instantiate s to be the salary attribute of a tuple whose key is κ, we write
employee.salary(κ, s).
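As an illustration of the intended mapping (the SQL shape and the key-column name here are assumptions for this sketch, not the tool's verbatim output), a single predicate of form (4) could be rendered as a projection over its relation:

```python
def predicate_to_sql(pred, key_column):
    """Sketch: render a predicate r.a(kappa, phi) as a SELECT over r.
    The key column must come from the schema; here it is simply passed in."""
    table, attr = pred.split(".")
    return f"SELECT {key_column}, {attr} FROM {table}"

# employee.salary(kappa, s) -> the key plus the requested attribute
print(predicate_to_sql("employee.salary", "eid"))
# SELECT eid, salary FROM employee
```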
By default, the first position in a predicate’s tuple argument is occupied by the key, hence κ is instantiated
automatically and can be used in any context where we need to refer to the tuple’s key. However, we still need
a way of instantiating κ when we don’t use any other attributes (such as salary). As a key is a perfectly valid
attribute, we are allowed to use expression (4), writing r.k(κ, κ), or employee.eid(κ, κ), where “eid” is the
name of the key in the table. This is an acceptable solution, but to eliminate the redundancy and the need to
type out the name of the key we permit shortening r.k(κ, κ) to just r(κ).1
The utility of this approach is more evident in the case of keys composed of multiple columns, k = (k1, . . . , kn). A request for an attribute is now of the form r.ai(κ1, . . . , κn, ϕ). Note that each of the n key variables κi is thus
instantiated. Using the knowledge of the database's schema, we can decide whether the arity of the predicate is
correct—for a relation with an n-column primary key, the arity must be n + 1. As in the n = 1 case above, the
keys can be instantiated by themselves, without the need to refer to any other attribute. This is done either via
n expressions of the form r.ki(κ1, . . . , κn, κi) for all i, which treats the κi as regular attributes, or more concisely
by r(κ1, . . . , κn).
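A hedged sketch of the arity check just described, with a made-up two-relation schema (the real tool reads the key columns from the connected database):

```python
# Hypothetical schema: relation name -> its primary-key columns.
SCHEMA_KEYS = {"employee": ["eid"], "assignment": ["eid", "pid"]}

def expected_arity(relation, attribute=None):
    """r.a(k1, ..., kn, phi) must have arity n + 1, while the key-only
    shorthand r(k1, ..., kn) must have arity n."""
    n = len(SCHEMA_KEYS[relation])
    return n + 1 if attribute is not None else n

print(expected_arity("assignment", "hours"))  # 3: two key variables + attribute
print(expected_arity("employee"))             # 1: the shorthand employee(kappa)
```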
We allow replacing any variable that is a predicate argument with a constant. Writing r.a(. . . , C , . . .), where C
is a key or attribute constant, is semantically equivalent to r.a(. . . , ϕ , . . .) ∧ ϕ = C , where ϕ is a fresh variable
not occurring anywhere else in the query with the restriction that ϕ is considered bound.
2.4 Quantification
2.4.1 Existential quantifier
Consider the case of an existential quantification W = ∃xPx at the topmost level (i.e., not in a nested formula).
According to the rules borrowed from TRC, the query returns a set of all tuples (with attributes restricted to the
types of fv(W)) such that the query evaluates to true. It is obvious that the variable x is bound and x ∉ fv(W),
so its type will not be returned.
To request the names of all employees, we might call:
∃x(employee.name(x, n)) (5)
Note fv(W) = {n}, whose type is employee.name—this is the only attribute that can be returned. Consider
all possible substitutions of the variable n. If an employee John Doe exists in the relation, then
W[n → "John Doe"] = ∃x(employee.name(x, "John Doe")) evaluates to ⊤,
with the primary key of the tuple substituted for x. Hence the tuple containing John Doe, restricted to
πemployee.name, will be returned. A translation to SQL is
SELECT name
FROM employee
The role of ∃ in binding the variables is clear. However, it is partially redundant for just ensuring the existence
of a tuple. After all, the query
employee.name(x, n), (6)
with fv(W) = {x, n}, also has a truth/falsity interpretation for a particular substitution. E.g.,
W[x → 16f5a3, n → "Jane Roe"] = employee.name(16f5a3, "Jane Roe") evaluates to ⊤
if there is a tuple with primary key 16f5a3 and name attribute "Jane Roe", and evaluates to ⊥ otherwise.
The only difference between (5) and (6) is the variable binding—the latter has the key variable x as free and
therefore will be translated as
1This demonstrates the requirement that the query for a database can only exist in a model where the schema of every relation
mentioned in the query is given—otherwise the key name would remain unknown.
Call a domain of a TRC expression the set of all values that appear in the expression or in any tuple of any
relation occurring in the expression [5]. An expression is safe if its result can only come from its own domain.
The query (9) is therefore unsafe, because—like any query involving outermost negation—it can be satisfied by
an infinitely large answer set. Using non-negated universal quantification on the outermost level, W = ∀xPx,
also produces outermost negation, the result being the complement of the result set of (9). Therefore we
consider any query involving outermost universal quantification to be ill-defined.
However, there is nothing in the way of using universal quantification in nested formulas. Consider the query: find the names of employees all of whose projects were within the budget of £100,000. This might be expressed as:
∃x(employee.name(x, n) ∧ ∀y(project.manager(y, x) ∧ project.cost(y, c) ∧ c ≤ 100000)) (10)
which is equivalent to
∃x(employee.name(x, n) ∧ ¬ ∃y¬(project.manager(y, x) ∧ project.cost(y, c) ∧ c ≤ 100000)
(†)
)
It is not immediately clear how to treat the part indicated by (†). This is discussed in the next section on
negation.
2.5 Negation
Consider the (†) expression from the previous section:
∃y¬(project.manager(y, x) ∧ project.cost(y, c) ∧ c ≤ 100000) (11)
A straightforward reasoning might suggest that at least one of the three predicates project.manager (y, x),
project.cost(y, c), c ≤ 100000 has to be false to falsify the whole conjunction. However, it is unclear how
to represent in SQL the fact that ¬project.manager(y, x) while keeping it true that project.cost(y, c) and staying true to the desired semantics. We interpret query (11) as saying that there exists a project whose cost is greater than £100,000. Indeed, we only really wish to negate the predicate c ≤ 100000—not the other two,
which might impact the interpretation of variables, especially if they occur in other predicates in the outer
scope! This is equivalent to saying that the set containing all projects except those with cost less than or equal to
£100,000 is not empty. We thus have a way of finding a condition for negating the parenthesized expression in
(11) without breaking unification. This is achieved with SQL’s EXCEPT operator:
SELECT project.pid
FROM project
EXCEPT SELECT project.pid
FROM project
WHERE project.cost <= 100000
will select projects with cost greater than £100,000. We can revisit query (10) now and apply this idea to arrive
at:
SELECT employee1.name
FROM employee AS employee1
WHERE (NOT ((EXISTS
(SELECT project1.pid,
project1.cost
FROM project AS project1
WHERE project1.manager = employee1.eid
EXCEPT SELECT project1.pid,
project1.cost
FROM project AS project1 WHERE project1.cost <= 100000
AND project1.manager = employee1.eid))))
which returns all employees from either Computing or Mathematics and translates to
SELECT employee.eid
FROM employee
WHERE (employee.department = ’Computing’)
OR (employee.department = ’Mathematics’)
3 Design and Implementation
3.1 Choice of tools
The initial meeting with the supervisor helped us clarify the goals and settle on some of the questions crucial
to the success of the project. The developed tool was meant to support a user’s experience in database querying
and management—as there exists a plethora of tools and standards of doing this, the scope of the project
was possibly very large. The tool could be, for example, a wrapper around a command-line interface such as
psql used to query a database, or a plugin for available database management tools, such as MySQL or Toad.
However, as portability was one of our objectives, we did not want to commit to supporting one particular
standard or software piece. Hence we decided to develop a standalone tool.
The next decision was that between a native desktop application or a web interface. Desktop software would
be more appropriate for users who do not want to rely on an Internet connection for proper functionality. In the
present day, this is becoming an increasingly less valid concern. Also, none of us had extensive prior experience
with writing desktop applications, and it would likely consume some time to produce even a rudimentary
version. Conversely, we had prior experience with web applications, which allow excellent portability and easy
continuous deployment of revisions. Hence we decided on the latter.
The next step was deciding on the main implementation language. After a brief discussion, in which the requirement to build a compiler-like tool emerged, we narrowed down the choices to three: Haskell, Python,
and Java. All of us had prior experience with Java; however, we did not know how well it would support a web
application and were not satisfied with the development speed it offered. Haskell had parser and compiler
tools available, but our experience was limited to toy programs and we were not sure how that would impact
our velocity. Even though only one member of the team had prior experience with Python, it was generally regarded as the easiest general-purpose language to pick up, and it provides excellent development speed and
support for web programming. We decided to use it for our tool, partly as a consequence of the learning interests of
team members.
Broadly, the project can be split into a user interface and a translator. The core of the tool, the translator, was
reduced to the problem of creating a compiler that takes predicate logic formulas as input and returns SQL
queries as output. Naturally, this task was split into creating a scanner, a parser and the actual translator. This
was different from creating a standard compiler due to the fact that semantic analysis of the formula required a
connection to the user’s database in order to check the correctness of the relations, keys and other details. To
make the tool usable, we decided on creating a user-friendly web interface that allows users to do the following:
• Connect to a database with the relevant details.
• View the database schema.
• Input predicate logic formulas (that can be built from the database schema).
• View the result of translating the query.
• View the result of running the translated query on the connected database.
3.3 Front-end
The user interface for this project is a key element. It provides access to the translator in a more user-friendly
manner than a command-line interface. To make it usable by as many people as possible (and with minimum
effort in both development and usage) the user interface is a JavaScript web application.
3.3.1 Design
The first step in designing the user interface was to create a list of functional requirements, features that are
necessary for the interface. The initial list was created by discussing with our supervisor what was required.
The list of requirements evolved over time as designs materialised and usability was evaluated.
The first mocks of the layout of the user interface were done on paper and whiteboards to allow us to quickly
discuss and change it. These designs were then translated into basic HTML and CSS layouts that revealed
potential problems that were not visible on paper, such as too much negative space or the page being too crowded. The first implemented design2 was a very simple layout, consisting of two editors, a results panel, and a toolbar.
When a user accesses the interface, she will be prompted to enter the database connection settings. Once a
connection has been successfully established, she can enter the logic query into the top editor. Logical symbols
can be entered by hitting the buttons displayed at the top of the window. To facilitate editing queries, the symbols
are also found under the expected key shortcuts, e.g. Ctrl+Shift+6 will enter the symbol ∧ (with the lookalike
ASCII caret ˆ found under Shift+6). Two processing options are provided: SQL will translate the logical query
into SQL and display it in the bottom text window, and RUN will additionally run the query on the database and
display the results in the result tab to the right of the editors.
Figure 2: Initial design of the user interface
The colours were picked to match the editor's default theme. The next design3 used more space to give
better boundaries between the individual areas of the program. It also had adjustable sizes and a tab control to hide some of the content when it was not needed. As noted in a meeting with our supervisor, the user needs the
ability to continuously refer to the schema of the database—which we somehow missed in the initial design.
Upon connecting to the database, the user now sees the schema, complete with the keys indicated and data types.
The final design4 is fully customizable, allowing the user to modify the layout according to their preferences.
This was the best way of laying out the application because everyone had their own opinion of what the
correct layout was. It was inspired in parts by JetBrains IntelliJ IDEA5 and Microsoft Visual Studio6, two popular
integrated development environments. We kept all the functionality of the previous iteration, adding some minor
improvements to the user experience, such as syntax highlighting or tracking the opening parenthesis when
constructing a query. Clicking on an attribute name in the schema will place it in the editor, and autocompletion
suggests the full names of tables, attributes, or previously used variables.
Figure 4: Final iteration of the user interface
3.3.2 Implementation
One of the first decisions made about the user interface was that it would be a web application. This meant
that if any real time user interaction was wanted then JavaScript would have to be involved, even if indirectly.
JavaScript is used to provide AJAX,7 a method of loading data asynchronously after the page has loaded, and
the rich syntax-highlighted text editors. The JavaScript could have been abstracted to some other language such
as Dart8 or CoffeeScript9, both of which compile to JavaScript; however, their benefit was negated by the
simplicity of the application, the framework being used and an ES6 transpiler10.
The application was developed using CommonJS11 as a method of splitting the code into separate modules.
This made encapsulation easier because modules could hide internal representations and just provide a public
interface. CommonJS was designed for server side JavaScript so a tool called browserify12 was used to compile
all the modules into a single file that could be used by browsers.
The framework used for the application was AngularJS.13 There are other similar frameworks available, such as Ember.js14 and Backbone.js15, but members of the group already had experience using Angular, giving
advantages in terms of speed of development and avoiding pitfalls. AngularJS provides a modular model-view-controller architecture that allows easy testing and strong bindings to HTML. Components in Angular are called directives.
The main controller links together shared data (such as the database settings and error list) and events from the
components. This allowed all the components to communicate without being directly coupled to each other. All
AJAX requests were handled through the "Api" service that wrapped the "$http" service provided by Angular.
The first component is the database settings panel; it collects the settings from the user and allows them to test
them. The settings and their validity are stored in the shared data in the main controller. The
database schema directive watches the settings and, when they change, it clears the current schema; if they are
valid, it loads the schema based on those settings. The schema is displayed in a tree control provided by the angular-ui-tree17 package. When one of the attributes is clicked, a signal is sent to the main controller; this is
relayed to the editor and the corresponding predicate is inserted. The editor and SQL viewer are small wrapper
directives around angular-ui-ace18, handling things like resizing. The editor also runs the predicate using the
Api service and tells the main controller about the results and any errors. The error and results directives are
wrappers around the angular-ui-grid19 directive.
We used a text editor called ACE.20 It provided most of the required functionality out of the box but we made
some custom modifications to better suit our needs. We created a regular expression based syntax highlighter
that highlighted the predicate queries and modified the editor code to support mixed-width characters, which
was needed to properly display the logic symbols.
The dynamic layout was done using code based on a library called dock-spawn.21 It structures the window as a series of panels, tab controls and splitters, and allows the panels to be rearranged and resized, providing a
customisable interface. We converted it to the CommonJS module system and fixed some bugs in the modularized code. The panels are easily extensible, which allowed it to be integrated with AngularJS and ACE with no
major problems.
The style sheets are written using SCSS,22 an extension to standard CSS that has features such as variables
and functions. This allowed the styles to use common variables and facilitated making the large changes that
happened during the development of the project.
The development packages are managed using NPM,23 the node package manager. It automatically installs
all the dependencies required to build the frontend assets. The third-party assets are managed using Bower,24
another package manager targeted more at client side packages. The frontend assets are compiled using grunt,25
a task automation tool. Grunt was configured to either produce a release build, that combines all the assets and
minifies them, or to watch the source files and automatically recompile when they change, which is very useful
during development.
3.3.3 Middleware
In between the JavaScript user interface and the Python translator is a lightweight middleware layer that acts as
an adapter, converting the AJAX HTTP requests into the relevant executions in the translation and converting
the results from the translator into JSON for the app to display.
The middleware layer was originally written in PHP26 because it is easy to set up and available from most hosting providers. It implemented a lightweight MVC-style architecture, with a router taking the model (serialized
as JSON), finding the appropriate controller and executing it, returning the result as JSON. Whilst this was easy
to setup and work with initially, we started having problems when more information was being passed between
the query are available, and (b) that the predicates have correct arity. We considered extending the semantic
analysis stage to other possible errors such as type checking, but decided that such errors should be returned on
the database end as predicate logic is by default not many-sorted.
Identifier    Bound value
emp.name      <PredicateNode@0x14b4f50>
emp.salary    <PredicateNode@0x14b4f90>
n             <VariableNode@0x14b4e90>
s             <BindingVariableNode@0x14b71d0>
x             <BindingVariableNode@0x14b7150>

Table 1: Top-level symbol table for the query from Figure 5.
In order to facilitate variable analysis, a symbol
table is generated once the AST construction is
complete. This allows us to uniquely identify the
entity that a variable represents. Note that the rules of the grammar allow writing formulas with
the shape
◦x(. . . ◦ x(P x) . . .),
that is, reusing a variable symbol in a different
binding within a subformula. For this reason the
symbol table is hierarchical—it is constructed on
a per-level basis, where the occurrence of a quantifier ◦ ∈ {∀, ∃} signals the introduction of a new scope level.
A sample top-level symbol table is presented to the right.
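A minimal sketch of such a hierarchical table follows; the class and method names are hypothetical. Each quantifier pushes a scope, and lookup walks outward, so an inner rebinding of a variable shadows the outer one:

```python
class ScopedSymbolTable:
    """Hypothetical per-scope symbol table: a quantifier opens a new scope,
    and lookup walks outward so inner bindings shadow outer ones."""
    def __init__(self):
        self.scopes = [{}]          # scopes[0] is the top level

    def enter_scope(self):
        self.scopes.append({})

    def exit_scope(self):
        self.scopes.pop()

    def bind(self, name, node):
        self.scopes[-1][name] = node

    def lookup(self, name):
        for scope in reversed(self.scopes):
            if name in scope:
                return scope[name]
        return None

st = ScopedSymbolTable()
st.bind("x", "outer binding")
st.enter_scope()                 # an inner quantifier rebinds x
st.bind("x", "inner binding")
print(st.lookup("x"))            # inner binding
st.exit_scope()
print(st.lookup("x"))            # outer binding
```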
It is not feasible to directly use an abstract syntax tree to generate SQL code. This is because the AST, even with
the symbol table, still does not contain all the necessary information about, say, variable resolution. Therefore, a tree walker generates an intermediate language struct that will then be used by the SQL code generator.
Consider Figure 6, the intermediate language struct for the query
∃d(employee.department(x, d) ∧ employee.department(y, d) ∧ x ≠ y ∧ employee.salary(y, s)) (18)
Figure 6: An intermediate language struct for query (18)
The IL struct consists of two items. The π-list, corresponding roughly to the relational algebra’s projection
operator, indicates the predicates that will be part of SQL’s SELECT query. Query (18) asks for the primary keys
of two employees and the salary of the latter, hence the π-list consists of three items. The notation {p@r}.a
denotes attribute a from table r which is aliased as p.
The σ-list in turn is a list of condition trees that the code generator will use to construct FROM and WHERE.
The condition tree can express various constraints, such as the need to carry out a join or make sure that two
variables compare in a particular way. The query contains x ≠ y, which expresses that the primary keys of the two
employees are different—this is indicated in the first tree with ≠ at the root. Also, the two employees are part of
the same department—thus we need to carry out a join on the variable d, whose type is employee.department.
This is expressed in the second tree, with the join condition ⋈ at the root and the joined relations on the
branches.
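The shape of the IL struct might be sketched as follows; the field names and tuple encodings are illustrative guesses at the structure described above, not the project's actual types:

```python
from dataclasses import dataclass, field

@dataclass
class ILStruct:
    """Illustrative IL struct: pi_list feeds SELECT, while sigma_list holds
    condition trees used to build FROM and WHERE."""
    pi_list: list = field(default_factory=list)
    sigma_list: list = field(default_factory=list)

# Rough encoding of the struct for query (18): two employee keys and a salary
# in the pi-list; an inequality tree and a join tree in the sigma-list.
il = ILStruct(
    pi_list=["{e1@employee}.eid", "{e2@employee}.eid", "{e2@employee}.salary"],
    sigma_list=[("<>", "x", "y"),
                ("join", "employee.department", ("e1", "e2"))],
)
print(len(il.pi_list), len(il.sigma_list))  # 3 2
```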
The compiler uses the common technique of stack-based intermediate representation [8], depicted in Figure 7.
The AST is visited in a bottom-up fashion, with a node accepting a Visitor object, dispatching it to the node’s
children, and then processing the result of the visits to the children. The Visitor keeps an internal stack of
intermediate language structs. The ∧ node in the query dispatches the visitor to the two children nodes (predicate
nodes). When control is returned, the visitor pops the two IL structs off the stack and combines them in an appropriate manner. This manner depends on the type of node that is currently visited. In the case of the
∧ node, the σ-trees have to be inspected to resolve clashing variable names to possibly introduce a join or a
WHERE-type condition, and can then be combined to make sure that all conditions from the two children nodes
are satisfied. Now, after aliases were possibly introduced, they can be applied to the two π-lists, which are then
simply combined. The result is pushed back onto the stack.
# class AstVisitor:
def visit_AndNode(self, node):
    l2 = self.stack.pop()
    l1 = self.stack.pop()
    # node-specific processing...
    l3 = self.combine(l1, l2)
    self.stack.append(l3)
Figure 7: The visitor consumes the representations L1, L2 of subqueries and combines them into L3
Once the translation of the AST to the intermediate language has been completed, the topmost IL struct accepts
an SQL visitor that produces the final SQL query. Given that most of the semantic processing has been done at
the stage of translation to IL, the bulk of the SQL visitor’s functionality is string generation. Nonetheless, it is
also equipped with enough introspection power to appropriately process the σ-trees—e.g. to decide whether an
incoming condition is a JOIN that should augment the FROM stack, or a simple selection to be appended onto the WHERE stack. The visitor is similarly stack-based, visiting child nodes and passing the result up to the parent
nodes. The usefulness becomes apparent in the processing of full subqueries of the form
-- ...
WHERE NOT EXISTS (SELECT ...)
In such cases we create another instance of the visitor that processes the inner query and returns the result in
the form of the SQL string, which can be then appended to the top-level WHERE operator.
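That recursive step can be sketched as follows; the class and method names are hypothetical, and the generate method is a placeholder standing in for the full σ-tree processing:

```python
class SqlVisitor:
    """Sketch of a stack-based SQL generator; a nested subquery is handled
    by spawning a fresh visitor instance (names here are hypothetical)."""
    def __init__(self):
        self.where = []  # accumulated WHERE-clause fragments

    def generate(self, il):
        # Placeholder: a real generator would also walk the sigma-trees.
        return f"SELECT {', '.join(il['pi'])} FROM {il['from']}"

    def visit_not_exists(self, inner_il):
        # A fresh visitor renders the inner query to a string, which is
        # then appended to the enclosing query's WHERE operator.
        inner_sql = SqlVisitor().generate(inner_il)
        self.where.append(f"NOT EXISTS ({inner_sql})")

v = SqlVisitor()
v.visit_not_exists({"pi": ["project.pid"], "from": "project"})
print(v.where[0])  # NOT EXISTS (SELECT project.pid FROM project)
```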
3.5 Technical challenges
The biggest hurdle in the project was the implementation of the translator. Although we had prior experience
with a simple compiler project for a toy language, this time we did not have a reference compiler, nor a clear
definition of what the translation should result in. We owe the final result to a great deal of foundational work,
pen-and-paper exercises, and discussions with the supervisor.
All our code is stored in two git repositories. Initially this caused some problems, due to the .gitignore file being
wrong and temporary files being committed, and there were problems with merges going badly. Once we
fixed the .gitignore file and added pre-commit hooks for testing, these problems mostly resolved themselves.
We needed to be able to connect to a variety of database servers: PostgreSQL, MySQL, Microsoft SQL
Server, etc. This caused problems initially because the three use different Python modules, need slightly
different connection parameters, and expose different interfaces. Our initial attempt supported only Postgres,
with a plan to extract a base class from it and extend it for each database server. During development we
found the SQLAlchemy library, which encapsulates the different interfaces.
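SQLAlchemy reduces the per-backend differences to the connection URL; afterwards the same Engine API is used for all three backends. A small illustrative helper (the function name and driver choices below are our assumptions, not ATLS code):

```python
# Hypothetical helper showing how the three backends differ only in their
# SQLAlchemy connection URL; sqlalchemy.create_engine(url) then provides a
# uniform interface (engine.connect(), connection.execute(), ...).

def db_url(dialect, user, password, host, port, database):
    """Build a SQLAlchemy-style connection URL for the given backend."""
    drivers = {
        "postgresql": "postgresql+psycopg2",
        "mysql": "mysql+pymysql",
        "mssql": "mssql+pyodbc",
    }
    return f"{drivers[dialect]}://{user}:{password}@{host}:{port}/{database}"

url = db_url("postgresql", "atls", "secret", "localhost", 5432, "company")
print(url)  # postgresql+psycopg2://atls:secret@localhost:5432/company
```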
Automated testing of the user interface proved difficult because there was no clear way of exercising it
automatically. We tried user-interface automation libraries such as Selenium, but found that the overhead of
maintaining the tests, combined with the frequency of interface changes, made them unviable. As the
interface was fairly simple, we decided in the end to test it manually.
3.6 Risk management
The use of Agile methods helped us immensely to contain the risks common to software projects. However,
the tools provided by Agile are focused on scheduling commitments. Shore and Warden [6, p. 232] point out
that this should not overshadow the main goal of delivering an "organizational success", that is, a piece of
software that delivers on the stakeholders' needs and requirements.
8/9/2019 Automatic Translation of First-order Predicate Logic to SQL
For our continuous integration / deployment server we used Jenkins, installed on our group virtual machine.
The user interface and the translator both use package managers that allow them to be set up automatically,
meaning that the deployment server could be rebuilt quickly if it were ever lost. The continuous deployment
pipeline meant that we could quickly preview the global effect of a change on the system and show it to users
and our supervisor.
We used Google Drive for collaborative editing of coursework reports, storing reference articles, etc.
Discussions were carried out in a dedicated Facebook group.
4.3 Team organization
We decided on a free-form structure with no formal group leader, making decisions based on everyone's
input and consensus. Decisions that directly affected the output of the application were made after
discussions with the supervisor.
After the first meeting with our supervisor we held a group meeting to decide how to split the project into
self-contained tasks that could be assigned to members of the group. For the first week we decided that we
needed to do more in-depth research on predicate logic, but at the same time we wanted to start working on
the actual application. We therefore delegated the work as follows:
• Front-end: William Smith, Ana-Maria Carcalete
• Back-end: Yubin Ng, Samuel Ogbeha
• Translation: Sebastian Delekta, Arjun Saraf
This work allocation was based on our estimates of which parts of the project were most important at the
time, along with the previous experience of the group members. However, over time we kept reprioritizing
most of these tasks, as it turned out that the implementation of the translation was the part of the project
that required the most manpower.
Moreover, we scheduled two meetings a week to discuss our progress and problems and to delegate new work.
Work was usually allocated so that it could be completed within a week. We also kept an active Facebook
conversation where anyone could describe a problem they were facing and seek help from the others.
We made use of pair programming so that everyone understood different aspects of the project and could
advise each other on implementation points. This enabled a free flow of ideas throughout the team and
facilitated collaboration, making the best use of our development time.
5 Evaluation
5.1 Testing and translation correctness
To ensure that we were building the correct product, we decided to begin the project on a theoretical basis
and to build on the extensive research already done in the area of interfacing relational databases and logic.
This led us to create an extensive theoretical model for our project which included a comprehensive mapping
between the two languages that was our point of reference during the implementation.
Next, we had to ensure our implementation of this framework was correct, so before writing any translation
code we held meetings to discuss various problems and concepts and to solve instances of them on paper, to
see how robust the framework was. This involved pen-and-paper translations of multiple sample queries,
both in team meetings and on examples provided by our supervisor during the weekly catch-up.
Because it deals with the very essence of programming languages, a compiler project must be made especially
robust to invalid inputs, and the expected output must be clarified as early as is feasible.
Figure 11: Compiler unit test for query: employee.name(x, n)
5.2 Usability testing
As soon as the web interface was completed and basic compiler functionality was implemented, we created a
dedicated testing environment and began running ad-hoc usability studies on fellow students in the department.
We were deliberately vague in our introductory explanations, as we wanted to determine whether the tool is
intuitive enough for users to jump straight to query execution. We ran these brief sessions on volunteers
in the labs as time permitted, asking them to attempt to retrieve certain information from a provided database
and to fill out a brief questionnaire afterwards (see Appendix A.1). All queries that the users tried to execute,
including failed ones, were logged, allowing us to see what confused users most about how queries should be
written.
This survey allowed us to discover a few potential issues with the tool. Firstly, it was not at all clear what it
exactly means to translate a predicate logic formula into an SQL query. Inspecting the log showed that users
were often confused about restricting free variables to enforce projection and about giving appropriate
arguments to the predicates. This is something we were aware of from the beginning, as we ourselves had to
get used to the idea.
5.3 Performance
One of our priorities was to make sure that our web application was highly responsive at all times. We used
client-side JavaScript and the AngularJS framework for our front-end. To make the web application more
responsive, we used techniques such as AJAX, which allowed us to load data dynamically and asynchronously.
Moreover, we used the lightweight JSON format to make data transfers between the front-end and back-end
fast and to avoid using server resources redundantly. This also enhanced our productivity by making
deployment faster.
The translator did not have demanding performance requirements; had it been slow, loading indicators could
have been added to the user interface. The majority of the time spent in the translator is actually spent
communicating with the database server. This cannot really be optimised, although caching the schema would
save some time on subsequent queries. This caching strategy was not added because performance was not an
issue.
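Had it become necessary, the schema cache could have been as simple as memoising the introspection call per connection. A sketch (names hypothetical; this is not ATLS code):

```python
# Sketch of the schema cache described above: the first translation for a
# connection pays the round-trip cost of introspecting the schema; later
# queries on the same connection reuse the cached copy.

_schema_cache = {}

def get_schema(conn_key, fetch):
    """Return the schema for conn_key, calling fetch() at most once."""
    if conn_key not in _schema_cache:
        _schema_cache[conn_key] = fetch()  # expensive database round-trip
    return _schema_cache[conn_key]

calls = []
def fake_fetch():
    calls.append(1)
    return {"employee": ["id", "name"]}

get_schema("pg://localhost/company", fake_fetch)
get_schema("pg://localhost/company", fake_fetch)
print(len(calls))  # 1: the second lookup hit the cache
```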
5.4 Project management
Along with evaluating our project we also evaluated our team and the progress we made each week. As we
chose to follow parts of the Scrum development method, we used an online platform to keep track of our
progress during development and to assign tasks to each member.
As Agile requires periodic development cycles, we decided that the best time frame for one cycle was one week.
Our initial tasks were based on our project specification and the first meeting with our supervisor. As the
project progressed we added more tasks to our online board according to what our supervisor requested.
Our progress chart shows that progress was initially slow until we had a better idea of what the project
entails. The chart also shows that our initial tasks were rather large; to fix this we split each large task into
smaller parts. Some of our development cycles were affected by the amount of outside work we had that
week; this can be observed in cycles where most of the work was done at the end of the week.
Most of the new tasks were added to our planning board after each weekly meeting with our supervisor, although
there were times when we added new tasks after one of our team meetings.
Figure 13: Development cycle task process
5.5 Evaluating deliverables
A straightforward way of evaluating the success of a project is to revisit the requirements. Here we refer to
Section 1.2 where the key objectives of the project were defined.
• Functionality. We have been successful in developing a translator capable of converting predicate logic
queries to SQL. Support is provided for quantification, negation, and binary connectives, resulting in possibly
complex SQL queries that include joins. There are some minor issues that we have not managed to fully
look into due to time constraints, and that are therefore not implemented; for example, using a variable
from an outer scope inside an inner scope introduced by the universal quantifier (∀) will be ignored. Two
major ideas that were not implemented are the query optimizer and support for write operations. The
latter was not a firm requirement, and we mostly considered it a possible extension of the primary tool,
which supports querying. As for optimization, some of our queries are still far from optimal. As an
example, because binary connectives are considered pairwise without regard for their possible
commutativity, a query of the form
r.a1(. . .) ∧ r.a2(. . .) ∧ . . . ∧ r.an(. . .)
will result in n − 2 joins, even though none are required. For the most part, this is not a performance
problem, given that the RDBMS is most likely able to apply optimizations of its own. In this light, our
tool is perhaps more clearly useful as a query executor rather than as a translator.
• Ease of use. This was one of our main concerns throughout all stages of the project, from the compiler
to the web front-end. The syntax of the logic accepted by the translator is not far removed from the usual
notation, and we tried not to diverge too far from it. The results of the usability study suggest that, even
though the learning curve was somewhat steep for some users, it did not represent enough of a hindrance
for users to give up on using logic for database querying. The user interface was designed with clarity in
mind. End-users seemed to find their way around it easily, from entering the database connection settings
and establishing a connection to quickly formatting the query thanks to syntax highlighting,
autocompletion, and dedicated support for symbol insertion.
• Portability. The tool is fully portable; built on standard web frameworks and technologies, it can run in any
major browser. Offline support is provided, and a locally hosted database can be queried just as well as a
remote one. Using a third-party database library and emitting only ANSI SQL in the translator allowed us
to support most relational database systems.
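The pairwise-join redundancy noted under Functionality could in principle be removed by grouping conjuncts by relation before emitting the FROM clause. A minimal sketch in Python (function and variable names are ours, not ATLS's):

```python
# Hypothetical sketch of the missing optimization: conjuncts over the same
# relation need no self-join, so deduplicate relations before building FROM.

def build_from_and_where(conjuncts):
    """conjuncts: (relation, condition) pairs from r.a1(...) ∧ ... ∧ r.an(...)."""
    tables = []       # FROM items, one per *distinct* relation
    conditions = []   # all conditions go to WHERE
    for relation, condition in conjuncts:
        if relation not in tables:
            tables.append(relation)
        conditions.append(condition)
    return ("SELECT * FROM " + ", ".join(tables)
            + " WHERE " + " AND ".join(conditions))

sql = build_from_and_where([
    ("employee", "employee.name = 'Ada'"),
    ("employee", "employee.dept = 'R&D'"),
])
print(sql)
# SELECT * FROM employee WHERE employee.name = 'Ada' AND employee.dept = 'R&D'
```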
As a group we are satisfied with the outcome of the project. Given the time constraints, we have managed to
develop a usable tool that satisfies all crucial requirements and which we believe the target users will find
useful.
6 Conclusion and Future Extensions
6.1 Conclusion
Overall we are satisfied with the outcome of the project. We achieved our initial target of translating
predicate logic to SQL queries, with a fully functioning web application as our front end. Although not all
goals were fully met, we are confident that our target user groups will find the tool beneficial to their
learning and database management experience.
Developing ATLS was not an easy task. It required us to first strengthen our understanding of both predicate
logic and SQL, and to develop a large theoretical framework before any translation code could be attempted.
We believe it was the right approach; otherwise the implementation work would have required much more effort.
6.2 Front-end
The core requirements of the user interface have been met; however, there is still room for improvement. Two
major extensions were planned had there been extra time. The first is a user account system that would allow
customized layouts and queries to be saved, meaning users could resume working instantly instead of having
to set up the editor to their preferred layout. The second planned extension was code
completion. ACE has good support for code completion, so the major work would be in the translator. It would
require the lexer and parser to support partial parsing; analysis of the final state of the parse would then
determine which tokens are valid, as well as which variables and predicates might be valid in the context.
Implementing dockspawn using native Angular concepts would have produced slightly cleaner code in places,
removing the need for some tight coupling.
During development of the user interface, two main areas proved difficult: changing requirements and testing.
Whilst the main functional requirements remained constant throughout the project, numerous small additions
and changes were made, which meant that code and layouts had to be rewritten quite often. The problem was
mostly resolved with the final, fully customizable layout, because the layout could be changed on the spot
without touching any code. Overall, the front-end and middleware were each fully rewritten at some point
during the project's duration.
As noted in Section 3.5, automated testing of the interface was not viable given the overhead of maintaining
Selenium-style tests under frequent changes, so we tested it manually.
If this system were to be used in production, an SSL certificate would be required to prevent the database
connection details (and user account details) from being stolen in transit.
6.3 Translation
Though we implemented translation for all basic SQL queries, there are still a number of improvements and
possible extensions that would make our application even better.
For instance, at the moment we translate only two kinds of join: the cross join and the equi-join. Given
sufficient time, we could extend the translator to cover the remaining join types. Secondly, our produced
queries are correct but not always optimal, as they follow a rigid translation scheme; optimizing queries on a
case-by-case basis would take a very long time. Additionally, as expected, our translator produces queries
that are not very readable, for the same reasons; this could be improved by refining the translator. We also
have not included support for some advanced SQL features such as sorting. Finally, we currently support only
read operations; given enough time, we would like to explore write operations too.
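Readability, at least, is mechanically improvable: a post-processing pass over the generated string can break the query at clause keywords without changing its meaning. A rough sketch (not part of ATLS):

```python
# Sketch of a readability pass: insert a newline before each major SQL
# clause keyword in a generated one-line query. Purely cosmetic; the
# query's semantics are untouched.

import re

KEYWORDS = r"\b(FROM|WHERE|GROUP BY|ORDER BY|LEFT JOIN|JOIN|AND|NOT EXISTS)\b"

def prettify(sql):
    """Break a one-line SQL string at the major clause keywords."""
    return re.sub(r" " + KEYWORDS, r"\n\1", sql)

print(prettify("SELECT e.name FROM employee e WHERE e.dept = 'R&D' AND e.age > 30"))
# SELECT e.name
# FROM employee e
# WHERE e.dept = 'R&D'
# AND e.age > 30
```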