Constraint-Based Updates in a Functional Data Model Database

Constraint-Based Updates in a Functional Data Model Database

A thesis presented for the degree of
Doctor of Philosophy
at the University of Aberdeen

Suzanne M. Embury, BSc. Hons.
(University of Kent at Canterbury)

1995

Declaration

This thesis has been composed by myself, it has not been accepted in any previous application for a degree, the work of which it is a record has been done by myself and all quotations have been distinguished by quotation marks and the sources of information specially acknowledged.

Suzanne M. Embury
12th April 1994

Departments of Computing Science
and Molecular and Cell Biology
University of Aberdeen
King's College
Aberdeen, Scotland


Acknowledgements

I would like, first of all, to express my thanks to my two supervisors: Prof. Peter Gray for all his help and encouragement over the past three (and a bit) years and for entrusting me with the care of the backs of one or two of his envelopes, and Prof. John Fothergill for his patience in explaining the intricacies of protein structure to me and for putting up with the many grammatical barbarisms of computing jargon.

I would also like to thank the current and ex-members of the Object Database Group at Aberdeen, who have kept me entertained in one way and another during the stresses and strains of my time at Aberdeen. In particular, I should like to thank Dr. Zhuoan Jiao for her constant friendship and generosity and never-failing cheerfulness (even when in the middle of writing her thesis or proof-reading parts of mine!), Dr. Oscar Diaz for allowing me to proof-read his thesis and for making meta-classes seem easy, Dr. Graham Kemp for his helpful comments on parts of my thesis, for providing me with some interesting biochemical applications of my work and for allowing me to use copies of two of his photographs in this thesis¹, and Dr. Martin Jones for giving up part of his holiday to proof-read some of my thesis and for all the "Dave Barry". Thanks also to Scott Leishman for his proof-reading efforts and for volunteering to take that awful Indy off my hands, and to John Owens and John Boyle for general entertainment value. All I can say is that sharing an office with you lot has definitely been an experience!

Particular thanks must be given to Nicolas Graner for many very happy lunch hours discussing English novels and French words and Scottish dances. And thanks also to Sylvie and Lucile Graner for their friendship and for letting me babysit. That was an experience too!

Thanks also to Dr. Pat Fothergill for her generous hospitality and wonderful cheesecake. To Yumiko Ishitani for feeding me during one of the busiest parts of the last year, and for forgiving me (I hope) for not having 'phoned her for so long. To Jo Scruby and Chris Stratford for keeping in touch and for lots of happy memories.

Finally, and most of all, thanks to Mum, Dad, my sister Yvonne and Rosie the dog, for all their unflagging support and patience and for putting up with my living so far away from home.

The work described in this thesis was supported by the SERC.

¹ Horse Haemoglobin and Concanavalin A, shown in Figures 1.1 and 1.2.

Summary

The salient points of this thesis are as follows:

• The Functional Data Model is a conceptually simple but expressive data model that offers good support for the declarative specification of data retrieval. However, the lack of an explicit notion of "state" makes it difficult to support updates in a fully functional environment. This thesis describes the use of structural and semantic domain knowledge, expressed in the form of constraints, to extend some of the declarative potential of the FDM to updates in the P/FDM database system.

• The Daplex language has been extended to allow the declarative specification of integrity constraints in a functional style. The constraints are stored in the metadata in both their declarative and procedural forms, and the metadata interface has been extended to allow flexible access to this information. The expressive power of the language is demonstrated by using it to describe the semantics of three-dimensional protein structure.

• An integrity maintenance subsystem has been implemented for P/FDM, which uses the constraint metadata to check that individual updates do not violate constraints before they are made. The implementation improves on previous approaches to constraint maintenance in that it allows constraints to be added or deleted freely, while minimising the associated performance overheads.

• A simple, user-controlled transaction mechanism has been implemented which allows the semantic constraints to be violated temporarily during complex composite updates, but which checks that integrity has been restored before allowing the transaction to commit. The implementation reuses the existing database primitives and storage structures, thus gaining in reliability and maintainability. Moreover, since transaction abort is extremely inexpensive under our architecture, the transaction mechanism is also suitable for supporting hypothetical transactions, allowing safe experimentation with "what if?"-type updates.

• A further extension to the Daplex language has been implemented which allows the declarative specification of the creation of sets of database objects. In this language, the user describes their updates in terms of the constraints that the final database must satisfy, and the DBMS then undertakes the task of searching for and creating a suitable set of objects. The ability of the language to express complex updates at a high level is illustrated by considering some example updates from the protein structure database.

• A prototype semantic optimiser has also been developed for declarative updates, which attempts to use the available integrity constraints to reduce the search space that must be examined. The implementation illustrates the usefulness of the flexible interface to the constraint metadata, and the suitability of the internal constraint and program forms for "on-the-fly" manipulation.

• The domain information provided by both structural and semantic constraints can be exploited by the DBMS in order to provide more declarative support for updates, both in a restrictive way (i.e. in preventing invalid updates or transactions from being executed) and in a generative way (i.e. in searching for sequences of low-level operations that will meet a user's high-level specifications). The resulting system extends some of the benefits of declarativeness to user updates, while not detracting from the benefits of the functional approach to data retrieval.

Contents

1 Introduction
  1.1 Updates in the Functional Data Model
  1.2 An Example Problem Domain - Protein Structure Data
  1.3 Overview of the Thesis
2 The Prolog/Functional Data Model
  2.1 The Data Model Elements
  2.2 The Architecture of P/FDM
  2.3 Metadata Structure and Access
    2.3.1 System Access to Metadata
    2.3.2 External Access to Metadata
  2.4 The Primitives of P/FDM
    2.4.1 The Data Retrieval Primitives
    2.4.2 The Update Primitives
3 Integrity Constraints in P/FDM
  3.1 Approaches to Constraint Maintenance
  3.2 The P/FDM Constraint Maintenance Subsystem
    3.2.1 Metadata Structures For Constraints
    3.2.2 The Constraint Manipulation Primitives
  3.3 The Constraint Language Extension to Daplex
    3.3.1 The Internal Format of the Constraint Language
    3.3.2 Generating Initialisation Code for Constraints
    3.3.3 Generation of Individual Code Fragments for Constraints
  3.4 Constraints in the Protein Database
4 Constraints and Transactions in P/FDM
  4.1 An Overview of the Transaction Mechanism
  4.2 The Transaction Mechanism
    4.2.1 The Transaction Primitives
    4.2.2 Data Manipulation Under Transactions
  4.3 The Commitment Process
    4.3.1 Checking Constraints Under Transactions
    4.3.2 Committing the Changes
  4.4 Related Work
5 Non-Deterministic Updates in P/FDM
  5.1 Syntax and Semantics of the Daplex Extension
  5.2 Database Support for the Daplex Extension
  5.3 Compilation of the Daplex Extension
  5.4 Use of Integrity Constraints to Prune the Search Space
  5.5 Related Work
  5.6 Non-Deterministic Updates in the Protein Database
6 Conclusions and Future Directions
  6.1 Useful Architectural Features for Constraint-Based Updates
  6.2 Future Directions
    6.2.1 Integrity Constraints
    6.2.2 Transactions
    6.2.3 Non-Deterministic Updates
    6.2.4 A Combined System for Repairing Transactions
A Data Definition and Module Management Primitives
  A.1 Primitives that operate on metadata
  A.2 Primitives that operate on modules
B Daplex Syntax
C Daplex Definition of the Metadata Schema
D Prolog Solution to the Music Lesson Allocation Problem
E The Music Lesson Database
  E.1 The Music Lesson Database Schema
  E.2 Contents of the Music Lesson Database

List of Figures

1.1 Example α-helices (shown in purple) in the α1 and β1 subunits of Horse Haemoglobin (Brookhaven entry 2MHB)
1.2 Example of a β-sheet (shown in yellow) in Concanavalin A (Brookhaven entry 1CN1)
1.3 Diagrammatic representation of the Protein Database schema
2.1 Staff/student inheritance hierarchy
2.2 Modelling research staff who are also students (a) using an extra subclass and (b) using multiple inheritance
2.3 Schema illustrating ambiguities of function binding under overlapping subclasses
2.4 The architecture of P/FDM
2.5 Diagrammatic representation of the metadata schema
2.6 Propagation of updates from the creation of an α-helix
2.7 Key-dependency relationship between chains and proteins
3.1 Schema illustrating the handling of inherited constraints
3.2 Functional components of the Daplex compiler
3.3 Schema for the event generation example
3.4 Three representations of constraint c5
3.5 Reformulation of example constraint graph for code generation
4.1 Modifying the structure of a chain (a) before the update, and (b) after the update
4.2 The state of the transaction modules after entity creation
4.3 The contents of the transaction modules after function update
4.4 Three representations of an instance hierarchy
4.5 Instance inclusion example (a) before the update, and (b) after the update
4.6 Contents of the transaction modules after key function update
5.1 Diagrammatic schema for the Music Lesson Database
5.2 Initial state of the Music Lesson Database
5.3 Using a temporary module to store backtrackable updates

Chapter 1
Introduction

1.1 Updates in the Functional Data Model

Database updates are often considered to be the poor relation in terms of data manipulation languages (DMLs), with techniques for data retrieval receiving by far the bulk of the research attention. Updates are not, in general, referentially transparent and are therefore hard to reason about and optimise. Also, the complexity of many updates means that it is difficult to "undo" their effects if they are applied incorrectly to a database. These problems are particularly severe for the more declarative data models, such as the Functional Data Model (FDM) [98], which rely heavily on the property of referential transparency in the definition of the semantics of database interactions [17].

The FDM was proposed by Sibley and Kerschberg [100] and is an entity-based semantic data model [55] that views a data retrieval task as the process of evaluating and returning the result of a function. Thus, the schema of an FDM database consists of several arity-0 functions, each of which returns the set of instances of a particular database class; several arity-1 functions, which map entity instances onto the values of their attributes or onto other instances to which they are related; and functions of arbitrary arity, which represent more general mappings between data values. These functions may be represented extensionally, by stating the domain and range of the mappings explicitly (i.e. listing each argument with its result), or intensionally, by stating an expression which will evaluate the required result for a given argument.
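To make the functional view of a schema concrete, the following Python sketch models a small teacher schema as ordinary functions. It is not P/FDM or Daplex code, and all names and data are hypothetical: an arity-0 function returns a class extent, arity-1 functions map instances to attribute values, an arity-2 function is stored as a table of argument pairs, and one function is defined intensionally from the others.

```python
from typing import Dict, List, Tuple

# Hypothetical extensional data: each mapping is stored explicitly as
# argument -> result pairs. Entity instances are opaque identifiers.
_TEACHER_EXTENT: List[str] = ["t1", "t2", "t3"]
_SURNAME: Dict[str, str] = {"t1": "Smith", "t2": "Jones", "t3": "Brown"}
_SUBJECT: Dict[str, str] = {"t1": "piano", "t2": "violin", "t3": "piano"}
_LESSON_MINUTES: Dict[Tuple[str, str], int] = {("t1", "ann"): 30, ("t3", "bob"): 45}


def teacher() -> List[str]:
    """Arity-0 function: returns the set of instances of the teacher class."""
    return list(_TEACHER_EXTENT)


def surname(t: str) -> str:
    """Arity-1 function: maps a teacher instance onto an attribute value."""
    return _SURNAME[t]


def subject(t: str) -> str:
    """Arity-1 function, also stored extensionally."""
    return _SUBJECT[t]


def lesson_minutes(t: str, pupil: str) -> int:
    """Arity-2 function: a more general mapping, stored as a table of pairs."""
    return _LESSON_MINUTES[(t, pupil)]


def teachers_of(s: str) -> List[str]:
    """Intensional function: evaluated from other functions, not stored."""
    return [t for t in teacher() if subject(t) == s]


if __name__ == "__main__":
    # Retrieval is just function evaluation and composition.
    print([surname(t) for t in teachers_of("piano")])  # ['Smith', 'Brown']
```

Because teachers_of is computed rather than stored, adding a new piano teacher to the extensional data changes its result without any separate update to teachers_of itself.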

The data model which results from this view of data representation is conceptually simple, but is also very expressive. In particular, the advantages of the FDM are:

• It is an irreducible data model, i.e. it is composed of a small set of very basic concepts which represent semantically irreducible units of information. This simplifies the process of schema design, and allows a simple, graphical representation of schemas.
• The FDM is an entity-based model, and therefore supports referential integrity [27] implicitly. It is also capable of supporting other important data model features such as object-identity [66] and inheritance hierarchies of entity classes [102].
• FDM schemas are easily extensible, i.e. entity classes and functions may be added or deleted without requiring reorganisation of existing schema elements.
• The conceptual simplicity of the FDM means that it can be used to represent the relational, hierarchical and network models [98], and it also shares many of the important features of newer data models, such as the object-oriented model [111]. Thus it is a suitable model for integrating heterogeneous schemas in multidatabase systems [28].

In order to gain the full benefit of the functional paradigm in a database context, however, it is necessary to combine the FDM with a functional data language. The two earliest such languages were Daplex [98] and FQL [16]. Daplex is a high-level, end-user language which describes the evaluation and manipulation of sets of database values, while FQL takes a stream-based view of computation and is intended to be used as an internal data language. Both languages are declarative, relatively concise and amenable to program transformation.

The attractions of the functional style of these languages have resulted in the development of many functional systems (e.g. EFDM [68], FDL [91], GDM [6], P/FDM [50] and PFL [92]), and have even prompted the use of functional data languages with non-functional data models (e.g. IPL [1], LIFOO [11] and PDM [76]). The advantages of a functional style for data languages are:

• Programs are specified declaratively as referentially transparent expressions, which have a clean, well-understood semantics (i.e. the λ-calculus [69]), and are therefore easy to transform, optimise and reason about.
• Functionally-expressed programs are typically more concise than programs expressed in the other main declarative database paradigm, the logical data model [43]. Functional programs typically require significantly fewer variables than their relational equivalents, since neither output variables nor the intermediate variables in chains of function applications are stated explicitly. Also, the implied directionality of functional expressions [96] makes arithmetic expressions much easier to handle than in logic languages.
• Functional programs can be composed as easily as programs expressed in the relational calculus [49], but they are also able to express arbitrarily complex, recursive computations.
• The standard "functionals" of functional programming (such as map and filter) are very suitable for giving concise descriptions of computations over bulk data, such as large database sets.
• Functional programs are amenable to static type checking.
• Functional programs can be "lazily evaluated" [10], which not only allows the possibility of operating over infinite data structures, but can also avoid much redundant data retrieval when operating over large database sets.

Unfortunately, as we have said, none of the declarative data model paradigms, whether functional or logical in basis, deal with updates as well as they can handle data retrieval. There are no primitives for handling state change (or even any notion of "state") in functional programming languages and there are, therefore, no obviously analogous primitives for describing updates in functional data languages. Moreover, the presence of updates within an expression can destroy referential transparency, limiting the possibilities for transforming the expression and severely complicating the process of reasoning about it. Finally, the fact that the FDM is a semantic data model that maintains several important structural constraints, such as referential integrity, means that it is difficult for the user to foresee the exact consequences of any updates, and it is even more difficult to "undo" even a single update action if it is afterwards found to be incorrect or illegal.

Four approaches to supporting updates within the FDM have been explored to date, two of which attempt to stay wholly within the bounds of the functional paradigm, and two of which take a more pragmatic approach and relax the theoretical constraints where updates are involved.

Updates as Changes to an Explicit Database State
The most obvious way to provide a functional treatment of database updates is to model all database "commands" as functions which map an old database state into a new one [2]. For example, the following function, defined in Miranda [104], implements a simple database management system (DBMS):

    dbms :: db -> [transaction] -> [response]
    dbms db [] = [end_of_input]
    dbms db (trans : rest)
        = output : dbms newdb rest
          where (output, newdb) = evaluate db trans

The evaluate function takes a database state and a transaction specification, and returns the new database state (newdb) formed by executing the transaction within the given state, together with any output message for the user (output). Retrieval commands return the original database state unchanged as the new state, and the data requested as the output message. An update transaction, on the other hand, returns an updated version of the original database and an empty output message. The dbms function, then, executes each of the transactions in its input list in turn, using the resulting database state from one execution as the input state for the next.

This approach to handling updates (which is directly analogous to the use of dynamic logics for supporting update in logic databases [80]) is theoretically elegant and has some interesting properties [2], but it is not of much practical use for the implementation of real database systems.
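The state-passing style can be mirrored outside Miranda. The Python sketch below is a loose, eager analogue of the dbms function above, not a translation of any real system: the command format ("get" and "set" tuples) and this evaluate function are hypothetical, since evaluate is not defined in the text. Each update builds a complete copy of the state, which makes the storage cost easy to see.

```python
from typing import Dict, Iterable, List, Tuple

State = Dict[Tuple[str, str], str]   # (function name, argument) -> value
Command = Tuple[str, ...]            # ("get", f, x) or ("set", f, x, value)


def evaluate(db: State, trans: Command) -> Tuple[str, State]:
    """Return (output, new state); the state passed in is never modified."""
    if trans[0] == "get":
        _, f, x = trans
        return db.get((f, x), "undefined"), db   # retrieval: state unchanged
    if trans[0] == "set":
        _, f, x, value = trans
        newdb = dict(db)                         # build a whole new state...
        newdb[(f, x)] = value                    # ...differing in one mapping
        return "", newdb                         # update: empty output message
    raise ValueError(f"unknown command: {trans[0]!r}")


def dbms(db: State, transactions: Iterable[Command]) -> List[str]:
    """Run each transaction against the state produced by the previous one."""
    responses: List[str] = []
    for trans in transactions:
        output, db = evaluate(db, trans)
        responses.append(output)
    responses.append("end_of_input")
    return responses


if __name__ == "__main__":
    s0: State = {}
    print(dbms(s0, [("set", "surname", "t1", "Smith"), ("get", "surname", "t1")]))
    # ['', 'Smith', 'end_of_input']; s0 itself is still {}
```

The original state s0 survives every call, which is exactly what makes this style elegant to reason about and expensive to store.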
The creation of so many individual database states is hardly efficient in terms of secondary storage, and even if unchanged parts of the database state are shared between consecutive states, the storage overhead will still be considerable for large and/or long-lived databases.

Updates as Changes to the Program Environment
A variation on the previous approach is to view changes to data as redefinitions of the functions whose extensions define that data, i.e. as changes to a program environment [91, 84]. This is directly analogous to the behaviour of standard functional programming languages, in which expressions are evaluated with respect to some "current environment" that may be modified by adding new function definitions or by redefining existing functions.

The advantage of this approach over the one previously described is that the program environment is used as an implicit state, and thus there is no need to pass explicit representations of the current database state back and forth between functions. However, it suffers from the same practical implementation problems as the previous approach, in that the environment is not truly updated, but is only ever extended (with deletions being effected by redefining the appropriate mappings to be "undefined"). While this can be useful for historical applications which require previous states of the database to be retained, it imposes a very severe storage overhead for those applications which do not.
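The environment-extension approach, and its storage cost, can be illustrated with a small Python sketch. This is a hypothetical model, not the implementation of any of the cited systems: each binding is an entry in an append-only list, the most recent entry for a (function, argument) pair wins, and deletion appends an "undefined" binding rather than removing anything.

```python
from typing import List, Optional, Tuple

UNDEFINED: Optional[str] = None   # stands in for the "undefined" result


class Environment:
    """Append-only environment: every update adds a binding, none is removed."""

    def __init__(self) -> None:
        self._bindings: List[Tuple[str, str, Optional[str]]] = []

    def define(self, f: str, x: str, value: Optional[str]) -> None:
        # Insertion and modification both extend the environment.
        self._bindings.append((f, x, value))

    def delete(self, f: str, x: str) -> None:
        # Deletion is a redefinition of the mapping to "undefined".
        self.define(f, x, UNDEFINED)

    def lookup(self, f: str, x: str, as_of: Optional[int] = None) -> Optional[str]:
        # The latest binding wins; as_of=n sees only the first n bindings,
        # so earlier database states remain recoverable.
        visible = self._bindings if as_of is None else self._bindings[:as_of]
        for g, y, value in reversed(visible):
            if (g, y) == (f, x):
                return value
        return UNDEFINED

    def size(self) -> int:
        # Grows with every update, including deletions.
        return len(self._bindings)


if __name__ == "__main__":
    env = Environment()
    env.define("surname", "t1", "Smith")
    env.define("surname", "t1", "Smyth")
    env.delete("surname", "t1")
    print(env.lookup("surname", "t1"), env.lookup("surname", "t1", as_of=2), env.size())
    # None Smyth 3
```

History comes for free, but the environment grows by one binding per update, even when the update is a deletion.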
Moreover, neither of these two theoretical approaches to supporting updates encourages the user to take a less procedural approach to the specification of transactions, since updates are still viewed as sequences of explicit state changes.

Functional Query Languages
Recognising the difficulties of handling updates in a functional way, and not wishing to compromise the clarity and conciseness of their languages in order to support some theoretical notion of state change, the developers of some systems [16, 6] have restricted the scope of their languages to handling database queries only.

Embedded Update Commands in a Functional Language
The final approach, and the one that has been most widely adopted, is to provide the user with a set of update "functions" which are allowed to side-effect the database state [68, 50, 92, 39, 1]. The DBMS is able to detect when an expression makes use of these side-effecting functions, and it suppresses any transformations or inferences which assume referential transparency of the expression. In some of these systems (e.g. PFL [92] and ADAPLAN [39]) the DBMS provides a fixed set of low-level update functions, while others (e.g. IPL [1] and P/FDM [50]) allow users to define new update functions, declaring at compile-time that they cause side effects.

As we have seen, none of these approaches is an ideal solution. The theoretical solutions have practical implementation problems, whereas the more pragmatic approaches, while giving good support to the user for data retrieval, more or less abandon the user during the more difficult process of data update.

This thesis explores the use of semantic domain information in the P/FDM functional data model database to help the DBMS bridge the gap between the declarative and the pragmatic views of database update. Our aim is not a declarative modelling of state change (which is only of direct benefit to the database developer), but the provision of declarative support for updates at the user's conceptual level. In such a system, users should be able to describe the conditions that are maintained by "legal" database states, leaving the DBMS to decide how best they should be maintained. We have taken a first step towards this goal by providing an extension to the Daplex language for the declarative description of semantic integrity constraints in a functional style. We have also implemented an integrity maintenance subsystem for P/FDM, which uses simplified procedural constraint descriptions, generated by the constraint language compiler, to check the legality of individual update operations before they are applied to the database state.

We would also like to allow users to make complex trial updates, with the DBMS taking full responsibility for restoring the database to its original state if this should be required.
In order to provide such a facility for users of P/FDM, we have implemented a simple transaction mechanism, in which the DBMS uses its knowledge of the structural constraints enforced by P/FDM to keep track of the full consequences of complex database updates, and to undo their effects or apply them to the database as the user requires. The DBMS also takes responsibility for ensuring that each transaction generates a database state which does not violate any of the semantic integrity constraints.
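The behaviour described here (updates that may break semantic constraints partway through a transaction, a constraint check before commit, and a cheap abort) can be sketched in a few lines of Python. This is only an illustrative model under assumed names (Database, Transaction, commit); P/FDM's own mechanism (Chapter 4) reuses its existing storage primitives and is not reproduced here. In this sketch a failed commit simply leaves the transaction open, so the user can repair it or abort.

```python
from typing import Callable, Dict, List, Tuple

Key = Tuple[str, str]                              # (function name, argument)
Constraint = Callable[[Dict[Key, int]], bool]      # True if a state is legal


class Database:
    def __init__(self, data: Dict[Key, int], constraints: List[Constraint]) -> None:
        self.data = data                  # the committed state
        self.constraints = constraints    # semantic integrity constraints


class Transaction:
    """Hypothetical design: changes are buffered apart from committed data."""

    def __init__(self, db: Database) -> None:
        self.db = db
        self.pending: Dict[Key, int] = {}

    def read(self, key: Key) -> int:
        # A transaction sees its own uncommitted changes first.
        return self.pending.get(key, self.db.data[key])

    def write(self, key: Key, value: int) -> None:
        # No checking here: constraints may be violated temporarily.
        self.pending[key] = value

    def abort(self) -> None:
        # Inexpensive: the committed state was never touched.
        self.pending.clear()

    def commit(self) -> bool:
        # Check every constraint against the state the transaction would produce.
        candidate = {**self.db.data, **self.pending}
        if not all(check(candidate) for check in self.db.constraints):
            return False                  # refuse; the transaction stays open
        self.db.data = candidate
        self.pending.clear()
        return True


if __name__ == "__main__":
    # Hypothetical constraint: a helix must not start after it ends.
    db = Database({("start", "h1"): 5, ("end", "h1"): 10},
                  [lambda s: s[("start", "h1")] <= s[("end", "h1")]])
    tx = Transaction(db)
    tx.write(("start", "h1"), 20)   # start > end: temporarily illegal
    tx.write(("end", "h1"), 30)     # integrity restored before commit
    print(tx.commit(), db.data)     # True {('start', 'h1'): 20, ('end', 'h1'): 30}
```

Checking only at commit is what allows the intermediate, illegal state between the two writes.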

An ideal marriage of the declarative and the pragmatic approaches to supporting updates would be a system which allowed users to describe their updates declaratively, and at as high a level as possible, and which then used the available semantic domain information to discover a particular sequence of low-level update operations that would achieve the effect required by the user. Notice that this is quite different from the current approaches to declarative updates, which still require users to describe their updates in terms of individual state changes, even though these state changes are then evaluated in the context of a formally-stated declarative semantics. Instead, in our higher-level declarative approach, users state what their updates must achieve rather than how the required result is to be achieved. Again, we have taken a first step towards this goal by implementing a version of the Daplex DML which allows the declarative description of the creation of sets of instances. In this language the user simply describes the constraints that the new instances must satisfy, and the DBMS then uses these constraints, and any relevant semantic domain constraints, to search for a sequence of updates that will satisfy the user's requirements.

1.2 An Example Problem Domain - Protein Structure Data

The main application of P/FDM to date has been its use in the storage and manipulation of three-dimensional protein structures [51]. Proteins are an important class of organic macromolecules that play a range of roles in the functioning of living systems. A protein consists of one or more chains of amino acid residues, which are folded in such a way that the interface formed by the surface of the molecule exhibits some useful biochemical property or behaviour. Each amino acid residue consists of two subunits. One of these is common to all residues, and it is this subunit which is covalently bonded to at most two other residues in order to form a continuous chain (called the backbone chain). The second subunit, called the side chain, is bound to the backbone subunit at an atom called the α-carbon. There are only twenty different types of amino acid side chains that occur naturally in proteins, each being slightly different in size or having slightly different chemical properties. However, the number of potential sequences of these twenty amino acids is very large indeed (protein chains will typically contain anything from 100 to 1000 residues), and it is this flexibility that allows the same basic structure to have such a diverse range of functions and properties.

Protein structure is described in terms of four levels. The first level, called the primary structure, is given by the sequences of amino acid residues that make up the chains of a particular protein. It is known that the three-dimensional shape which a protein adopts is determined almost entirely by its amino acid sequence; therefore, in some sense, the primary structure gives a complete description of the structure of a particular protein.

A description of a protein at the second level (its secondary structure) gives a localised indication of the fold adopted by a particular segment of chain. There are two types of regular structure which recur in the majority of proteins because of their structural stability. One of these structures is the helix, which is stabilised by a set of non-covalent interactions, called hydrogen bonds, occurring along its length. There are several varieties of helix, which differ only in the handedness (i.e. the direction) and the tightness of their coil. The most commonly occurring is the α-helix, which is illustrated in Figure 1.1.

The second type of favoured structure is called the β-sheet. Sheets are formed by strands of chain lying roughly parallel to each other and stabilised by hydrogen bonds linking adjacent strands (see Figure 1.2). The secondary structure of a protein is given by stating which subsequences of the primary structure adopt helical conformations, which form strands within β-sheets, and which form the loops between these elements (i.e. the so-called random coil).

The third level of description, the tertiary structure, is the specification of the relative three-dimensional positioning of all the constituent atoms within a protein. The final level, describing the quaternary structure, specifies the relative positions of the individual chains within a protein.

Each of these four levels gives a different perspective on a particular protein, and they are all therefore represented explicitly in the P/FDM protein database [65], the schema of which is illustrated diagrammatically in Figure 1.3.
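To connect these levels of description with the kind of integrity constraint discussed later, the following Python sketch records a chain's primary structure as a string of residue codes and its secondary structure as labelled, numbered subsequences. It checks one plausible structural rule: every segment lies within the chain and no residue belongs to two segments. The classes and the rule are illustrative assumptions rather than the schema of Figure 1.3, and the sequence is arbitrary.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Segment:
    kind: str    # secondary structure type, e.g. "helix", "strand" or "loop"
    start: int   # position of the first residue (1-based)
    end: int     # position of the last residue (inclusive)


@dataclass
class Chain:
    sequence: str            # primary structure: one-letter residue codes
    segments: List[Segment]  # secondary structure: labelled subsequences


def secondary_structure_ok(chain: Chain) -> bool:
    """Hypothetical integrity check: segments lie within the chain and do not overlap."""
    n = len(chain.sequence)
    ordered = sorted(chain.segments, key=lambda s: s.start)
    if any(not (1 <= s.start <= s.end <= n) for s in ordered):
        return False
    # Once sorted by start, checking neighbouring segments is enough.
    return all(a.end < b.start for a, b in zip(ordered, ordered[1:]))


if __name__ == "__main__":
    chain = Chain("ACDEFGHIKLMNPQRSTV",                 # arbitrary 18 residues
                  [Segment("helix", 3, 12), Segment("loop", 13, 18)])
    print(secondary_structure_ok(chain))                # True
    chain.segments.append(Segment("strand", 10, 14))    # overlaps the helix
    print(secondary_structure_ok(chain))                # False
```

A real protein database would need many more rules of this kind, which is the motivation for the declarative constraint language described in Chapter 3.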


Figure 1.1: Example α-helices (shown in purple) in the α1 and β1 subunits of Horse Haemoglobin (Brookhaven entry 2MHB)

Figure 1.2: Example of a β-sheet (shown in yellow) in Concanavalin A (Brookhaven entry 1CN1)

Protein structure information has the characteristics of both design data [63] and scientific data [99], in that it is basically hierarchical in nature, it involves complex semantic integrity constraints, and the interpretation of the data generally requires a combination of both data retrieval and computation. The current database stores some 80Mb of protein data, and is being used to answer ad hoc queries relating to protein structure, and as the basis of a system which assists in the modelling of proteins of unknown structure [64]. The storage of experimentally-determined data does not, in itself, require much support for complex updates, since the majority of changes to this data will be additions of new structures as they are published. The programs which analyse the data, however, do involve complicated updates, especially those like the model building program just mentioned, which are required to test complicated structural hypotheses against existing data. The domain of protein structure also involves a rich set of integrity constraints, ranging from the simplest domain constraints to complex biochemical and structural rules, which must be maintained if the analysis of the data is to produce useful information. It is with this context in mind that the three extensions to P/FDM described in this thesis have been designed, and against it that they are evaluated.

1.3 Overview of the Thesis

The remainder of this thesis is organised as follows:

• Chapter 2 describes the data model and architecture of P/FDM, dwelling in particular on the two aspects which are of most importance to the extensions described here, namely the structure and manipulation of metadata in P/FDM, and the primitives that have been provided for data manipulation.
• Chapter 3 describes the implementation of the constraint language compiler and the integrity maintenance subsystem for P/FDM and, by considering previous approaches taken to integrity maintenance, shows how our architecture overcomes the two main deficiencies of these earlier systems. The chapter concludes with a discussion of some of the semantic integrity constraints present in the protein structure domain.

CHAPTER 1. INTRODUCTION 11proteinproteincomponentchainstructureloop helix strandthreeten alpha pisubcomponentresiduesaltbridgehbond atom

sheetfunctionalsitecomponent protein

�rst structure structure chainfollows parallelantiparallelneighbourdisulphidepos sourceneg sourcedonoracceptor atom + stringresidue structureabsolute pos + integerres by name + string has component

strand sheetsheet proteinactive site

site componentFigure 1.3: Diagrammatic representation of the Protein Database schema

CHAPTER 1. INTRODUCTION 12� Chapter 4 describes the implementation of a simple transaction mechanism basedon the existing data manipulation primitives and storage types. It also describesour approach to integrity constraint checking at transaction commit-time, andcompares this with the approaches taken in other systems.� Chapter 5 describes the extension of the Daplex compiler with a new constructthat allows the declarative speci�cation of instance creations. The semantics ofthe extended language are informally illustrated by considering a simple time-tabling example, and the process of code-generating a program that will searchfor and create a suitable set of instances is described. The chapter summarisessimilar features for describing search and updates declaratively in other systems,and outlines the potential uses for the extended language in the protein modellingapplication.� Finally, Chapter 6 summarises the main contributions of the thesis, and discussesseveral future directions which lead on from this work.

Chapter 2
The Prolog/Functional Data Model

2.1 The Data Model Elements

In the Functional Data Model, real-world entities are represented as entity classes, and their attributes are represented as functions defined on these classes. The following schema fragment, given in the Daplex DDL, for example, represents the class of teachers, with attributes giving their names and subject area:

    declare teacher ->> entity                 % Class definition
    declare surname(teacher) -> string         % Attribute definition
    declare given_name(teacher) -> string
    declare subject(teacher) -> string;

The three functions defined here are single-valued (as indicated by the single-headed arrows in the schema) and scalar-valued, since they map instances of the teacher class onto single instances of the built-in scalar type string. Two numerical built-in types are also available: integer and float. Multi-valued functions are defined by giving a double-headed arrow in the schema, in which case they represent a mapping to sets of values of the result type. The following, for example:

    declare students(teacher) ->> string

describes a function mapping instances of the class teacher onto sets of strings giving the names of their students. If, however, we wish to store more information about students than their names, then we must have another entity class called student:

    declare student ->> entity
    declare surname(student) -> string
    declare given_name(student) -> string
    declare age(student) -> integer
    key_of student is given_name, surname;

We now represent the relationship between a teacher and their students as a function mapping from one entity class to another:

    declare students(teacher) ->> student

This kind of function (i.e. a single-argument, entity-valued function) is called a relationship function, in recognition of the fact that it represents one side of a relationship rather than a scalar attribute of a class. The declaration of a relationship function also entails the declaration of its inverse, so that the relationship can be traversed in either direction. Inverse function names are generated by appending the suffix "_inv" to the name of the forward function; e.g. the inverse of the students function given above is:

    students_inv(student) ->> teacher

All inverse functions are assumed to be multi-valued.

One of the failings of the relational model that newer semantic data models such as the FDM address is its reliance on attribute-based keys for modelling relationships. In the relational model, the students(teacher) relationship would be modelled as a relation containing the keys of the teacher and student instances so related:

    teacher                     student
    given_name    surname       given_name    surname

The problem with this approach is that we cannot guarantee that even such apparently fixed attributes as key attributes will never change. People can and do change their

names and their membership numbers; even surrogates like National Insurance numbers can, under some circumstances, change. And under the relational model, every time a key attribute changes its value, the update must be propagated to every relation in which that key attribute is involved.

The solution, as proposed in semantic data models, is to generate internal surrogate identifiers for each instance and to use these to model relationships [66]. Since these identifiers are system-generated they can be guaranteed never to change, and since they are internal and not directly accessible, users are encouraged to view relationships as links between instances rather than as attribute values.

In P/FDM, however, both styles of identification are supported. While a surrogate identifier is used for all internal and stored links, we have found that an external attribute-based key is useful when loading data in bulk, and also acts as an index into each class for data retrieval [87]. We have also found that the disadvantages of an attribute-based identification scheme (namely, the difficulty of having to change all references to an object whenever part of its key changes) disappear when external keys are used in conjunction with internal identifiers.
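The combination of internal identifiers and external keys can be illustrated with a minimal Python sketch. It is not P/FDM code: the class MiniStore, its methods and the (class, n) identifier tuples are hypothetical stand-ins, assuming a simple in-memory dictionary store. Relationship links hold only surrogate identifiers, so changing a key attribute moves a single index entry and leaves every link intact, while the forward students function and its students_inv inverse are updated together.

```python
from collections import defaultdict
from itertools import count


class MiniStore:
    """Hypothetical toy store (not the P/FDM API).

    Instances are referred to only by system-generated surrogate ids; the
    external key is merely an index from key values to those ids.
    """

    def __init__(self):
        self._ids = count(1)                  # source of surrogate numbers
        self.attrs = {}                       # id -> {function name: value}
        self.key_funcs = {}                   # class -> key function names
        self.key_index = {}                   # (class, key values) -> id
        self.students = defaultdict(set)      # students(teacher) ->> student
        self.students_inv = defaultdict(set)  # students_inv(student) ->> teacher

    def declare_key(self, cls, *funcs):
        self.key_funcs[cls] = funcs

    def _key(self, oid):
        cls = oid[0]
        return (cls, tuple(self.attrs[oid][f] for f in self.key_funcs[cls]))

    def create(self, cls, **attrs):
        oid = (cls, next(self._ids))          # surrogate id, never reused
        self.attrs[oid] = dict(attrs)
        self.key_index[self._key(oid)] = oid
        return oid

    def lookup(self, cls, *key_values):
        return self.key_index[(cls, tuple(key_values))]

    def update(self, oid, fn, value):
        # Re-index under the new key; links store ids, so none of them change.
        del self.key_index[self._key(oid)]
        self.attrs[oid][fn] = value
        self.key_index[self._key(oid)] = oid

    def add_student(self, teacher, student):
        # The forward relationship and its inverse are maintained together.
        self.students[teacher].add(student)
        self.students_inv[student].add(teacher)
```

For example, after a student's surname is changed with update, lookup under the new key values returns the same identifier as before, and the teacher's students set still contains it without having been touched.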
All references to objects are made using the internal identifier (which can never change), thus leaving the attributes involved in the external key free to change their values.

The key for each class is specified as part of the schema, and may consist of a combination of any of the single-valued functions defined on the class. For example, the following schema fragment defines the key of the person class as the values of the surname and given_name attributes:

    key_of person is surname, given_name;

It is also possible to make use of relationship functions in keys, which has the effect of including the whole key of the related class within the key of the class being defined. For example, we might define the key of a class storing details of the articles published by a research group as the principal author of the article and the month in which it was published:

    declare published_paper ->> entity
    declare principal_author(published_paper) -> person
    declare co_authors(published_paper) ->> person
    ...
    declare month(published_paper) -> string
    key_of published_paper is key_of(principal_author), month;

Keys may be nested in this way to an arbitrary depth, and key definitions are expanded recursively until the final key consists of a sequence of scalar values. In this example, the key of published_paper expands to three values: the surname of the principal author, the given name of the principal author and the month of publication. This expansion is important in practice, as it means that instances can be retrieved by their key values in a single disk access, without having to make extra accesses to retrieve the identifiers of the other instances involved in the key.

When the key of one class is nested within the key of another in this way, we say that the second class is key-dependent on the first; e.g. in the above schema, the published_paper class is key-dependent on the person class. We interpret this relationship as meaning that the existence of every instance of published_paper is contingent on the existence of some instance of the person class. The result is a kind of part-component relationship, and key-dependency is used extensively in this guise in the protein database. A protein, for example, consists of one or more chains, which in turn consist of several residues, each made up of a group of atoms.
In P/FDM these relationships are all represented as key-dependencies:

    declare protein ->> entity
    declare protein_code(protein) -> string
    % ... other properties
    key_of protein is protein_code

    declare chain ->> entity
    declare chain_id(chain) -> string
    declare component_protein(chain) -> protein
    % ... other properties
    key_of chain is key_of(component_protein), chain_id

    declare residue ->> entity
    declare position(residue) -> integer
    declare component_chain(residue) -> chain
    % ... other properties
    key_of residue is key_of(component_chain), position;
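The recursive expansion of nested keys can be sketched in a few lines of Python. The dictionaries KEYS and RESULT_CLASS are hypothetical encodings of the Daplex declarations in this section (the ("foreign", fname) entries loosely follow the foreign(FName) key descriptor described in Section 2.3); the sketch only computes which scalar functions make up the flattened key, not how P/FDM stores or indexes it.

```python
# Hypothetical encoding of the key declarations above. A key entry is either
# a function name or ("foreign", fname), meaning "include the whole key of
# the class that the relationship function fname points to".
KEYS = {
    "person": ["surname", "given_name"],
    "published_paper": [("foreign", "principal_author"), "month"],
    "protein": ["protein_code"],
    "chain": [("foreign", "component_protein"), "chain_id"],
    "residue": [("foreign", "component_chain"), "position"],
}
RESULT_CLASS = {
    "principal_author": "person",
    "component_protein": "protein",
    "component_chain": "chain",
}


def expand_key(cls):
    """Recursively flatten a key into (class, scalar key function) pairs."""
    expanded = []
    for part in KEYS[cls]:
        if isinstance(part, tuple) and part[0] == "foreign":
            expanded.extend(expand_key(RESULT_CLASS[part[1]]))
        else:
            expanded.append((cls, part))
    return expanded
```

For published_paper this yields the three values named in the text (the principal author's surname and given name, followed by the month), and for residue it yields the protein code, the chain identifier and the position.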

In fact, all of the information stored about a particular protein in our schema is key-dependent on that protein instance at some level or other, which ensures a navigational path from data representing any part of a particular protein to data representing any other part of that protein.

As in many semantic data models, entity classes in P/FDM may be arranged into inheritance hierarchies, so that functions defined at higher levels in the hierarchy may be inherited by lower-level classes. Thus, we can abstract the common attributes of the teacher and student classes into a person class:

    declare person ->> entity
    declare surname(person) -> string
    declare given_name(person) -> string
    key_of person is given_name, surname;

and can then make our original classes subclasses of the person class:

    declare teacher ->> person             % Subclass declaration
    declare subject(teacher) -> string
    declare student ->> person             % Subclass declaration
    declare age(student) -> integer;

Notice that we declare the key only for the root class person. This ensures that the key is the same for all classes in the hierarchy, and that it is defined in terms of functions that are inheritable by all classes in the hierarchy.

Figure 2.1: Staff/student inheritance hierarchy (person, with subclasses student, research staff and teaching staff)

Each class may have only one immediate superclass, i.e. multiple inheritance is not supported. In practice, however, this is less of a restriction than might be supposed,

due to the way in which P/FDM handles subclasses. Unlike many other data models, P/FDM supports overlapping subclasses, where an instance of a class may simultaneously belong to any number of its subclasses. Consider, for example, the class hierarchy given in Figure 2.1. This schema presents no problems as long as all of the people about whom we have to store information can be categorised into these four classes, but how do we cope with the not unreasonable situation in which a member of research staff is also enrolled as a student? Without overlapping subclasses, we would need to create a new subclass of person to store the instances which have the dual role (see Figure 2.2(a)) or, at best, in systems which support multiple inheritance, a new subclass of the student and research classes (Figure 2.2(b)). In either case, we would require one extra class for each possible combination of subclasses.

Figure 2.2: Modelling research staff who are also students (a) using an extra subclass and (b) using multiple inheritance

When overlapping subclasses are allowed, however, no extra classes are required, since instances with dual roles are simply modelled by instances which belong to whichever of the subclasses are applicable to them. Unfortunately, overlapping subclasses can cause ambiguities in function binding similar to those caused by multiple inheritance. Consider the schema shown in Figure 2.3, which defines a function year on the class person and then overrides it for the classes student and research. If the user requests the value of year for an instance which is a member of all these classes, which of the definitions should be used? According to the most commonly used binding technique, late binding (or most-specialised binding), the definition to be used is the one defined on or inherited by the lowest-level class of which the instance is a member. In our example, if the person is also a student then the definition of year on the student class will be used. If the person instance is not a member of any of the subclasses, then it is the definition on the person class that will be used. If, however, we have a member of research staff who is also a student, then there is an ambiguity, as both definitions at this level are equally applicable under dynamic (late) binding. Because of this, P/FDM does not support dynamic binding, but instead provides an explicit cast facility so that users and programmers may specify exactly which definition is required in a particular situation. For example, if p is an instance of person, then

    year(p as student)

binds to the year function defined on the student class, and

    year(p as research)

binds to the definition given for the research class.
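Because the definition chosen depends only on the class named in the cast (or, as described next, on the class to which the request was first made), and never on which subclasses an instance happens to belong to, the choice stays unambiguous even with overlapping subclasses. A minimal Python sketch follows, assuming hypothetical tables SUPERCLASS and DEFINITIONS and a hypothetical helper bind; the string values merely name which definition was selected.

```python
# Hypothetical encoding of Figure 2.3: year/1 is defined on person and
# overridden on student and research; single inheritance only.
SUPERCLASS = {"person": None, "student": "person", "research": "person",
              "teaching": "person"}
DEFINITIONS = {
    ("year", "person"): "year/1 on person",
    ("year", "student"): "year/1 on student",
    ("year", "research"): "year/1 on research",
}


def bind(fname, member_of, request_class, cast=None):
    """Choose the definition of fname for an object in overlapping subclasses.

    member_of: every class the object currently belongs to.
    request_class: the class to which the request was first made.
    cast: the class named by an explicit "as" cast, if any.
    """
    start = cast if cast is not None else request_class
    if start not in member_of:
        raise TypeError(f"object is not an instance of {start}")
    cls = start
    while cls is not None:  # walk up the single-inheritance chain
        if (fname, cls) in DEFINITIONS:
            return DEFINITIONS[(fname, cls)]
        cls = SUPERCLASS[cls]
    raise LookupError(f"{fname} is not defined on {start}")
```

An instance belonging to person, student and research thus gets the person definition when no cast is given, and the student or research definition only when the corresponding cast is written.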
If no explicit cast is made, then the definition that is used is the one defined on or inherited by the class to which the request for the function value was first made.

The P/FDM data model also supports a special kind of entity class called a value entity, which acts rather like a tuple type. Value entities may have attributes, but these are restricted to single- and scalar-valued functions, and value entities do not have a declared key. Value entities may not be enumerated independently, and they only have concrete existence in the database when they are linked to some instance of a full entity class. They do not have surrogate identifiers, since they can always be uniquely identified by the

combination of their attribute values. In fact, value entities are an immutable type, in that changes to any of their attributes result not in a new version of the updated entity but in a completely different value entity. This behaviour is modelled on that of base types such as integer. One would not wish to enumerate the whole set of integers, since it is not a finite set. Also, an integer is identified completely by its (immutable) value: if we change the value of, say, the integer 5 to be 6, then we no longer have the integer 5. Value entities are intended for modelling such compound but scalar types as dates, times and spatial coordinates. In the protein database, value entities are used to model the atoms that make up the residues of chains:

    declare atom ->> value_entity
    % Coordinates
    declare x(atom) -> float
    declare y(atom) -> float
    declare z(atom) -> float
    % Solvent accessibility
    declare accessibility(atom) -> float;

Atoms are linked to residues by the following function (the second argument being the atom name):

    declare atom(residue, string) -> atom;

The data about individual atom positions forms a large part of the protein database, and storing atoms as value entities saves us the considerable overhead of having to allocate and store full identifiers, and the structures associated with the enumeration of classes, for each one.

Figure 2.3: Schema illustrating ambiguities of function binding under overlapping subclasses (year defined on person and overridden on student and research staff)
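Python's frozen dataclasses behave in much the way described for value entities, which makes them a convenient illustration: equality and hashing depend only on the field values, and "updating" a field produces a new value rather than modifying the old one. The Atom class below mirrors the Daplex atom declaration; it is an analogy under that assumption only, not a description of how P/FDM stores value entities.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Atom:
    """Value entity analogue: identified entirely by its attribute values."""
    x: float
    y: float
    z: float
    accessibility: float


a = Atom(1.0, 2.0, 3.0, 0.5)
b = Atom(1.0, 2.0, 3.0, 0.5)  # equal to a: same values, so the same value
moved = replace(a, x=1.5)     # "changing" x yields a different value entity
```

Assigning to a.x directly raises an error, just as one cannot turn the integer 5 into 6; the only way to obtain a changed atom is to build a new one.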

The function which links residues to atoms defined above gives us an example of a function with more than one argument. P/FDM allows the definition of functions with an arbitrary number of arguments, but does not maintain inverses for them. Multi-argument functions are considered, for the purposes of binding and inheritance, to be defined on the class that is given as the type of their first argument; so the atom function is treated as an attribute of the residue class. Another example of a two-argument function from the protein database is:

    declare absolutepos(chain, integer) -> residue;

This function allows us to navigate directly from a chain to the residue at any position along it (given by the integer argument), and is useful, for example, for identifying proteins with particular types of residue at particular positions. Similarly, the res_by_name function indexes residues by their names:

    declare res_by_name(chain, string) ->> residue;

(The string argument here is the name of the residue type to be searched for.) This function can be used to study the composition of the chains of sets of proteins without requiring multiple enumerations of the residue class.

It is also possible to define derived functions in P/FDM, which consist of a piece of code that computes the result of the function at run-time. Once defined, derived functions appear to the user like any ordinary function, and it is not necessary to know whether a function is stored or derived to be able to retrieve its value.
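Before turning to an example of a derived function, the multi-argument stored functions just described can be pictured as tables keyed on the tuple of all their arguments. The Python sketch below builds hypothetical in-memory versions of absolutepos/2 and res_by_name/2 from a made-up list of residues; the chain identifier "1abc:A" and the tuple layout are illustrative assumptions, not the P/FDM storage format.

```python
from collections import defaultdict

# Hypothetical toy residues: (chain, position, residue name) tuples.
RESIDUES = [("1abc:A", 1, "GLY"), ("1abc:A", 2, "ALA"), ("1abc:A", 3, "GLY")]

# absolutepos(chain, integer) -> residue: single-valued, keyed on both arguments.
absolutepos = {}
# res_by_name(chain, string) ->> residue: multi-valued, so each value is a set.
res_by_name = defaultdict(set)

for residue in RESIDUES:
    chain, pos, name = residue
    absolutepos[(chain, pos)] = residue
    res_by_name[(chain, name)].add(residue)

# No inverse tables are built, mirroring the absence of inverses for
# multi-argument functions.
```

With these tables, the residue at position 2 is a single lookup, and all glycine residues of the chain come back as one set without scanning the whole residue class.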
Here, for example, is a derived function which computes the length (in residues) of an element of secondary structure, expressed in Daplex:

    define length(s in structure) -> integer in pdb
        end(s) - start(s) + 1;

Derived functions are compiled into Prolog code, which is then stored in the database with the remainder of the metadata.

The body of a derived function may be any legal Daplex expression computing either a singleton (for single-valued functions) or a set (for multi-valued functions). Recursive definitions are allowed, as in this example, which computes the transitive closure of the subpart relationship:

    declare part ->> entity
    declare subparts(part) ->> part
    define all_subparts(p in part) ->> part in partdb
        (subparts(p) union all_subparts(subparts(p)));

Derived functions can be seen as a way of abbreviating or parametrising commonly used computations on data, such as the length/1 example function¹. It is also possible to define methods that parametrise queries and updates in P/FDM, as action methods. Here is an example action that populates the absolutepos function for a given chain, again given in Daplex:

    define populate_absolutepos(c in chain) in pdb
        for each r in has_chain_inv(c)
            let absolutepos(c, pos(r)) = r;

This method iterates over the set of residues associated with the given chain (the for each construct) and sets the value of absolutepos for each position (the let construct).

Displaying values is also considered to be a side-effecting action, and ordinary queries may be parametrised and made to persist by turning them into actions. Here is a query to print the positions of a particular residue type within a particular protein chain:

    define where_is_residue(c in chain, s in string) in pdb
        print(pos(res_by_name(c, s)));

Notice that the result of composing a multi-valued function with a single-valued function is always multi-valued. Thus, the argument to the print construct is a set, and the behaviour of this action is equivalent to:

    for each r in res_by_name(c, s)
        print(pos(r));

¹ Throughout this thesis, we use the Prolog convention of identifying functions and predicates by giving both their name and arity, so that length/1 here indicates the function called length which has one argument.
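The recursive all_subparts definition, together with the rule that applying a function to a set of values yields a set, can be mirrored directly in Python. In the sketch below, subparts is lifted to operate on sets of parts (taking the union of the results for each member), and the SUBPARTS table is a hypothetical example; this naive recursion does not guard against cycles.

```python
# Hypothetical part hierarchy: subparts(part) ->> part, stored as sets.
SUBPARTS = {
    "car": {"engine", "wheel"},
    "engine": {"piston", "valve"},
    "wheel": {"tyre"},
}


def subparts(parts):
    """Apply multi-valued subparts/1 to a set of parts; results are unioned."""
    return set().union(*(SUBPARTS.get(p, set()) for p in parts))


def all_subparts(parts):
    """all_subparts(p) = subparts(p) union all_subparts(subparts(p))."""
    direct = subparts(parts)
    if not direct:  # base case: no further subparts
        return set()
    # Terminates only if the subpart graph is acyclic.
    return direct | all_subparts(direct)
```

Calling all_subparts({"car"}) collects the engine and wheel together with their pistons, valves and tyre, while a part with no subparts yields the empty set.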

This distinction, between methods which have side effects (i.e. actions) and methods which do not and are therefore referentially transparent (i.e. functions), is important in P/FDM, as it clearly separates those parts of the system which may be treated functionally from those which may not.

In P/FDM, large databases are partitioned into smaller units called modules. Each module contains the schema and data for a conceptually related subset of the full database. The protein structure database, for example, is divided into three separate modules: one storing residue- and atom-level protein structures, another storing higher-level structural information, and a third storing basic biochemical information that is applicable to all proteins. In addition to these central modules, individual applications define further modules storing data particular to themselves. The protein modelling application, for example, uses a separate module in which to store the working data for protein models. In order to support these different types of data usage, P/FDM provides three types of module: shared, private and temporary. Shared modules have two access modes (shared reading and exclusive writing) and are intended for the storage of relatively static, generally applicable information that will be shared amongst several applications. Private modules are particular to individual users of individual applications, and are therefore accessible only by that user, whether for reading or for writing. Subject to certain restrictions [58], they may contain links to the main corpus of data in the shared modules. Protein modelling modules, for example, contain references to the proteins in the shared pdb modules which form the basis of the models being constructed.

The final type of module is the temporary module, for which only one access mode (exclusive writing) is available.
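The access rules for the three module types can be summarised as a small state check. The Python sketch below is hypothetical (the class Module and its methods are not P/FDM names) and models only the rules stated in this section, under the assumption that shared reading admits any number of readers while exclusive writing requires that no one else has the module open. It also clears a temporary module's data when it is closed, reflecting that temporary modules do not persist.

```python
# Hypothetical encoding of the three module types and their access modes.
ALLOWED_MODES = {
    "shared": {"read", "write"},   # shared reading, exclusive writing
    "private": {"read", "write"},  # either mode, but owner only
    "temporary": {"write"},        # exclusive writing only
}


class Module:
    def __init__(self, name, mtype, owner=None):
        self.name, self.mtype, self.owner = name, mtype, owner
        self.readers, self.writer = set(), None
        self.data = {}  # stands in for the module's stored data

    def open(self, user, mode):
        if mode not in ALLOWED_MODES[self.mtype]:
            raise PermissionError(f"{self.mtype} module cannot be opened for {mode}")
        if self.mtype == "private" and user != self.owner:
            raise PermissionError("private module belongs to another user")
        if mode == "read":
            if self.writer is not None:
                raise PermissionError("module is locked for exclusive writing")
            self.readers.add(user)
        else:
            if self.writer is not None or self.readers:
                raise PermissionError("exclusive writing needs sole access")
            self.writer = user

    def close(self, user):
        self.readers.discard(user)
        if self.writer == user:
            self.writer = None
        if self.mtype == "temporary":
            self.data.clear()  # temporary data does not outlive the session
```

Under these rules, any number of users can read a shared pdb module at once, but a writer is refused while anyone is reading it, and nobody other than the owner can open a private modelling module.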
Temporary modules are in-memory modules, intended for storing temporary working data that is not required to persist. Temporary modules are created as soon as they are opened and destroyed as soon as they are closed.

This approach to the partitioning of data seems to suit the types of scientific and design applications for which P/FDM was designed. These applications typically involve a central body of incrementally expanding general knowledge plus several sets of more dynamic, application-specific knowledge. As a locking strategy, this approach is rather simpler than many suggested for design-type databases [4], but we have found that it works well in practice. Individual users access only those subsets of the data

which are applicable to their needs, and may update their own data without preventing others from accessing the all-important central modules. Temporary modules mean that users can experiment in their own workspace, without having to consider the long-term effects of these experiments on their persistent data.

2.2 The Architecture of P/FDM

The data model described above is implemented by a set of primitives that specify the operations that may be performed upon its elements. There are primitives, for example, to create, delete and retrieve data, and other primitives to handle data definition and to control access to modules. Although the primitives define a functional data model, they are themselves implemented in Prolog and can be incorporated into Prolog programs like ordinary predicates, to express queries or updates on the database.

Prolog, then, is the primary means of manipulating P/FDM databases but, while it is a very powerful and flexible data manipulation language, it is too low-level to be suitable as a general-purpose query language for casual users. Several higher-level interfaces have therefore been implemented as utility programs that operate on top of the Prolog-level primitives. The main utility program is a compiler for the Daplex data manipulation and definition languages [98] described earlier, but there are other special-purpose utilities, such as a bulk loader [87] which simplifies the loading of large amounts of raw data into databases. These utilities are also implemented in Prolog and, like users and application programs, they access the database only via the primitive operations.
Daplex programs are compiled into Prolog programs containing calls to the data manipulation primitives, and the bulk loader interprets its input and makes the relevant calls to the data creation primitives.

However, by virtue of their status as trusted "system" utilities, these programs are allowed to access the internal metadata structures directly, rather than having to access them via the primitives. This concession is made on the grounds of the increased efficiency that direct access to metadata provides, and it is discussed further in Section 2.3, where the various ways of accessing metadata in P/FDM are described.

Below the level of the interface formed by the primitives, there are several "internal" definitions for each primitive, which implement the operation it represents in terms of different storage types. For instance, P/FDM currently supports four different types of storage (hash files, the Sybase relational database, the Prolog clause base and the metadata storage structures), so for each primitive there will be four internal definitions. The routine that the user invokes as a primitive does not, in fact, perform the requested operation itself. Instead, it merely decides on the storage type that is appropriate to the data model element being operated on, and then invokes the internal primitive definition for that storage type to perform the operation. We term the primitive routines that carry out this process of binding to the correct internal definition driver primitives, in order to differentiate them from the storage-type-specific internal primitives that implement the actual operations.

The storage type that is applicable to a database object can be found by querying the metadata for the storage type of the module in which the object was defined. Here is the general form for driver primitives binding to operations on instances of entity classes:

    Primitive(Class, OtherArgs ...) :-
        find_metadata_for_class(Class, Metadata),
        find_module_type(Class, MType),
        internal_Primitive(MType, Metadata, Class, OtherArgs ...).

where Primitive is the name of the primitive operation being defined, and OtherArgs are the (input and output) arguments specific to that operation. The form of driver primitives operating on functions and actions is similar, except that the module of definition depends on both the function/action name and the class on which it is defined (i.e.
the type of the first argument).

Notice that the driver primitive adds two extra arguments, the module type and a metadata descriptor, to the call to the internal primitive, and passes the original arguments through unchanged. The internal primitive definitions give, as their first argument, an atom representing the type of storage structure on which they operate, e.g.

    internal_Primitive(hash, Metadata, Class, ...) :-
        ...
    internal_Primitive(sybase, Metadata, Class, ...) :-
        ...
    internal_Primitive(temporary, Metadata, Class, ...) :-
        ...
    internal_Primitive(metadata, Metadata, Class, ...) :-
        ...

The routine that returns the module type for a particular schema element is deterministic: it either succeeds exactly once, instantiating the MType variable, or fails. So the driver primitive will bind either to exactly one module type or to none, regardless of the ordering of the internal definitions within the clause base, and the resulting behaviour is always predictable.

The overall architecture of P/FDM is illustrated in Figure 2.4. As the diagram shows, it is organised in layers, with each level building on the layers beneath it. The architecture owes much to the ANSI/SPARC 3-Level Schema [30], which proposed that the logical (or conceptual) schema, describing how the data are structured and related, be kept separate from the storage (or internal) schema, describing how the data are actually stored in terms of file structures, indexes and so on. The purpose of this separation is to insulate application programs from changes to the physical representation of the data on which they operate, and thus to allow new and more efficient data management techniques to be incorporated into the DBMS without requiring the recompilation of all user code. The third level of the ANSI/SPARC model is the subschema (or external schema) level, which defines individual user or application views onto the conceptual schema, thus providing some insulation from changes to the conceptual schema.

The P/FDM architecture also maintains a clear separation between the conceptual and the storage schemas, although it does not, as yet, have a view mechanism. However, the division of the conceptual schema into modules allows us to take the idea further, so that several storage schemas can be in use simultaneously.
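The driver/internal split amounts to dispatch on the storage type of the module of definition. The Python sketch below imitates it with a dictionary in place of Prolog's selection of clauses by their first argument; the tables CLASS_MODULE and MODULE_TYPE, the operation get_instance and the module names are hypothetical, and each internal primitive simply reports which storage type would have handled the request.

```python
# Hypothetical metadata: class -> module of definition; module -> storage type.
CLASS_MODULE = {"protein": "pdb", "model": "scratch"}
MODULE_TYPE = {"pdb": "hash", "scratch": "temporary"}


def make_internal(storage):
    """Build a stand-in internal primitive for one storage type."""
    def internal_get(metadata, cls, *key):
        # A real version would read the files named by metadata["module"].
        return (storage, metadata["module"], cls, key)
    return internal_get


# One internal definition per storage type, selected by storage type.
INTERNAL_GET = {s: make_internal(s)
                for s in ("hash", "sybase", "temporary", "metadata")}


def find_module_type(cls):
    """Deterministic: exactly one storage type, or failure (KeyError)."""
    return MODULE_TYPE[CLASS_MODULE[cls]]


def get_instance(cls, *key):
    """Driver primitive: add metadata and module type, pass args unchanged."""
    metadata = {"module": CLASS_MODULE[cls]}  # stands in for the edesc/7 term
    mtype = find_module_type(cls)
    return INTERNAL_GET[mtype](metadata, cls, *key)
```

A request for a protein is routed to the hash-file definition and a request for a model to the temporary one, while an unknown class fails outright instead of reaching any internal definition.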
The driver primitives provide a uniform interface onto the conceptual schema, regardless of the underlying storage mechanism, and users may navigate from, for example, data stored in a relational database to data stored in hash files, without being aware of the change of storage schema in mid-query.

Figure 2.4: The architecture of P/FDM (layers, from top to bottom: applications and casual users; system utilities; driver primitives; Sybase, hash, temporary and metadata internal primitives; Sybase store, GDBM store, Prolog clause base and metadata descriptors)

In fact, the extensions described in this thesis go one step further in exploiting this independence of the conceptual and internal schemas, in that they define new storage types (e.g. transaction) in terms of other existing storage types, invisibly beneath a single conceptual schema.

2.3 Metadata Structure and Access

Metadata for P/FDM modules is represented as a set of unit ground clauses known as descriptors. There is one type of descriptor for each of the five principal types of data model element:

    edesc/7 for entity classes and value entities,
    fdesc/8 for both stored and derived functions,
    adesc/4 for action methods,
    mdesc/5 for modules, and
    sdesc/1 for the scalar types.

These clauses form the bulk of the metadata, and they describe the current state of a database schema, as defined in the Daplex DDL. The metadata also contains some auxiliary terms maintaining, for example, statistical information for the Daplex query optimiser [60], and information on the access modes of any modules that have been opened. It is the descriptors, however, that are most relevant to the work presented here.

Entity classes and value entities are both represented by the same type of descriptor; this is possible because, in terms of metadata, a value entity is simply a more restricted form of entity class. The form of an entity descriptor is:

    edesc(ClassName, SuperClass, EntityType, KeyDesc, NumInsts,
          LastId, Module)

The first argument, ClassName, is the name of the class for which this is the descriptor. We require that entity names are unique, not only within a module but across all modules that might be accessed together. In other words, the name of a class must be sufficient to identify its descriptor uniquely.

SuperClass stores the name of the immediate superclass of the class being described, and thus the descriptor represents a single is-a link in an inheritance hierarchy. Root classes (i.e. those at the top of their inheritance hierarchy) store the atom entity as their superclass.

The role of the EntityType field varies, depending on whether the descriptor represents an entity class or a value entity. In fact, it is this field that distinguishes between the two types of descriptor. All descriptors representing value entities have the atom value as their entity type, while all full entity descriptors record the overall type of their key in this field. Thus, an entity class with an integer key will have the value integer here, and a class with a string key or a key of mixed type will have the value string.

The KeyDesc argument is a list of the functions that make up the key of each entity class. The elements of the list are either the name of a function or a term of the form foreign(FName), whose argument is also the name of a function. The foreign/1 term specifies the inclusion of a foreign key within the key of the current entity class, and indicates that FName is a relationship function pointing to some class on which the current class is key-dependent. Because P/FDM allows a certain amount of overloading of function names, a name in itself is not usually sufficient to identify a particular function definition, and we would generally require both the name and the class on which the function is defined for an unambiguous reference. When describing keys,
When describing keys,however, we know that the key functions must be de�ned on the entity class beingdescribed (or, for a subclass, on the class at the root of its hierarchy) so we have all theinformation we require. Value entities can also have a key descriptor, since while theydo not have explicitly declared keys, the DBMS automatically de�nes their keys to bethe concatenation of all their attributes.NumInsts and LastId are applicable only to full entity classes. The former storesthe cardinality of the class, and is used for instance retrieval [88] and by the Daplex queryoptimiser [61] for its cost calculations. LastId is used in generating system identi�ers

for new instances. Currently, in P/FDM, identifiers take the form of a Prolog term with the class name as a functor and a unique integer (unique, that is, within each particular class) as its single argument. LastId is a record of the last integer identifier allocated within this class hierarchy, and the next available identifier can be calculated as LastId + 1. For extra security, we do not reuse the identifiers of instances that have been deleted, and therefore need store only this single value in order to be able to generate new identifiers.
The final piece of information that is stored in an entity descriptor is the name of the module in which the class or value entity was defined. As we saw in Section 2.2, the module of definition is used by the driver primitives to bind to the correct internal primitive. It is also used within the internal primitives as a pointer to the physical files in which the entity's details are stored.
An example descriptor, for the student class defined in Section 2.1, is:
edesc(student, person, string, [cname, sname], 0, 0, unidb).
Function descriptors store their information in the following form:
fdesc(FName, ArgTypes, ResultType, Cardinality, FunctionType, HasInverse, FunctionId, Module).
The first argument gives the name of the function, and the next two arguments specify a list of the argument types and the result type respectively. Cardinality distinguishes single-valued from multi-valued functions, and will have either the value single or the value multi.
The FunctionType can be one of four values: method, key, special or optional. Functions of type method are derived functions, and will have a Prolog method with the functor FName stored in the metadata as their definition. The remaining three types all refer to stored functions and indicate, respectively, a function that is part of a key (key), a function that is the inverse of some key function (special) and an ordinary non-key function (optional).
The optional type indicates that the function need not be defined (i.e. it is a partial function), unlike the key functions, which must be given a value when an instance is created (i.e. they are total functions).
HasInverse is a flag indicating whether the system was able to define an inverse for the function (has_inverse) or not (no_inverse).
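The identifier scheme described above can be illustrated with a small sketch. It is written in Python rather than Prolog purely for illustration: EntityDescriptor, root_of and new_identifier are hypothetical names, identifiers are modelled as (class name, integer) pairs standing in for Prolog terms such as student(3), and we assume that the LastId counter for a whole hierarchy is kept on the root class's descriptor.

```python
from dataclasses import dataclass, field

@dataclass
class EntityDescriptor:
    """Hypothetical Python mirror of edesc(ClassName, SuperClass, EntityType,
    KeyDesc, NumInsts, LastId, Module)."""
    class_name: str
    super_class: str                 # the atom 'entity' marks a root class
    entity_type: str                 # 'value' for value entities, else key type
    key_desc: list = field(default_factory=list)
    num_insts: int = 0
    last_id: int = 0
    module: str = ""

def root_of(descs: dict, class_name: str) -> EntityDescriptor:
    """Follow the is-a links recorded in SuperClass up to the root class."""
    d = descs[class_name]
    while d.super_class != "entity":
        d = descs[d.super_class]
    return d

def new_identifier(descs: dict, class_name: str) -> tuple:
    """Allocate (class_name, LastId + 1), sharing the integer across the hierarchy.

    Assumption: the hierarchy's counter lives on the root descriptor.
    Deleted identifiers are never reused, so one counter is enough.
    """
    if descs[class_name].entity_type == "value":
        raise ValueError("value entities have no system identifiers")
    root = root_of(descs, class_name)
    root.last_id += 1
    return (class_name, root.last_id)

descs = {
    "person": EntityDescriptor("person", "entity", "string", module="unidb"),
    "student": EntityDescriptor("student", "person", "string",
                                ["cname", "sname"], 0, 0, "unidb"),
}
first = new_identifier(descs, "person")    # ('person', 1)
second = new_identifier(descs, "student")  # ('student', 2)
```

Because identifiers of deleted instances are never handed out again, deletion never needs to touch last_id, which is why this single stored value suffices.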

The FunctionId is an integer identifier that is allocated to each function on definition, as a more concise alternative to the often unwieldy combination of function name and first argument type. The identifier is unique within a particular module and is intended for use at the storage schema level, where the lengthier identifier would be inefficient both in terms of space and speed of manipulation.
Finally, as with entity descriptors, we store the name of the module in which the function was defined. Some example functions are:
fdesc(surname, [person], string, single, key, 2, no_inverse, unidb).
fdesc(students, [teacher], student, multi, optional, 4, has_inverse, unidb).
fdesc(absolutepos, [chain, integer], residue, single, optional, 5, no_inverse, pdb1).
Action descriptors are a cut-down form of function descriptor, with the information about results and stored functions removed:
adesc(AName, ArgTypes, ActionId, Module).
so that we store simply the name, argument types, identifier and module of definition for each action. The descriptor for the action called populate_absolutepos defined above, for example, is:
adesc(populate_absolutepos, [chain], 1, pdb1).
The descriptor storing metadata about modules is mdesc/5:
mdesc(MName, ModuleType, Status, LastFId, LastAId).
Here, MName contains the name of the module, and LastFId and LastAId contain the last integers allocated as identifiers for functions and actions respectively (cf. the LastId field of the entity descriptors). The ModuleType field contains an atom indicating the storage type of the module. In general, this will be an atom that is also used as the first argument to some set of internal primitive definitions, but this is not compulsory. If a driver primitive cannot find a suitable internal primitive, then it simply fails quietly.
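A minimal sketch of how a driver primitive might bind to an internal primitive through the ModuleType recorded in mdesc/5, and fail quietly when no internal definition exists. The registry, the decorator, and the module-type assignments below (unidb as hash, pdb1 as sybase) are assumptions made for illustration, not P/FDM's actual code; Python's None stands in for Prolog failure.

```python
from typing import Callable, Optional

# Hypothetical registry of internal primitives keyed on (primitive, ModuleType).
INTERNAL: dict = {}

def internal(primitive: str, module_type: str) -> Callable:
    """Register an internal definition of `primitive` for one storage type."""
    def register(fn: Callable) -> Callable:
        INTERNAL[(primitive, module_type)] = fn
        return fn
    return register

# Stand-ins for metadata: the Module field of edesc, the ModuleType of mdesc.
CLASS_MODULE = {"student": "unidb", "chain": "pdb1"}
MODULE_TYPE = {"unidb": "hash", "pdb1": "sybase"}

@internal("getentity", "hash")
def hash_getentity(module: str, cls: str) -> list:
    # A real definition would read the module's files; fixed data here.
    return [f"{cls}(1)"]

def driver(primitive: str, cls: str, *args) -> Optional[object]:
    """Bind to the internal primitive for the class's module type.

    Returns None, i.e. fails quietly, if no internal definition exists.
    """
    module = CLASS_MODULE.get(cls)
    if module is None:
        return None
    impl = INTERNAL.get((primitive, MODULE_TYPE[module]))
    if impl is None:
        return None
    return impl(module, cls, *args)

print(driver("getentity", "student"))   # ['student(1)']
print(driver("getentity", "chain"))     # None: no sybase definition registered
```

Registering definitions per storage type is what lets a new module type be added without touching the driver layer.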

The Status of a module describes the access modes available. Three kinds of access are available, as described in Section 2.1 (shared, private and temporary), and this set is reduced for certain module storage types. Temporary modules, for instance, may only be created with a status of temporary, while Sybase modules are assumed to have shared status.
The final type of descriptor, sdesc/1, is the simplest of all:
sdesc(TypeName).
storing only the name of a scalar type supported by the DBMS. The three scalar types available in P/FDM (string, integer and float) are not defined in any particular module but are pervasive types that exist within the DBMS itself, and therefore require no module-of-definition field.
2.3.1 System Access to Metadata
The metadata descriptors contain information which describes both the logical structure of the database modules (i.e. their conceptual schemas) and some details of the physical storage of the data (i.e. their storage schemas). The primitives use the conceptual schema information to check that the requested operation will not break any of the semantic rules of the data model, and then use the information about the storage schema to carry the operation out. Efficient access to both these types of metadata is therefore a prerequisite for efficient access to raw data, and the metadata structures described above have been designed with this fact in mind.
The most obvious concession to efficiency is the use of n-ary relations for descriptors, rather than the binary relations on which the functional data model is based. A binary-relational approach would allow us to add new attributes to the metadata without affecting existing code, but n-ary relations give us the ability to extract as much information as we need from each descriptor with only a single retrieval.
We can also pass complete descriptors back and forth as parameters within the internal primitives, with each routine accessing only the parts of the descriptor that it requires, rather than passing the identifying attributes and using these to retrieve the other attributes afresh within each routine. For the retrieval primitives, this can mean

that we require only one retrieval from metadata in order to retrieve a complete set of instances or values. The more complicated update primitives will generally require more than one retrieval from metadata, but the overall saving in execution time can still be considerable.
To further improve efficiency of access, metadata clauses are loaded into the Prolog clause base whenever a module is opened. All requests for information from the metadata are made to these in-memory copies, as are all updates. When a module is closed, the new forms of the descriptors are written back out to the database. (Note that, since the Prolog definitions of function and action methods are also considered to be a part of the metadata, they are also cached in memory at the start of each session.) In fact, this arrangement has another significant advantage in that it unifies metadata access for all storage types. No matter how the metadata may actually be stored within a module, it always takes the form of the descriptors described above in the Prolog clause base when accessed by the primitives. Any conversion between the persistent and in-memory structures is confined to the primitives handling the opening and closing of modules, and thus neither obscures nor slows down access to metadata during ordinary data manipulation.
Storing metadata descriptors as Prolog clauses allows us to take advantage of the facilities provided by Prolog for their efficient retrieval. Versions of P/FDM exist for two Prolog compilers, Quintus Prolog [94] and SICStus Prolog [103], both of which provide an index on the first argument of clauses with the same functor and arity (or, to be precise, on the principal functor of their first argument). This knowledge has been exploited during the design of the metadata descriptors, so that the first argument is always used to store the attribute that identifies the descriptor most precisely.
Entity descriptors, for example, can be identified uniquely by the name of the class they represent and can therefore be retrieved directly, using the index on their first argument. Function and action descriptors are not identified uniquely by their names, as we have seen, but the indexing facility is still useful in narrowing down the set of descriptors that must be searched for a complete match. This matters all the more because, in practice, the majority of function and action names will not be overloaded and will therefore identify a single descriptor.
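The effect of first-argument indexing can be imitated with a dictionary keyed on functor, arity and first argument. The sketch below is a hypothetical Python model, not the Quintus or SICStus implementation. The descriptor tuples follow the argument order of the example descriptors in this section, and the overloaded pos function on an atom class is invented to show how indexing narrows, but does not complete, the search for an overloaded function descriptor.

```python
from collections import defaultdict
from typing import Callable

class ClauseBase:
    """Toy clause store indexed on (functor, arity, first argument),
    imitating first-argument indexing in a Prolog clause base."""

    def __init__(self) -> None:
        self.index = defaultdict(list)
        self.examined = 0   # clauses inspected so far, to show the saving

    def add(self, functor: str, *args) -> None:
        self.index[(functor, len(args), args[0])].append(args)

    def lookup(self, functor: str, arity: int, first,
               match: Callable[[tuple], bool] = lambda args: True) -> list:
        """Examine only clauses with this first argument, then filter them."""
        hits = []
        for args in self.index.get((functor, arity, first), []):
            self.examined += 1
            if match(args):
                hits.append(args)
        return hits

kb = ClauseBase()
kb.add("edesc", "student", "person", "string", ["cname", "sname"], 0, 0, "unidb")
kb.add("fdesc", "surname", ["person"], "string", "single", "key", 2, "no_inverse", "unidb")
kb.add("fdesc", "pos", ["residue"], "integer", "single", "optional", 26, "no_inverse", "pdb2")
# Hypothetical overloading of the name 'pos' on another class:
kb.add("fdesc", "pos", ["atom"], "integer", "single", "optional", 27, "no_inverse", "pdb2")

student = kb.lookup("edesc", 7, "student")   # unique name: one clause examined
pos_residue = kb.lookup("fdesc", 8, "pos", lambda a: a[1] == ["residue"])
```

Only the two pos clauses are examined when looking up pos on residue; the surname clause is never visited.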

2.3.2 External Access to Metadata
In order to allow wider, external access to metadata, P/FDM provides an interface to the metadata descriptors that mimics the behaviour of a temporary module [38]. The interface makes it possible to query the metadata using the query primitives provided for raw data retrieval, whilst at the same time guarding against unauthorised updates, which might corrupt the metadata. This uniform view onto the metadata is implemented by treating the metadata descriptors as an additional storage schema. Internal definitions are provided for the retrieval primitives that convert the n-ary relations of the metadata to the binary relations of the FDM, and for the update primitives that prevent external updates to the metadata.
When the user enters the P/FDM system, a metadata module is automatically opened in exclusive read mode. As application modules are opened and closed throughout the session, metadata descriptors are created and deleted in the Prolog clause base, and the contents of the metadata module will apparently be updated to reflect these changes. The user (or an application program) can query the metadata using the retrieval primitives, either directly from Prolog or indirectly from Daplex, just as they can query their application data. Moreover, because of this apparent uniformity of metadata access, any general-purpose program that can operate on an arbitrary schema (an on-screen browser, for example) will also automatically be able to operate on metadata.
The schema for the metadata module is illustrated diagrammatically in Figure 2.5. The full version, expressed in the Daplex DDL, is given in Appendix C. The entity classes in Figure 2.5 correspond (roughly) to each of the various kinds of metadata descriptor, and the individual descriptors themselves are treated as instances of these metadata classes.
So, for example, each function descriptor constitutes an instance of the class funmeta, and each module descriptor an instance of the class modmeta.
Identifiers for these pseudo-instances are generated from the metadata class name and the identifying attributes of the corresponding descriptor. For example, the identifier for the instance of the metadata class entmeta describing the application-level class protein is:
entmeta(meta_id(protein)).
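Because a metadata identifier is built only from information already held in the descriptor, it can be generated without any lookup table. A minimal Python sketch, treating Prolog terms as strings: meta_id, funmeta_id and result_type are hypothetical helper names (result_type models the metadata function of that name), and the pos descriptor is an illustrative fdesc tuple in the argument order of the example descriptors.

```python
def meta_id(meta_class: str, *identifying: str) -> str:
    """Pseudo-instance identifier: the metadata class name wrapped around a
    single meta_id(...) argument holding the identifying attributes."""
    return f"{meta_class}(meta_id({', '.join(identifying)}))"

def funmeta_id(fdesc: tuple) -> str:
    """Function descriptors need two attributes: name and first argument type."""
    name, arg_types = fdesc[0], fdesc[1]
    return meta_id("funmeta", name, arg_types[0])

def result_type(fdesc: tuple) -> str:
    """Binary view of one n-ary field: read ResultType (third argument) and
    convert it into a metadata identifier."""
    return meta_id("objmeta", fdesc[2])

pos = ("pos", ["residue"], "integer", "single", "optional", 26, "no_inverse", "pdb2")
print(meta_id("entmeta", "protein"))   # entmeta(meta_id(protein))
print(funmeta_id(pos))                 # funmeta(meta_id(pos, residue))
print(result_type(pos))                # objmeta(meta_id(integer))
```

Wrapping the attributes in meta_id keeps every identifier to a single argument, even for funmeta, which needs two identifying attributes.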

[Figure 2.5: Diagrammatic representation of the metadata schema. Labels recoverable from the diagram: actmeta, funmeta, modmeta, objmeta, compoundmeta, simplemeta, entmeta, valentmeta, cmodule, amodule, fmodule, act_args, key_component, fun_args, result_type, supertype.]

The values of the identifying attributes are wrapped inside a further term, meta_id, to ensure that all metadata identifiers have only a single argument, including identifiers for those classes (i.e. funmeta and actmeta) which require two attributes to identify their descriptors uniquely. Thus metadata identifiers have the same general form as non-metadata instance identifiers, but use the attribute-based identifiers of the underlying descriptors as the argument to the term, rather than the integers used for ordinary identifiers. This means that we can generate an identifier for a particular descriptor purely from the information it contains, without having to refer to in-memory tables mapping integer identifiers to descriptors. The funmeta instance corresponding to the position function, pos, defined on the residue class, for example, is:
funmeta(meta_id(pos, residue)).
The attributes and relationships defined on the metadata classes correspond to the various fields of the metadata descriptors. So, for example, the result of the function fname defined on the funmeta class is found by extracting the first argument from the appropriate function descriptor. Where the function represents a relationship (i.e. the result is an instance of some class), the result is found by extracting the data from the relevant field of a descriptor and converting it into a metadata identifier. The result type of the pos(residue) function, for example, as retrieved from the following descriptor:
fdesc(pos, [residue], integer, single, optional, 26, no_inverse, pdb2).
would be
objmeta(meta_id(integer))
The metadata schema also defines some function methods, describing other relationships between the descriptors, that are useful for querying metadata².
For example, the following function returns the set of functions which are defined on a given entity class:
define functions_on(o in objmeta) ->> funmeta in metadata
    f in funmeta such that o in fun_args(f);
²See the Daplex version of the metadata schema, given in Appendix C, for the definitions of these methods.

and, similarly, functions_yielding/1 returns the set of functions that return instances of a given type:
define functions_yielding(o in objmeta) ->> funmeta in metadata
    f in funmeta such that result_type(f) = o;
These functions can be incorporated into Daplex and Prolog programs just like any ordinary method function. Here, for example, is a Daplex query to return the signatures of all functions defined on the class protein:
for the e in entmeta such that oname(e) = "protein"
    for each f in functions_on(e)
        print(fname(f), ':', oname(arg_types(f)), '->', oname(result_type(f)));
and here is a query to print the names of all classes that are key-dependent on some other class:
for each e in entmeta
    for each f in key_components(e) such that
            result_type(f) is an entmeta
        print(oname(e), 'is key-dependent on', oname(result_type(f)));
2.4 The Primitives of P/FDM
In order to define a new module type, the database developer must provide internal definitions for those primitives that are appropriate to the storage schema being implemented. We would normally expect the open_module/2 and close_module/1 primitives to be implemented for all module types (although this is not enforced in any way), but otherwise there are no restrictions on the primitives that must be defined. It is also possible to define completely new primitives (such as the begin_transaction/0 primitive required for the transaction module type), in which case the programmer is responsible for providing the new driver primitives as well as the internal definitions.
Whichever primitives are to be defined for a new module type, however, it is vital that the resulting behaviour of the DBMS correspond to the semantics of the data model it implements, and that the behaviour of each primitive is consistent over all module types for which it is implemented. Thus, we would expect all internal definitions

of the primitive that retrieves instances to return instance identifiers without making any updates to the database, and all internal definitions of the primitive that deletes instances to maintain key-dependency links in the data. The P/FDM system, at present, offers no support to developers of new module types in this respect, and it is therefore vital that the semantics of the primitives to be redefined are well understood. The remainder of this chapter gives an informal description of the semantics of each of the data manipulation primitives currently supported by P/FDM. Brief details of the remaining primitives (i.e. the data definition primitives and the module management primitives) are given in Appendix A.
2.4.1 The Data Retrieval Primitives
Instance Retrieval
getentity(+Class, ?InstId).
getentity(+Class, +KeyValues, ?InstId).³
Two primitives are provided to handle the enumeration of entity classes, of which the simpler is getentity/2. This primitive takes the name of an entity class and returns the identifiers of successive instances of that class on backtracking (i.e. the primitive succeeds once for every instance in the class). As the mode declaration above suggests, it can also be used to test whether a particular instance exists by giving its identifier as the "return" parameter. The order in which instances are returned is unspecified. The only constraints on the use of getentity/2 are that Class should exist and that it should be the name of a full entity class, not a value entity class (since value entities cannot be enumerated).
Getentity/3 is a variant of this primitive that provides access to instances by their keys. Unlike getentity/2, this primitive succeeds at most once, in which case InstId is bound to the identifier of the instance whose key is given as KeyValues. The key must be specified as a fully expanded list of scalar values, i.e. instance identifiers are not allowed.
If no instance with the given key exists, then the primitive fails, leaving InstId uninstantiated. As with getentity/2, this primitive may be used to test that an instance exists and has the given key, as well as for generating identifiers.
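The two access paths can be sketched as follows. A generator stands in for backtracking over getentity/2, a key index stands in for getentity/3, and None plays the part of failure. The chain data, the (class, integer) form of identifiers and the key index are illustrative assumptions; in P/FDM each storage schema supplies its own internal definitions.

```python
from typing import Iterator, Optional

# Hypothetical contents of the chain class: identifier -> key values.
# Identifiers are (class, integer) pairs standing in for terms like chain(14).
CHAIN = {("chain", 14): ("p1crn", "A"), ("chain", 16): ("p2cga", "B")}
# A key index of the kind a hash-type module might maintain.
CHAIN_BY_KEY = {key: inst for inst, key in CHAIN.items()}

def getentity_2(instances: dict) -> Iterator[tuple]:
    """Yield each identifier once, as backtracking over getentity/2 would."""
    yield from instances

def instance_exists(instances: dict, inst_id: tuple) -> bool:
    """getentity/2 called with InstId already bound: an existence test."""
    return inst_id in instances

def getentity_3(by_key: dict, key_values: list) -> Optional[tuple]:
    """Succeed at most once; None plays the part of Prolog failure."""
    if any(isinstance(v, tuple) for v in key_values):
        raise TypeError("keys must be fully expanded scalar values")
    return by_key.get(tuple(key_values))

for inst in getentity_2(CHAIN):
    print(inst)
print(getentity_3(CHAIN_BY_KEY, ["p1crn", "A"]))   # ('chain', 14)
```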

Function Value Retrieval
getfnval(+Function, +Arguments, ?Result).
The getfnval/3 primitive maps a list of argument values onto either a single result value or a set of results. When Function refers to a single-valued function, the primitive either succeeds, instantiating the Result variable, or fails, indicating that the function is undefined for the given arguments. When Function is multi-valued, on the other hand, the result set is enumerated one by one on backtracking (cf. enumeration of instances by getentity/2), and failure signifies not "undefined" but an empty result set. Because of this, all multi-valued functions are assumed to be total.
This primitive returns results from both stored and derived functions, so that the user does not need to be aware of the implementation of a function in order to be able to make use of it. Function definitions are stored as text strings representing Prolog clauses, and are loaded into the Prolog clause base at the start of each session, ready for use. These clauses take the following general form:
<function name>(ListOfArgumentTypes, ArgumentVars, ResultVar) :-
    <function body>.
The first argument, i.e. the list of types on which the function is defined, differentiates the clauses for overridden functions and functions with overloaded names.
Generic Retrieval Primitives
derive_key(+Class, +InstId, ?KeyValues).
relative(+DestClass, +InstId, ?DestInstId).
subordinate(+SubClass, +InstId, ?SubInstId).
Several extra retrieval primitives are provided that define useful relationships in terms of the basic retrieval primitives described above. These primitives are termed generic since their internal definition is independent of any particular storage schema. Special internal definitions may be supplied if the storage schema they implement offers opportunities for significant optimisations, but in general they will not be required.
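A minimal Python sketch of getfnval/3's uniform treatment of stored and derived functions, together with a generic derive_key built only on getfnval and KeyDesc metadata, so that it never touches storage details. Yielding nothing models Prolog failure. The stored values, the derived chain_count function and the ("foreign", FName) tuple standing in for foreign(FName) are hypothetical illustrations.

```python
from typing import Iterator

# Hypothetical stored function values, keyed on (function, argument tuple).
STORED = {
    ("pcode", (("protein", 1),)): "p1crn",
    ("has_protein", (("chain", 14),)): ("protein", 1),
    ("chain_id", (("chain", 14),)): "A",
    ("has_protein_inv", (("protein", 1),)): {("chain", 14)},
}
CARDINALITY = {"pcode": "single", "has_protein": "single",
               "chain_id": "single", "has_protein_inv": "multi"}
# Derived (method) functions, computed on demand; a hypothetical example.
METHODS = {
    "chain_count": lambda p: [sum(1 for _ in getfnval("has_protein_inv", [p]))],
}
# KeyDesc lists; ("foreign", f) plays the role of foreign(FName).
KEY_DESC = {"protein": ["pcode"],
            "chain": [("foreign", "has_protein"), "chain_id"]}

def getfnval(fname: str, args: list) -> Iterator:
    """Yield results; yielding nothing models failure of the primitive."""
    if fname in METHODS:                 # derived: the caller need not know
        yield from METHODS[fname](*args)
        return
    value = STORED.get((fname, tuple(args)))
    if value is None:
        return                           # undefined, or an empty result set
    if CARDINALITY[fname] == "multi":
        yield from value                 # enumerate one by one
    else:
        yield value

def derive_key(cls: str, inst: tuple) -> list:
    """Generic: uses only KeyDesc metadata and getfnval (keys are total)."""
    key = []
    for part in KEY_DESC[cls]:
        if isinstance(part, tuple) and part[0] == "foreign":
            owner = next(getfnval(part[1], [inst]))   # key-dependency link
            key.extend(derive_key(owner[0], owner))   # recursive call
        else:
            key.append(next(getfnval(part, [inst])))
    return key

print(derive_key("chain", ("chain", 14)))   # ['p1crn', 'A']
```

The class of the owning instance is read from the functor of its identifier, which is possible because identifiers carry the class name.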

Briefly, the generic primitives behave as follows. The derive_key/3 primitive performs the inverse operation to getentity/3 in that it maps an instance onto its key values, i.e. the equivalence
$$\mathit{getentity}(\mathit{Class}, \mathit{Key}, \mathit{InstId}) \iff \mathit{derive\_key}(\mathit{Class}, \mathit{InstId}, \mathit{Key})$$
always holds in the database. This primitive operates by examining the metadata and extracting the names of the key functions for the given class, and then using getfnval/3 to retrieve the key values. The process is complicated by the presence of key dependencies, since the key of the instance on which InstId depends must also be extracted (using a recursive call to derive_key/3) and included in the key of InstId.
The relative/3 primitive casts instances at one level of an instance hierarchy into their equivalent instances at another level, either above, below or at the same level as the original class. If, for example, an instance of the class person, person(3), is also an instance of the class student, the call:
| ?- relative(student, person(3), Student).
will instantiate Student to student(3). If the given instance is not a member of the destination class, then the primitive fails. There is a variant on this primitive, subordinate/3, which optimises the definition of relative/3 for the case when the destination class is an immediate subclass of the class to which the given instance, InstId, belongs. Both these primitives are defined purely in terms of metadata retrieval and the getentity/2 primitive, hence their generic nature.
2.4.2 The Update Primitives
The semantics of the update primitives are significantly more complicated than the semantics of the retrieval primitives.
This is partly because of a change of emphasis from efficiency of operation to security, but is mainly due to the number of structural constraints embedded within the data model that must be preserved by update actions. Key dependency, contiguity of instance hierarchies, maintenance of inverses, totality of key functions and their inverses, uniqueness of keys: all these properties must be maintained, usually by propagating the initial update to related objects that may be affected by it. It is this propagation that complicates the behaviour of the update

primitives, and the implementation of any system extensions that involve updates. Our main focus, therefore, in describing the behaviour of these primitives is on the various propagations that each one demands.
Instance Creation
newentity(+Class, +KeyValues, -InstId).
This primitive creates a new instance of the given class with the given key values, and returns its newly allocated identifier. If Class does not exist, or if it already contains an instance with the given key, the primitive fails without making any updates.
Instance creation results in the following propagations. First, all super-instances must be created so as to build a contiguous instance hierarchy. Once the root instance has been created, we can then populate the key functions. For scalar-valued key functions, this means a straightforward call to the function value addition primitive, addfnval/3. Relationship key functions, on the other hand, require a more complicated treatment, since a key-dependency relationship is involved. In this case, the key of the instance on which the new instance depends is extracted from KeyValues and used to retrieve its identifier (via getentity/3). Having verified that the instance exists, we can then use the identifier to populate the relationship key function (again using addfnval/3). If no such instance exists, then the instance creation would violate the key-dependency property, and the primitive fails with an error message.
As an example, consider the creation of a new α-helix in the protein database (module pdb1), which involves all these propagations. The call:
newentity(alpha, [p1crn, 'A', h1], Helix).
results in the following sequence of propagations. First, a new instance of the class alpha is created. This is not a root class, so the update is propagated to the immediate superclass, in this case helix. A new instance is also created at this level, and the update is further propagated to the root class structure.
An instance is created at this level, producing a contiguous instance hierarchy, and the key functions can now be populated. The key of the structure class is defined to be:
key_of(structure_chain), structure_name

where structure_chain/1 is a single-valued function mapping to the chain class, and structure_name/1 maps to a single string value. The first part of the key indicates a key dependency, so we extract the key values for the chain instance from KeyValues (a protein code and a chain identifier) and attempt to retrieve it:
getentity(chain, [p1crn, 'A'], Chain).
If this fails, then the instance does not exist and the entire primitive call fails, with an error message. If the getentity/3 call succeeds, on the other hand, we can complete the update by populating the two key functions, with structure_chain/1 taking the value returned as Chain and structure_name/1 taking the value 'h1', given in KeyValues. Figure 2.6 illustrates the final results of the creation.
[Figure 2.6: Propagation of updates from the creation of an α-helix. Labels recoverable from the diagram: chain [p1crn, A], structure "h1", helix, alpha, structure_chain, structure_name, and propagation steps numbered 1 to 4.]
Instance Deletion
deletentity(+Class, +InstId).
The deletentity/2 primitive takes as its arguments a class name and the identifier of some instance of that class that is to be deleted. Both the class and the instance must exist for the update to succeed.

This apparently simple operation is, in terms of propagation, the most complicated of all the primitives, since everything that is directly associated with the instance being deleted must also be deleted. The process begins by deleting all functions defined on InstId, including both scalar-valued attribute functions and relationships. Unfortunately, it is also necessary to check the database for less obvious references to InstId. These could be multiple-argument functions which either return instances of Class or have Class as one of their argument types, or relationship functions which return instances of Class but which do not have inverses. The only way to identify these functions is to make a complete scan of the entire database, looking for occurrences of InstId. This is a lengthy process, but cannot be avoided if functions of these types exist. Having dealt with the functions, it is also necessary to delete any instances which depend upon InstId. All sub-instances must be deleted, as must all instances that are key-dependent upon InstId. And, of course, each of these deletions will recursively trigger further deletions of functions and dependent instances, until all references to InstId and its dependents have been deleted.
This cascading of deletions can be very useful to users, in that they can rely on the DBMS to do all the tidying up after their deletions. In the protein database, for example, where all the information about a particular protein is key-dependent at some level on a single instance of the class protein, the user can remove all the details of that protein by a single call to deletentity/2.
On the other hand, this does mean that it is important that users understand the potential consequences of deletions before they make them, since the complexity of the propagations caused by even a single deletion can make it very hard to correct mistakes.
Instance Hierarchy Inclusion
include(+Class, +InstId, -ClassInstId).
The include/3 primitive allows instance hierarchies to be extended by creating a new instance of Class and linking it to the instance hierarchy of which InstId is a member. The identifier of the new instance is returned as ClassInstId. We can use include/3, for example, to record the fact that a particular person has become a student:

include(student, Person, Student).
without having to delete the person instance and all its associated relationships and then recreate everything based on a new student instance.
Subclass inclusion is a relatively straightforward operation, being simply a cut-down version of newentity/3. The instance creation is propagated to superclasses, as with the more general creation primitive, but for include/3 the propagation ceases when we encounter a superclass in which the instance already exists. Since we know that some part of the hierarchy is already populated (at least to the level of InstId), we know that this propagation will always stop before we reach the root class, and that therefore we will never be required to populate key functions as a result of an inclusion.
There is no equivalent exclude/2 primitive for reducing the class membership of an instance hierarchy, since the deletentity/2 primitive can perform this operation. To undo, for example, the effects of our inclusion of Person into the student class, we can say:
deletentity(student, Student).
Since deletentity/2 propagates itself to sub-instances, and not to super-instances, our original Person instance is left intact.
Function Value Addition
addfnval(+Function, +Arguments, +Result).
This primitive adds a mapping from Arguments to Result to the function called Function. The exact behaviour depends upon whether the function is single- or multi-valued. If the function is single-valued and as yet undefined, a call to addfnval/3 defines its result. If the function already has a result, then the primitive fails, giving an error message. If, however, the function is multi-valued, then the call to addfnval/3 is interpreted as requesting that Result be included in the set of function results, and so will always succeed. Notice that the results are stored as a set and therefore do not contain duplicates.
Any attempt to add a duplicate result to a function simply succeeds quietly, without making any updates.
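The addfnval/3 rules just described (fail on a defined single-valued function, succeed quietly on a duplicate, refuse method and key functions, populate the inverse of a relationship) can be sketched in Python. FunctionStore and the functions teachers and age are hypothetical; for brevity the sketch assumes the inverse always accepts the new mapping rather than checking it.

```python
class FunctionStore:
    """Hypothetical stored-function table illustrating the addfnval/3 rules."""

    def __init__(self, cardinality: dict, kinds: dict, inverses: dict):
        self.card = cardinality   # fname -> 'single' | 'multi'
        self.kind = kinds         # fname -> 'method' | 'key' | 'special' | 'optional'
        self.inv = inverses       # fname -> name of its inverse, if any
        self.single = {}          # (fname, args) -> result
        self.multi = {}           # (fname, args) -> set of results

    def _add(self, fname: str, args: list, result) -> str:
        """Add one mapping without propagation: 'added', 'dup' or 'fail'."""
        key = (fname, tuple(args))
        if self.card[fname] == "single":
            if key in self.single:
                return "fail"                 # already has a result
            self.single[key] = result
            return "added"
        results = self.multi.setdefault(key, set())
        if result in results:
            return "dup"                      # a set holds no duplicates
        results.add(result)
        return "added"

    def addfnval(self, fname: str, args: list, result) -> bool:
        if self.kind[fname] in ("method", "key"):
            return False                      # cannot be added to directly
        outcome = self._add(fname, args, result)
        if outcome == "fail":
            return False
        inverse = self.inv.get(fname)
        if outcome == "added" and inverse is not None:
            # addfnval(InvFunction, [Result], Argument), assumed to succeed
            self._add(inverse, [result], args[0])
        return True

fs = FunctionStore(
    cardinality={"students": "multi", "teachers": "multi",
                 "surname": "single", "age": "single"},
    kinds={"students": "optional", "teachers": "optional",
           "surname": "key", "age": "optional"},
    inverses={"students": "teachers", "teachers": "students"},
)
fs.addfnval("students", ["t1"], "s1")   # also records teachers(s1) = {t1}
fs.addfnval("students", ["t1"], "s1")   # duplicate: succeeds, no change
```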

It is not possible to add values to method functions (which cannot be updated in any way other than by changing their definition) or to key functions (which are all total and single-valued anyway). The only propagation required by addfnval/3 occurs when relationship functions are being populated, in which case the inverse function must also be populated:
addfnval(InvFunction, [Result], Argument).
Function Value Deletion
deletefnval(+Function, +Arguments, +Result).
The deletefnval/3 primitive implements the deletion of function result values. For non-key, stored functions the behaviour is straightforward and requires no propagation, except to inverse functions where they exist. The user is not allowed to delete key function values (since this would create an instance with an empty or incomplete key), and the primitive fails with an error message if this is attempted.
The only type of function value deletion that can trigger any significant propagation is the deletion of an inverse key function's value, which causes deletion of the result instance. The reason for this seemingly drastic action is that such a deletion breaks the link recording that Result is key-dependent on the argument instance, and Result cannot retain a complete key without it. Consider, for example, the relationship between a protein instance and a chain instance, as illustrated in Figure 2.7. A call to deletefnval/3 to delete the inverse key function's value:
deletefnval(has_protein_inv, [Protein], Chain).
(assuming, of course, that both Protein and Chain have been instantiated appropriately) destroys the link which records the key dependency, and the deletion is therefore propagated to the Chain instance itself:
deletentity(chain, Chain).
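The deletion rules of deletefnval/3 and deletentity/2 can be sketched with a toy store in which key-dependent instances are reached through inverse-key ('special') functions. This is a hypothetical Python model: sub-instance deletion and the whole-database scan for references without inverses are deliberately left out, and the inverse name structure_chain_inv and the direct key dependency of helix on chain are simplifying assumptions.

```python
class Store:
    """Hypothetical toy store for the deletion propagation rules."""

    def __init__(self, kinds: dict, inverses: dict):
        self.kind = kinds          # fname -> 'key' | 'special' | 'optional'
        self.inverse = inverses    # fname -> inverse fname
        self.instances = set()
        self.values = {}           # (fname, inst) -> set of results

    def link(self, fname, inst, result):
        """Store a value and its inverse mapping (setup helper)."""
        self.values.setdefault((fname, inst), set()).add(result)
        inv = self.inverse.get(fname)
        if inv is not None:
            self.values.setdefault((inv, result), set()).add(inst)

    def _unlink(self, fname, inst, result):
        self.values.get((fname, inst), set()).discard(result)
        inv = self.inverse.get(fname)
        if inv is not None:
            self.values.get((inv, result), set()).discard(inst)

    def deletefnval(self, fname, inst, result) -> bool:
        if self.kind[fname] == "key":
            return False                       # would leave an incomplete key
        if result not in self.values.get((fname, inst), set()):
            return False
        if self.kind[fname] == "special":
            return self.deletentity(result)    # dependent loses its key link
        self._unlink(fname, inst, result)
        return True

    def deletentity(self, inst) -> bool:
        if inst not in self.instances:
            return False
        # First, recursively delete instances key-dependent on inst.
        for (fname, owner), results in list(self.values.items()):
            if owner == inst and self.kind[fname] == "special":
                for dependent in list(results):
                    self.deletentity(dependent)
        # Then remove every value defined on inst, with its inverse entries.
        for (fname, owner), results in list(self.values.items()):
            if owner == inst:
                for r in list(results):
                    self._unlink(fname, inst, r)
        self.instances.discard(inst)
        return True

kinds = {"has_protein": "key", "has_protein_inv": "special", "chain_id": "key",
         "structure_chain": "key", "structure_chain_inv": "special",
         "structure_name": "key"}
inverses = {"has_protein": "has_protein_inv", "has_protein_inv": "has_protein",
            "structure_chain": "structure_chain_inv",
            "structure_chain_inv": "structure_chain"}
db = Store(kinds, inverses)
db.instances |= {"protein(1)", "chain(14)", "helix(3)"}
db.link("has_protein", "chain(14)", "protein(1)")
db.link("chain_id", "chain(14)", "A")
db.link("structure_chain", "helix(3)", "chain(14)")
db.link("structure_name", "helix(3)", "h1")
```

Deleting the has_protein_inv value of protein(1) removes chain(14) and, through the recursive call, the helix that depends on it.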

[Figure 2.7: Key-dependency relationship between chains and proteins. The diagram shows the class protein (key = pcode) and the class chain (key = key_of(has_protein), chain_id), linked by the function has_protein and its inverse has_protein_inv.]
Function Value Update
updatefnval(+Function, +Arguments, +OldResult, +NewResult).
The updatefnval/4 primitive allows an existing result of a function (OldResult) to be replaced by a new value (NewResult). Where the function to be updated is single-valued, the result before the update must be OldResult, and the result after the update becomes NewResult. If the original result was some other value, then the primitive fails with an error message.
For an update to a multi-valued function to succeed, OldResult must be a member of the result set, in which case the function is redefined to have the value:
$$(\mathit{OriginalResults} \setminus \{\mathit{OldResult}\}) \cup \{\mathit{NewResult}\}$$
If the function has an inverse, then we must maintain its integrity by deleting the inverse mapping from OldResult and creating a new mapping from NewResult, i.e. the call
updatefnval(Function, [Argument], ORes, NRes)
propagates to
deletefnval(InvFunction, [ORes], Argument),
addfnval(InvFunction, [NRes], Argument).
As might be expected, the update of key function values is a special case that requires careful handling. In the first place, we must ensure that key uniqueness is maintained, and that the update will not transform the key of the argument instance into the key of another existing instance. In the second place, we must propagate the update of key function values to any special storage structures that are used to implement entity enumeration by key. Exactly what this entails depends largely upon the individual storage

schemas: hash type modules store key indexes in a separate database file, for example, which must be updated when key function values are changed, whereas modules based on the Sybase relational database system make use of the automatic key indexing facilities provided at this lower level and therefore require no direct updates.

Further propagations to key structures may be required if the argument instance is involved in any key dependency relationships. When scalar key functions are updated and key-dependent objects exist, the change to the key of the argument instance must be propagated to the keys of its dependent instances. Consider the chain/helix example given above, where the keys of a chain instance and a key-dependent helix instance are:

chain(14)   ['p1crn', 'A']   and
helix(3)    ['p1crn', 'A', 'h1']

If the chain_id of the chain instance is updated:

updatefnval(chain_id, [chain(14)], 'A', ' ').

then the key of the helix instance must become

['p1crn', ' ', 'h1']

Notice that it is only the key itself that changes, not the key attributes of the helix instance. The value of the structure_chain function for helix(3) is still chain(14), even after this update.

When a relationship key function such as structure_chain is updated, the whole part of the argument instance's key that is inherited through that relationship changes, and we must therefore replace subsequences of key values, rather than single key elements, when propagating this type of update. Consider, for example, the following (rather unlikely) update:

updatefnval(structure_chain, [helix(3)], chain(14), chain(16)).

If the key of chain(16) is ['p2cga', 'B'] then the key of helix(3) is transformed from

['p1crn', 'A', 'h1']   to   ['p2cga', 'B', 'h1']

and this change must also be propagated to all instances that are key-dependent on helix(3).
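The rewriting of dependent keys can be sketched as a recursive walk. This Python sketch assumes, as in the chain/helix example, that the key of the owning instance appears as a prefix of each dependent key; the dictionaries `keys` and `dependents` and the three functions are hypothetical stand-ins for P/FDM's storage structures. A scalar key update changes one element of the owner's key, while a relationship key update replaces the whole owner prefix; in both cases the old prefix of each dependent key is replaced as a subsequence.

```python
def propagate_key_change(keys, dependents, inst, old_key, new_key):
    """Rewrite the keys of every instance that is key-dependent on `inst`.

    keys:       instance -> key (a list of key values)
    dependents: instance -> instances whose key embeds this instance's key
    """
    for dep in dependents.get(inst, []):
        dep_old = keys[dep]
        if dep_old[:len(old_key)] != old_key:
            raise ValueError(f"key of {dep} does not embed the key of {inst}")
        # replace the owner's key as a whole subsequence, keeping the dependent's own part
        dep_new = new_key + dep_old[len(old_key):]
        keys[dep] = dep_new
        # instances depending on `dep` embed its old key in turn
        propagate_key_change(keys, dependents, dep, dep_old, dep_new)


def update_scalar_key(keys, dependents, inst, position, new_value):
    """E.g. updatefnval(chain_id, [chain(14)], 'A', ' '): one key element changes."""
    old_key = keys[inst]
    new_key = old_key[:position] + [new_value] + old_key[position + 1:]
    keys[inst] = new_key
    propagate_key_change(keys, dependents, inst, old_key, new_key)


def update_relationship_key(keys, dependents, inst, old_owner, new_owner):
    """E.g. updatefnval(structure_chain, [helix(3)], chain(14), chain(16))."""
    old_key = keys[inst]
    new_key = keys[new_owner] + old_key[len(keys[old_owner]):]
    keys[inst] = new_key
    dependents[old_owner].remove(inst)
    dependents.setdefault(new_owner, []).append(inst)
    propagate_key_change(keys, dependents, inst, old_key, new_key)


keys = {
    "chain(14)": ["p1crn", "A"],
    "chain(16)": ["p2cga", "B"],
    "helix(3)": ["p1crn", "A", "h1"],
}
dependents = {"chain(14)": ["helix(3)"]}
update_scalar_key(keys, dependents, "chain(14)", 1, " ")
# keys["helix(3)"] is now ["p1crn", " ", "h1"]
update_relationship_key(keys, dependents, "helix(3)", "chain(14)", "chain(16)")
# keys["helix(3)"] is now ["p2cga", "B", "h1"]
```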

Chapter 3

Integrity Constraints in P/FDM

An integrity constraint is a description of some condition that must be satisfied by a database state if it is to reflect its real world semantics accurately. Integrity constraints differ from the more general notion of constraints in AI programming [74] in that they refer to the consistency of a set of data rather than to the solution of a particular problem. In addition to this, integrity constraints must always be satisfied, whereas in AI constraints may sometimes be relaxed (i.e. weakened) or ignored if the solution of the problem in hand requires it.

Three types of integrity constraint can be distinguished: structural, semantic and transitional. Structural constraints describe restrictions on the "form" of data, and include such properties as key-uniqueness, referential integrity and totality of attributes. Semantic constraints, on the other hand, refer to the "meaning" of the data in the real world. Examples of this kind of constraint are:

Students must be aged between 16 and 65

$$(\forall s)\; \mathit{student}(s) \wedge \mathit{age}(s, a) \Rightarrow a \geq 16 \wedge a \leq 65$$

Research students are always supervised by the manager of the project on which they work

$$(\forall rs)\; \mathit{research\_student}(rs) \wedge \mathit{supervisor}(rs, sp) \wedge \mathit{project}(rs, p) \wedge \mathit{manager}(p, m) \Rightarrow sp = m$$

CHAPTER 3. INTEGRITY CONSTRAINTS IN P/FDM 49

Transitional constraints specify allowable state changes, such as those defined by the constraint that a student's final examination grading may be increased by at most one level and can never be decreased.

The boundaries between these categories of constraint are not clear-cut. Many structural constraints can be expressed as semantic constraints, given a sufficiently flexible constraint language. Key-uniqueness constraints, for example, can be expressed semantically as:

$$(\forall o_1, o_2, k_1, k_2)\; \mathit{class}(o_1) \wedge \mathit{class}(o_2) \wedge \mathit{key}(o_1, k_1) \wedge \mathit{key}(o_2, k_2) \wedge k_1 = k_2 \Rightarrow o_1 = o_2$$

However, because of their generality and broad applicability, structural constraints offer much greater scope for efficient implementation when they are embedded within the data model and enforced by special-purpose, low-level code that can take advantage of knowledge of the storage schema. Another way to categorise integrity constraints, therefore, is as inherent (or schema) constraints [12], which are implemented within the data model, and explicit constraints, which are described in a special-purpose language and enforced by some special constraint maintenance subsystem.

It is possible to define semantic constraints as transition constraints [14] (i.e. an update is only allowed if it preserves the constraint), but in general this results in a lower-level specification of the original constraint, since it must contain details of the updates which may violate the constraint (information which is implicit in a semantic specification). Transition constraints are perhaps the least generally useful form of constraint, since they require a domain with a significant temporal dimension to be used to their full potential. Our application domain, protein structure, lacks any temporal element, and we have therefore confined our attention to semantic integrity constraints.

The remainder of this chapter is organised as follows.
We begin, in Section 3.1, by summarising the approaches that have been taken to providing efficient constraint maintenance in other systems, and then, in Section 3.2, describe the approach we have taken in P/FDM. Section 3.3 describes a proposed constraint language extension to Daplex and shows how a subset of this language can be compiled into Prolog code for run-time constraint checking. Finally, in Section 3.4, we show how this language can be used to describe the semantics of the protein structure domain.
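As a concrete reading of the logical formulae above, each semantic constraint can be treated as a predicate that must hold of a database state. The Python sketch below evaluates the two example constraints and the semantic form of key uniqueness over a toy in-memory state; the `State` class and its fields are assumptions made for illustration, with single-valued functions modelled as dictionaries. It checks the whole database on every call, which is exactly the cost that the simplification techniques of Section 3.1 aim to avoid.

```python
from dataclasses import dataclass, field


@dataclass
class State:
    """A toy database state; single-valued functions are modelled as dicts."""
    students: set = field(default_factory=set)
    research_students: set = field(default_factory=set)
    age: dict = field(default_factory=dict)         # student -> age
    supervisor: dict = field(default_factory=dict)  # research student -> supervisor
    project: dict = field(default_factory=dict)     # research student -> project
    manager: dict = field(default_factory=dict)     # project -> manager
    key: dict = field(default_factory=dict)         # object -> key (a tuple)


def students_aged_16_to_65(db):
    # (forall s) student(s) and age(s, a) => a >= 16 and a <= 65
    return all(16 <= db.age[s] <= 65 for s in db.students if s in db.age)


def supervised_by_project_manager(db):
    # (forall rs) research_student(rs) and supervisor(rs, sp) and project(rs, p)
    #             and manager(p, m) => sp = m
    return all(
        db.supervisor[rs] == db.manager[db.project[rs]]
        for rs in db.research_students
        if rs in db.supervisor and rs in db.project and db.project[rs] in db.manager
    )


def keys_unique(db):
    # (forall o1, o2, k1, k2) key(o1, k1) and key(o2, k2) and k1 = k2 => o1 = o2
    owner_of = {}
    for obj, k in db.key.items():
        if owner_of.setdefault(k, obj) != obj:
            return False  # two distinct objects share key k
    return True
```

Objects for which a premise does not hold are filtered out before the test, which mirrors the vacuous truth of an implication whose antecedent is false.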

3.1 Approaches to Constraint Maintenance

The main difficulty with implementing a constraint maintenance subsystem is that of minimising the run-time overhead of having to check that constraints are satisfied. One of the most significant contributions to this problem, which has since been used as the basis for almost all implementations of integrity constraint subsystems, is the simplification method proposed by Nicolas [83]. Nicolas realised that, if one can assume that the database state satisfies all the integrity constraints immediately prior to an update operation, then it is only necessary to check those constraints which could be affected by that update. So, the constraint that all students are older than 16, for example:

$$(\forall s)\; \mathit{student}(s) \wedge \mathit{age}(s, a) \Rightarrow a > 16$$

cannot be violated by the addition of a new course to the database, and so need not be checked in this case.¹ Neither is it possible for any deletion, even of students or ages, to violate this constraint, and it need therefore be checked only in a very restricted number of cases.

Nicolas also showed that it is sufficient to check a simplified form of the full constraint, which tests for violation only within the subset of the data that is affected by the update, and, moreover, that this simplified form can be generated at compile-time by considering each of the updates that might violate the constraint in turn. For example, we know that the creation of a new student, S, with age A could violate the above constraint, and we can use this information to partially evaluate the above constraint, so that:

$$\mathit{student}(S) \wedge \mathit{age}(S, A) \Rightarrow A > 16$$

becomes

$$\mathit{True} \wedge \mathit{True} \Rightarrow A > 16 \;\equiv\; A > 16$$

¹ This property relies on the specification of the constraint being range-restricted [14]. However, all practical constraint languages exhibit this property, and this is not a significant restriction on expressibility.

In other words, the update satisfies the full constraint if the new age value, A, satisfies the simplified constraint check. This technique has the advantage of being conceptually simple and is independent of any particular paradigm or architecture. It was originally proposed for relational databases, with constraints expressed in first order logic, but it has also been successfully applied to logic and deductive databases [40, 92] and semantic data model databases [105, 56, 33, 5].

The object-oriented model [111], when it emerged, appeared to be particularly suitable, architecturally, for the implementation of constraint checking by simplification techniques. The simplification method works best when the system has a very clear idea of when updates have occurred and exactly what form they take. Object-oriented systems have this knowledge, due to the use of message passing for control and the encapsulation of objects by methods. In this context, then, updates occur when an update message is sent to an object, and all parameters to the update can be found by interrogating the message structure itself.

The object-oriented version of constraint simplification, then, behaves as follows:

1. the constraint description is analysed (at data definition time) to see which update methods might violate it
2. for each such method, the constraint is simplified and compiled into a fragment of procedural code
3. the procedural representation of the simplified check is added to the body of the appropriate method as a "guard" to the update (in some systems this may require recompilation of the method definition).

For example, the constraint that no person may be older than 200 can be violated by the method which adds a new age value, Age, to a person instance, so under the scheme described above this method's definition would be prefixed by code that checks that $\mathit{Age} \leq 200$:

if Age > 200 then
    exit;
original body of method
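The guard in the example above can be rendered in Python as follows. This sketch assumes that abandoning the update is signalled by raising `ValueError` (the pseudocode simply exits); the `Person` class and the `full_check` function are hypothetical. It contrasts the simplified check, which looks only at the new value, with the full constraint, which would have to examine every person.

```python
class Person:
    registry = []  # every person in the toy database

    def __init__(self, name):
        self.name = name
        self.age = None
        Person.registry.append(self)

    def set_age(self, age):
        # Guard compiled from the simplified constraint: for this update,
        # (forall p) person(p) and age(p, a) => a <= 200 reduces to a test on
        # the new value alone, so no other person needs to be examined.
        if age > 200:
            raise ValueError(f"constraint violated: age {age} exceeds 200")
        # original body of method
        self.age = age


def full_check():
    """What an unsimplified checker would evaluate: every person, every time."""
    return all(p.age is None or p.age <= 200 for p in Person.registry)


p = Person("walton")
p.set_age(50)
```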

This approach to constraint checking is potentially very efficient, since constraints are represented as compiled procedural code (so no run-time interpretation is required), which is executed as part of the standard method invocation procedure (so no special control mechanism is required to invoke the constraint code). Unfortunately, it also has several serious disadvantages:

1. Constraints may be overridden. Under the typical inheritance semantics in object-oriented systems, methods defined at one level of a class hierarchy can be overridden by more specialised methods at a lower level. While this is a very useful feature for implementing object behaviours, it can mean that methods containing automatically created constraint checks are overridden by methods which do not contain these checks. If a more specialised method does not include a call to the higher-level method, the constraint check will not be invoked for updates to instances which inherit the newer method, and integrity may be violated.

2. The procedural descriptions of constraints are distributed throughout the system, with several fragments of code being embedded in separate methods for each constraint. This makes it very awkward for an application developer to modify or remove constraints, since it is not clear where their procedural representation is stored. This is particularly true when an update method has been modified to preserve several integrity constraints, since the guard for the particular constraint being modified may be embedded within a sequence of guards for other constraints.

3. Although the compilation of declarative constraint descriptions into procedural code results in efficient run-time checking, it also means that the DBMS has no information about the constraints that it is currently enforcing.
Without this information the DBMS can give only limited support to the user when constraints are violated, whereas the availability of the declarative constraint details opens up many interesting possibilities for providing more intelligent support for updates in databases (e.g. semantic query optimisation [67], automatic generation of updates to restore validity on constraint violation [78, 107]).
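The first disadvantage is easy to reproduce in any language with method overriding. In this hypothetical Python sketch, a guard generated for `Person.set_age` is lost when a subclass redefines the method without calling the inherited version.

```python
class Person:
    def __init__(self):
        self.age = None

    def set_age(self, age):
        if age > 200:  # guard generated from the constraint
            raise ValueError("age must not exceed 200")
        self.age = age


class Student(Person):
    def set_age(self, age):
        # Specialised behaviour that never calls super().set_age(age),
        # so the inherited guard is silently bypassed.
        self.age = age


s = Student()
s.set_age(250)  # accepted: the constraint is never checked
```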

These problems are caused by the overly close association of constraints and methods in the object-oriented paradigm, when in reality the two have quite different semantics. By embedding constraint code within methods, we are forcing constraints to have the same inheritance semantics as methods, when in fact a non-overriding semantics is more appropriate. We are also removing the possibility of defining special-purpose operations on constraints (such as disable_constraint or delete_constraint) that are not applicable to methods or to blocks of constraint code.

One solution is to use active rules [7] to implement constraint checking. An active rule is a kind of structured trigger, which is described by three components: an event, a condition and an action. When the event occurs, the condition is evaluated, and if it evaluates to True then the action is executed. Active rules (or ECA-rules, as they are also known) are a very general mechanism, and can be used to implement many other database features, such as security rules, timed events and situation monitoring, as well as integrity constraints.

Several systems (e.g. ALICE [105], Starburst [21], Abel [33], Constraint Equations [78] and ODE [56]) have taken an active rule approach to constraint enforcement, which involves compiling a declarative constraint specification into one or more active rules of the form:

event = update that may violate the constraint
condition = is the constraint violated?
action = prevent update from occurring or restore validity

Under this approach, the rule triggering mechanism decides which constraints are to be checked for which method invocations, and it is therefore able to implement whatever semantics are appropriate for the inheritance of constraints, regardless of the semantics of method inheritance. Also, since the procedural representations of the constraint are now stored independently of method definitions, they can be manipulated much more easily.
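The event-condition-action structure can be illustrated with a toy rule manager. This Python sketch is not modelled on any of the systems cited above: events are plain strings such as the hypothetical "addfnval:age", each condition returns True when the constraint is violated, and the action shown simply rejects the update. Because the rules live outside any method, disabling or deleting a constraint amounts to removing its rule from the manager's list.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Rule:
    event: str                         # the update that may violate the constraint
    condition: Callable[[dict], bool]  # True if the constraint is violated
    action: Callable[[dict], None]     # prevent the update or restore validity


class RuleManager:
    def __init__(self):
        self.rules: List[Rule] = []

    def signal(self, event: str, details: dict) -> None:
        """Called when an event occurs: fire each triggered rule whose condition holds."""
        for rule in self.rules:
            if rule.event == event and rule.condition(details):
                rule.action(details)


def reject(details: dict) -> None:
    raise ValueError(f"update rejected: {details}")


manager = RuleManager()
# "no person may be older than 200", compiled into a single ECA rule
manager.rules.append(Rule("addfnval:age", lambda d: d["value"] > 200, reject))
```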
Individual constraints can be added, deleted and disabled simply by adding, deleting or disabling the appropriate sets of constraint rules. This operation can be made even easier if the declarative form of the constraint is retained in the metadata as a first-class object, and is linked to the rules which make up its procedural representation. In this way, the user views and manipulates a constraint as a single high-level, declarative

representation, while the DBMS takes the responsibility for managing the set of rules which make up its procedural representation [33].

Systems differ, of course, as to the exact form of the rule generated, and also in the amount of simplification that can be performed, which depends on the flexibility of the underlying architectures. In the Starburst system [21], for example, a single rule is generated for each constraint, which is triggered by any of the events that may violate it. The condition of this rule evaluates the set of all tuples not satisfying the constraint and succeeds if it is non-empty. Since the rule may have been fired by any one of the triggering events, it is not possible to use the details of the actual update to simplify the constraint condition at compile-time.

The authors of the ODE system [56] have similar difficulties, since their constraints take the form of situation-action rules, with the triggering events being implicitly specified as the invocation of any private member function belonging to the class on which the trigger is defined. Some limited simplification is possible for constraints which are quantified over more than one class (called inter-object constraints by the authors) [56], but the complete lack of any specific knowledge of the triggering method means that much redundant work must still be done to check the constraint. This is an even more serious disadvantage in view of the fact that, in ODE, it is not possible to distinguish updating methods from non-updating methods, and constraints are checked even after methods which have made no state changes.

This inability to gain the maximum efficiency from simplification is due to the underlying architectures of these systems, however, and it is not an inherent disadvantage of the active-rule approach to constraint maintenance.
Both Abel [33] and ALICE [105], for example, generate separate rules for each update that might violate a constraint, and are thus able to produce highly specific rule conditions.

The systems which take an active-rule approach to constraint maintenance also differ in regard to the actions that are generated to cope with violation of a particular constraint. Two default actions are possible, depending on whether constraints are checked before the update occurs (e.g. as in the Abel system), or afterwards (e.g. as in ALICE and Starburst). In the former case, the default action is to forbid the invalid update from going ahead, while in the latter case the default action is to roll the update back. Some systems offer scope for more intelligent handling of violations. In Starburst, for

example, users are asked to provide an appropriate action for rules when constraints are compiled, whereas in the ALICE system repair actions are generated automatically by the DBMS from the constraint description. Constraint Equations [78] offer a similar but more limited facility, with the constraint specification containing "clues" to the most suitable repair actions. These techniques of automatic integrity restoration are still very much in their infancy, however, and further research is required in the area. See Section 6.2.2 for further discussion of this issue.

Another important feature of some of these systems is the continued storage of the declarative description of each constraint after compilation, so that the constraint descriptions are available as an additional information resource about the application domain for both the user and the DBMS. In the Abel system, for example, constraints are stored as first-class objects, with identity and the ability to participate in relationships with other objects. Each constraint object stores its declarative description as one of its attributes and is also linked, via relationships, to the rules which make up its procedural description. In this way, both the declarative and the procedural representations of the constraint are stored and manipulated separately from methods. Constraint inheritance is now handled by the event handling component of the DBMS, and can therefore adopt the required semantics regardless of the semantics of method inheritance. And since the links between the constraint object and the rules which enforce it are retained, it is possible to selectively disable, modify or delete constraints without requiring a complicated analysis and/or recompilation of method definitions.

Unfortunately, this approach to constraint maintenance imposes a greater overhead on the DBMS than the method generating approach.
For example, in [34], the authors report that the introduction of an active rule mechanism into the ADAM OODB doubles the run-time overhead of the message send operator. There can also be problems with "anomalous rule behaviour" [107], in which updates in the action part of a constraint rule can trigger a cascade of further constraint violations and repair updates, possibly causing the system to go into an infinite loop.

The architecture implemented as part of this thesis suggests a compromise between the efficient but inflexible method generating approach to constraint maintenance and the flexible but less efficient active rule approach. This architecture was originally designed for and implemented in the ADAM OODB, along with a functional style constraint language called CoLan [5]. ADAM [86] is an extensible DBMS that uses metaclasses to allow new database features to be prototyped easily. The CoLan architecture does not require metaclasses, however, as we have been able to show by re-implementing it in P/FDM [37].

The approach we take to constraint maintenance is essentially very simple. At compile-time, we take a high-level declarative specification of a constraint and generate several fragments of code that will check for violation of that constraint by various updates. Both the declarative text of the constraint and the code fragments are stored in the metadata in a way that preserves the link between the constraint and its associated code and facilitates run-time retrieval of individual code fragments.

The run-time behaviour is specified as a guard to each database update operation. This guard uses the metadata and the details of the current update to decide which constraints (including inherited constraints) may be violated by the update, and which code fragments must consequently be executed. These fragments are extracted from the metadata and executed in the context of the current update. If the update passes all the checks then it is allowed to succeed.

The important point here is that the guard (i.e. the constraint checker) decides at run-time which code fragments must be executed, independently of any other system binding mechanism (such as method binding). This means that constraints may be added to or deleted from the schema without requiring any recompilation of methods, and also that we can specialise the inheritance strategy for constraints according to the semantics we require.
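The key point, that fragments are chosen when the update is made rather than being compiled into it, can be sketched in Python. In this illustration the `FRAGMENTS` table and the `guarded` wrapper are hypothetical; each fragment is assumed to return True when it detects a violation, so the update proceeds only if none does. Adding or removing a fragment changes the behaviour of an already-wrapped update without redefining it.

```python
from typing import Callable, Dict, List, Tuple

# Each fragment returns True when it finds a violation.
Fragment = Callable[..., bool]

# Stored apart from the update operations: (update type, target) -> fragments.
FRAGMENTS: Dict[Tuple[str, str], List[Fragment]] = {}


def guarded(update_type: str, target: str, update: Callable[..., None]) -> Callable[..., None]:
    """Wrap an update so that its constraint fragments are looked up at call time."""
    def run(*args):
        fragments = FRAGMENTS.get((update_type, target), [])
        # the update succeeds only if no fragment detects a violation
        if any(fragment(*args) for fragment in fragments):
            raise ValueError(f"{update_type} on {target} would violate a constraint")
        update(*args)
    return run


ages: Dict[str, int] = {}


def _store_age(person: str, value: int) -> None:
    ages[person] = value


add_age = guarded("addfnval", "age", _store_age)
```

Registering `lambda person, value: value > 200` under `("addfnval", "age")` after `add_age` has been created is enough to make it reject ages over 200, and removing the entry lifts the restriction again.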
Our approach gains us these advantages without incurring the overhead of a general-purpose rule manager.

3.2 The P/FDM Constraint Maintenance Subsystem

Unlike the other extensions to P/FDM that are described in this thesis, the implementation of the constraint maintenance subsystem involves not the definition of a new module type, but the introduction of a new data model element (i.e. the constraint) that is applicable across all module types. In P/FDM, data model elements are defined structurally by the form of their metadata descriptors and behaviourally by the primitive operations that may be performed on them. Entity classes, for example, are defined

by the entity descriptors and the entity manipulation primitives (such as getentity/2 and newentity/3). The implementation of integrity constraints in P/FDM, therefore, consists of two parts: the definition of new metadata structures that allow the storage of constraint information in a way that facilitates efficient run-time access, and the definition of several new primitives that provide facilities for managing and manipulating that information.

3.2.1 Metadata Structures For Constraints

In the ADAM implementation of CoLan [5], constraints are represented as instances of a special constraint class, with slots storing the declarative text of the constraint and the links to the classes it involves. In P/FDM, on the other hand, we represent metalevel constructs such as constraints by metadata descriptors, as described in Section 2.3. The form of the constraint descriptor is:

cdesc(ConstraintId, QuantType, Enabled, Counter, Daplex, ICode, Module).

where

ConstraintId is a unique identifier assigned to the constraint when it was first created. This identifier consists of the name of the module in which the constraint was defined, combined with an integer suffix (e.g. pdb1_3, unidb_16). Since the module name forms part of the constraint identifier, it is only necessary that the identifying integers be unique within the module in which they are defined. We can ensure this by extending the module descriptor clauses with an extra argument storing the last integer allocated to a constraint within that module:

    mdesc(MName, ModuleType, Status, LastFId, LastAId, LastCId).

QuantType takes either the value universal or the value numerical, depending upon the type of the constraint's quantifier

Enabled is a flag indicating whether the constraint is enabled or disabled (notice that since the constraint descriptor persists between sessions, so does the value of the Enabled flag)

Counter is applicable only to numerically quantified constraints, and contains a count of the number of instances that currently satisfy the constraint

Daplex is the declarative specification of the constraint in Daplex, stored as a text string

ICode is a semi-compiled version of the constraint, expressed in the standard intermediate code format that is generated by the Daplex parser, and

Module is the name of the module in which the constraint is defined.

The description of the constraint is stored twice in the descriptor, both as Daplex text and as intermediate code, because each format is suitable for a different use. The Daplex text string is more readily understood by human users and the intermediate code structure is easier for the DBMS to manipulate. When a constraint has been violated, for example, we can extract the Daplex text from the constraint's descriptor and incorporate it into the error message that is sent to the user. When, on the other hand, the DBMS is searching for constraints that involve a particular function (perhaps for use in semantic query optimisation), we can use Prolog's pattern matching facilities to search the intermediate code representation, without having to rebuild the parse tree from the Daplex version. Another advantage of retaining the intermediate code for each constraint is that it is then very easy to incorporate the constraint into other Daplex programs (which are also converted to intermediate code during the compilation process).
And, since intermediate code is also the input for the Daplex query optimiser, we already have a ready-made optimiser for our constraint code.

Another metadata term provided for constraints records the link between each constraint and its initialisation routine:

constraint_init_code(ConstraintId, InitRoutineHead).

The initialisation routine is a Prolog fragment that is used to check the validity of the entire database relative to a particular constraint, before that constraint is added to the database. For numerically quantified constraints, the routine also returns the number of instances currently satisfying the constraint, so that the Counter field of the constraint descriptor may be set correctly. We retain the link to this piece of code so that we can perform the same initialisation process when re-enabling a disabled constraint.
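The descriptor fields and the per-module allocation of constraint identifiers can be sketched as plain records. In this Python sketch, `ModuleDesc` models only the LastCId argument of mdesc/6, the identifier format module_integer (as in unidb_16) is assumed, and `initialise_numerical` is a hypothetical initialisation routine for a numerically quantified constraint such as "there are no more than 5 professors on the staff": it counts the instances satisfying the condition, so that Counter can be set, and refuses the constraint if the database already violates it.

```python
from dataclasses import dataclass
from typing import Any, Callable, Iterable, Optional


@dataclass
class ModuleDesc:
    """Stands in for mdesc/6; only the LastCId argument is modelled."""
    name: str
    last_cid: int = 0


@dataclass
class ConstraintDesc:
    """Mirrors the seven fields of cdesc/7."""
    constraint_id: str
    quant_type: str          # "universal" or "numerical"
    enabled: bool
    counter: Optional[int]   # only used by numerically quantified constraints
    daplex: str              # declarative text, quoted back to users on violation
    icode: Any               # intermediate code, left opaque in this sketch
    module: str


def allocate_constraint_id(module: ModuleDesc) -> str:
    """Identifiers need only be unique within a module, so a counter per module suffices."""
    module.last_cid += 1
    return f"{module.name}_{module.last_cid}"


def initialise_numerical(instances: Iterable[Any], satisfies: Callable[[Any], bool], limit: int) -> int:
    """Count the instances satisfying the condition; refuse the constraint if already violated."""
    count = sum(1 for inst in instances if satisfies(inst))
    if count > limit:
        raise ValueError(f"database already violates the constraint ({count} > {limit})")
    return count


unidb = ModuleDesc("unidb", last_cid=15)
staff = [{"position": "professor"}, {"position": "lecturer"}, {"position": "professor"}]
desc = ConstraintDesc(
    constraint_id=allocate_constraint_id(unidb),  # "unidb_16"
    quant_type="numerical",
    enabled=True,
    counter=initialise_numerical(staff, lambda s: s["position"] == "professor", 5),  # 2
    daplex="there are no more than 5 professors on the staff",  # English gloss, not Daplex
    icode=None,
    module=unidb.name,
)
```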

Finally, we must provide an index into the constraint metadata that will allow us to implement our run-time checking behaviour efficiently. The index must allow us to retrieve the set of code fragments that check a particular update, given the type of that update and the class or function on which it is to occur. This index is provided by three relations, stored as metadata clauses. Two of these relations are called constraint indexes, because they provide an associative index to constraint identifiers, one from classes and the other from functions (i.e. attributes):

class_to_constraint(ClassName, ConstraintId).
function_to_constraint(FunctionName, FirstArgType, ConstraintId).

We do not index inherited constraints in the constraint index terms because we wish to hide the details of the actual inheritance process within the run-time checking procedure. If we later find that the performance of the constraint checking is not adequate, we can consider sacrificing the flexibility this gives us and embedding the inheritance strategy into these indexes.

We must also provide a link to the Prolog code fragments that implement the constraint checks. As with the ADAM implementation, in P/FDM we generate fragments of code that check that consistency is maintained for each of the basic update operations: namely, creating a new instance, deleting an instance, adding a new function value, deleting a function value and updating (i.e. changing) a function value. Again, since we cannot attach these directly to the class and function metadata, as we can in ADAM, we must represent them as a set of related clauses. This is the third metadata relation, which we call the fragment index:

constraint_to_code(ConstraintId, Class/FunctionId, UpdateType, ArgumentsToUpdate, HeadOfFragment).

This relation allows us to retrieve the code head for the fragment that checks the given update for the given class or function. We use this two-stage lookup process (i.e.
metadata identifier to constraint, then constraint and metadata identifier to code fragment) because this simplifies the handling of inherited constraints.
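The two-stage lookup can be sketched with the index relations held as lists of tuples. In this Python sketch the index contents, the fragment head names (such as c1_age_add) and the person/student/postgrad hierarchy are illustrative assumptions, and the ArgumentsToUpdate field of constraint_to_code/5 is omitted. Inherited constraints are found by walking up the class hierarchy at look-up time, so the index itself records only the class on which each constraint was defined.

```python
# function_to_constraint(FunctionName, FirstArgType, ConstraintId) terms
FUNCTION_TO_CONSTRAINT = [
    ("age", ("person",), "c1"),
    ("age", ("student",), "c2"),
    ("age", ("postgrad",), "c3"),
]

# constraint_to_code(ConstraintId, Class/FunctionId, UpdateType, _, HeadOfFragment)
# terms, with the ArgumentsToUpdate field dropped
CONSTRAINT_TO_CODE = [
    ("c1", "age", "addfnval", "c1_age_add"),
    ("c1", "age", "updatefnval", "c1_age_upd"),
    ("c2", "age", "addfnval", "c2_age_add"),
    ("c3", "age", "addfnval", "c3_age_add"),
]

SUPERCLASS = {"postgrad": "student", "student": "person"}  # assumed hierarchy


def class_and_ancestors(cls):
    """Yield cls and its superclasses, nearest first."""
    while cls is not None:
        yield cls
        cls = SUPERCLASS.get(cls)


def fragment_heads(function, arg_class, update_type):
    # Stage 1: metadata identifier -> constraint identifiers, own and inherited.
    constraint_ids = [
        cid
        for cls in class_and_ancestors(arg_class)
        for (fn, (first_arg,), cid) in FUNCTION_TO_CONSTRAINT
        if fn == function and first_arg == cls
    ]
    # Stage 2: constraint and metadata identifier -> heads of code fragments.
    return [
        head
        for cid in constraint_ids
        for (c, target, utype, head) in CONSTRAINT_TO_CODE
        if c == cid and target == function and utype == update_type
    ]


print(fragment_heads("age", "postgrad", "addfnval"))
# ['c3_age_add', 'c2_age_add', 'c1_age_add']
```

Changing the inheritance semantics (for example, stopping the walk at a given class) only requires changing `class_and_ancestors`; neither index has to be rebuilt.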

The uniform interface to metadata provided by P/FDM gives both the user and the system programmer very flexible access to information about the current set of constraints. We might, for instance, want to find out which constraints are currently disabled:

for each c in constraintmeta such that enabled(c) = "no"
    print(text(c));

or list all constraints which involve the "student" class:

for the e in entitymeta such that oname(e) = "student"
    for each c in constraints_on(e)
        print(text(c));

The full set of metadata functions relevant to constraints is given in the metadata schema in Appendix C. We expect this metadata querying facility to be particularly useful for applications which make use of constraint information other than for integrity maintenance.

3.2.2 The Constraint Manipulation Primitives

Having defined the metadata structures for constraints, we must also define the primitive operations that may be performed upon them. It must be possible to add and delete constraints freely, and also to disable and re-enable them. And, of course, we must be able to check the validity of the constraints relative to a particular update. This behaviour requires the definition of five new primitives:

new_constraint(+ConstraintText, +Module) which adds the given constraint to the schema of the given module, having first executed the initialisation code to check that it is currently satisfied by the database.

delete_constraint(+ConstraintDescriptor) which deletes the given constraint from the schema in which it is defined. In fact, this primitive also removes all references to the constraint, from its descriptor and associated metadata clauses to the code fragments that have been generated from it.

disable_constraint(+ConstraintDescriptor) which causes the specified constraint to be ignored during constraint checking, by setting the Enabled flag in its metadata descriptor to the value disabled. All the rest of the constraint's metadata remains unchanged, ready for later use.

enable_constraint(+ConstraintDescriptor) which causes the given disabled constraint to become visible to the constraint checking process once more. Before re-enabling the constraint, we use the initialisation code to check that the database still satisfies the constraint. If the constraint has been violated, then it may not be re-enabled.

check_constraints(+EventDescriptor) which checks that the given event will not violate any of the currently enabled constraints. The argument to this primitive is a term giving the details of the current update event (described below).

These new primitives fall under the class of "data definition" primitives, and as such do not operate on the driver/internal definition principle but are implemented as a single definition which is applicable to all module types. This is of particular relevance to the check_constraints/1 primitive, for which it may not be possible to identify a single module type (since a single event may affect constraints stored in several modules).

In order to prevent updates that violate constraints from occurring, we place a call to check_constraints/1 as a guard within the driver primitive for each update operation.
The driver primitive for newentity/3, for example, becomes:

newentity(Class, Key, InstId) :-
    class_name_to_module_type(Class, MType, Desc),
    check_constraints(event(newentity, Desc, [Key])),
    internal_newentity(MType, Desc, Class, Key, InstId).

If the update will not violate any constraints, then check_constraints/1 succeeds and the update proceeds; otherwise, check_constraints/1 fails, causing the driver primitive to fail and thus preventing the illegal update from occurring.

The argument to the check_constraints/1 primitive is an event descriptor. This is a term of the form:

event(UpdateType, Descriptor, UpdateArgs).

which contains all the information about the requested update that is necessary for constraint checking. The arguments are:

UpdateType, an atom describing the kind of update that was requested (e.g. deletentity, updatefnval)
Descriptor, the metadata descriptor of the class or function on which the update is to occur, and
UpdateArgs, a list of the remaining arguments to the update primitive, which give the details of the update that was requested.

The call newentity(person, [walton, william], P), for example, will generate the event:

event(newentity, edesc(person, ...), [[walton, william]])

and the call addfnval(age, [person(3)], 50) will generate:

event(addfnval, fdesc(age, [person], ...), [[person(3)], 50])

The basic algorithm followed when checking for constraint violations is:

1. Using the constraint index terms, generate a set of the identifiers of all constraints (including inherited constraints) that might be violated by the given event.
2. Using this set, and the fragment index terms, extract the heads of the code fragments that must be executed in order to check the validity of the constraints.
3. Combine the code heads into a single Prolog goal (taking the form of a negated disjunction) and execute it, the success of the constraint checking process being equivalent to the success of this goal.

The code heads are combined into a negated disjunction for efficiency. We generate code fragments that succeed when the constraint is violated, on the grounds that it is
potentially more efficient to search for a single set of values that violates a constraint than to test that all combinations satisfy it. We therefore check that all the constraints are satisfied by checking that ¬A ∧ ¬B ∧ ¬C succeeds (where A, B and C represent the code fragments to be executed) or, in other words, that A ∨ B ∨ C fails.

The constraint checking behaviour is kept relatively straightforward by the simple device of doing as much of the work as possible in advance, at compile-time. The only exception is the identification of inherited constraints, which is done solely at run-time. The reason for this is twofold. First, if we were to handle inherited constraints at compile-time and include them in the constraint indexes, then the details of the inheritance process would be expressed implicitly by the metadata, rather than explicitly by the check_constraints/1 primitive as now. Any change to the inheritance semantics would then require all the constraint indexes to be rebuilt, whereas by keeping the workings of the inheritance process separate from the indexes we are free to experiment with different inheritance semantics. Secondly, we would be embedding the details of the class hierarchies in the schema into the constraint indexes, which would then need rebuilding whenever the form of the schema changed.

We will illustrate the handling of inherited constraints with an example. Consider the schema in Figure 3.1 and suppose that we have defined the following constraints:

    c1: every person is aged between 0 and 200 years
    c2: every student is older than 16
    c3: every postgraduate student is older than 19
    c4: there are no more than 5 professors on the staff

From these constraints we build a set of constraint index terms:

    function_to_constraint(age, [person], c1).
    function_to_constraint(age, [student], c2).
    function_to_constraint(age, [postgrad], c3).
    function_to_constraint(position, [staff], c4).

where c1, c2, etc.
represent the internal constraint identifiers generated by the system for each of the above constraints.²

² N.B. In this case no class_to_constraint/2 terms are generated, since no operations on classes can violate these constraints. This issue is discussed further in Section 3.3.

Figure 3.1: Schema illustrating the handling of inherited constraints (class hierarchy: person, with subclasses student and staff; student, with subclasses ugrad and postgrad; age is defined on person and position on staff)

In response to a request to define the age of a student instance, e.g.

    addfnval(age, [student(4)], 18)

we must obviously check that constraint c2, which is defined on the student class, is not violated by the update; but we must also check constraint c1, which is inherited from the person class. Moreover, student(4) may also be a member of the postgrad class, in which case we must also check constraint c3. In other words, we must search both upwards and downwards in the hierarchy in order to find all the constraints that must be checked. This differs from the inheritance behaviour of other systems such as CoLan, which support most-specialised binding of identifiers and therefore always begin the search at the lowest populated level of the hierarchy and need search only upwards.

Another complication caused by the lack of most-specialised binding in P/FDM is that we have to ensure that the arguments we pass to the constraint checking fragments are of the correct type. The fragment that checks addfnval-type updates against constraint c2 expects a student argument, so the argument to the addfnval/3 primitive can be passed through unchanged. But the fragments generated for constraints c1 and c3 expect a person instance and a postgrad instance respectively as their arguments, and so the student instance must be cast to these types (using relative/3) before the fragments can be executed.
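The three-step algorithm, combined with the upward and downward search of the class hierarchy just described, can be sketched in Prolog as follows. This is a minimal, hypothetical sketch rather than the P/FDM code: isa/2, in_class/2, fragment_index/2, fragment_head/5 and the c1_violated/2 to c3_violated/2 fragments are illustrative stand-ins for the schema metadata, the fragment index terms and the generated fragments; events name the function directly instead of carrying an fdesc descriptor; a membership test replaces the cast performed with relative/3; and only addfnval events are handled.

```prolog
% Hypothetical schema from Figure 3.1: isa(Subclass, Superclass).
isa(student, person).
isa(staff, person).
isa(ugrad, student).
isa(postgrad, student).

% Hypothetical extent: in_class(InstanceNumber, Class).
% Instance 4 is a postgraduate student; instance 5 is an ordinary student.
in_class(4, person).
in_class(4, student).
in_class(4, postgrad).
in_class(5, person).
in_class(5, student).

% Constraint index terms, as given in the text.
function_to_constraint(age, [person], c1).
function_to_constraint(age, [student], c2).
function_to_constraint(age, [postgrad], c3).
function_to_constraint(position, [staff], c4).

% Hypothetical fragment index; each fragment SUCCEEDS on a violation.
fragment_index(c1, c1_violated).
fragment_index(c2, c2_violated).
fragment_index(c3, c3_violated).

c1_violated(_Id, Age) :- ( Age =< 0 ; Age >= 200 ).
c2_violated(_Id, Age) :- Age =< 16.
c3_violated(_Id, Age) :- Age < 20.

ancestor(C, A) :- isa(C, A).
ancestor(C, A) :- isa(C, B), ancestor(B, A).

% Search both upwards and downwards from the class named in the update.
related_class(C, C).
related_class(C, A) :- ancestor(C, A).
related_class(C, D) :- ancestor(D, C).

% Steps 1 and 2: heads of the fragments to run for addfnval(F, [Inst], V).
fragment_head(F, Class, Id, V, Head) :-
    related_class(Class, C),
    in_class(Id, C),                   % stands in for casting with relative/3
    function_to_constraint(F, [C], CId),
    fragment_index(CId, Frag),
    Head =.. [Frag, Id, V].

disjunction([], fail).
disjunction([G], G) :- !.
disjunction([G|Gs], (G ; Rest)) :- disjunction(Gs, Rest).

% Step 3: the update is legal iff the disjunction of fragments fails.
check_constraints(event(addfnval, F, [[Inst], V])) :-
    Inst =.. [Class, Id],
    findall(H, fragment_head(F, Class, Id, V, H), Hs0),
    sort(Hs0, Hs),                     % drop duplicate fragment heads
    disjunction(Hs, Goal),
    \+ call(Goal).
```

With these facts, giving student(4) an age of 18 is rejected because the downward search finds c3 on postgrad, whereas the same update on student(5) is accepted.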

This discussion highlights another advantage of having separate inheritance mechanisms for constraints and for methods. One of the biggest problems of not supporting most-specialised binding is that constraints defined at a lower level in the hierarchy will be ignored when updates are made at a higher level, and consequently invalid data may be introduced into the database. With separate inheritance mechanisms, however, we can implement the semantics of most-specialised binding for constraints without having to solve the problems that most-specialised binding causes for methods on overlapping subclasses. The most-specialised semantics pose no problem when used in conjunction with the inclusion inheritance of constraints: where there are overlapping subclasses, we can simply "inherit" the constraints on all the classes to which the updated instance belongs.

3.3 The Constraint Language Extension to Daplex

The original CoLan language was designed to take a functional style based on Daplex. Function composition provides a natural way to represent chains of relationships forming sets or values to be constrained, and the functional paradigm allows a natural representation of expressions involving arithmetic, aggregate and set-based operators. Not surprisingly, the P/FDM constraint language is also based on Daplex, but the existence of a full compiler for that language means that we have been able to reuse even more of the existing syntactic structures of Daplex. This also means that our constraint language fits naturally into the existing data definition language and should feel comfortable to users who are already familiar with Daplex.

In our language, as in CoLan, constraint specifications consist of two parts: a quantification part and a predicate part.
The Daplex syntax already contains constructs for describing both quantified expressions and a full range of predicates, so the only new syntactic constructs required are those given below (in BNF):

    <constraint_dec> ::= constrain <quant_part> <so_that> <predicate> ;
    <quant_part>     ::= <quantifier> <named_set_expression> <quant_rest>
    <quant_rest>     ::= <quant_part> | []
    <so_that>        ::= so that | to have
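One way to see how these productions nest is to transcribe them into a Prolog DCG. The sketch below is illustrative only and assumes pre-tokenised input; the parts supplied by the existing Daplex grammar are deliberately reduced (a named set expression is just Var in Class, and the predicate is kept as the raw tokens up to the final ;), and the parse terms constraint/2 and q/3 are hypothetical.

```prolog
% <constraint_dec> ::= constrain <quant_part> <so_that> <predicate> ;
constraint_dec(constraint(Qs, Pred)) -->
    [constrain], quant_part(Qs), so_that, predicate(Pred), [';'].

% <quant_part> ::= <quantifier> <named_set_expression> <quant_rest>
quant_part([q(Q, Var, Class)|Qs]) -->
    quantifier(Q), named_set_expression(Var, Class), quant_rest(Qs).

% <quant_rest> ::= <quant_part> | []
quant_rest(Qs) --> quant_part(Qs).
quant_rest([]) --> [].

% <so_that> ::= so that | to have
so_that --> [so, that].
so_that --> [to, have].

% The six quantifiers of the constraint language.
quantifier(all)         --> [each].
quantifier(all)         --> [all].
quantifier(some)        --> [some].
quantifier(some)        --> [any].
quantifier(no)          --> [no].
quantifier(at_least(N)) --> [at, least, N], { integer(N) }.
quantifier(at_most(N))  --> [at, most, N], { integer(N) }.
quantifier(exactly(N))  --> [exactly, N], { integer(N) }.

% Simplification: a named set expression is just "Var in Class".
named_set_expression(Var, Class) -->
    [Var, in, Class], { atom(Var), atom(Class) }.

% Simplification: the predicate is every token up to the closing ';'.
predicate(Tokens) --> tokens_until_semicolon(Tokens).

tokens_until_semicolon([]) --> [].
tokens_until_semicolon([T|Ts]) -->
    [T], { T \== ';' }, tokens_until_semicolon(Ts).
```

Because <quant_rest> recurses into <quant_part>, the grammar already admits several quantifiers in one declaration, even though the implemented subset described next accepts only one.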

In the BNF above, <quantifier>, <named_set_expression> and <predicate> are defined by the existing Daplex language (the full syntax of which is given in Appendix B). We can now express constraint c1 given earlier as:

    constrain each p in person
        so that age(p) > 0 and age(p) < 200;

The language specified by the syntactic extension to Daplex given above allows the expression of a very rich set of constraints. In the current version of P/FDM, however, we have implemented only a subset of the full constraint language, which allows a single quantifier and simple predicates expressing a conjunction of comparisons between single values. It is hoped that future versions of the language will be able to use this implementation as a starting point for extending this subset, in particular to support nested quantifiers, set-based comparisons and aggregate functions. In the discussion which follows, all references to the "constraint language" indicate the subset of the full language that has actually been implemented.

Six different quantifiers are supported by the constraint language: the three standard quantifiers

    for all: denoted by the keywords each and all, as in "all students are older than 16"

        constrain each s in student
            so that age(s) > 16;

    there exists: denoted by some and any, e.g. "some staff member is a lecturer"

        constrain some s in staff
            so that position(s) = "lecturer";

    not exists: denoted by no, e.g. "no postgraduate student is younger than 20"

        constrain no p in postgrad
            to have age(p) < 20;

and three numerical (or cardinality) quantifiers [22]

    exists at least: denoted by at least <n>, e.g. "there are at least 2 secretaries on the staff"

        constrain at least 2 s in staff
            so that position(s) = "secretary";

    exists at most: denoted by at most <n>, e.g. "the Computing Science Department can support at most 100 undergraduate students"

        constrain at most 100 u in undergrad
            to have name(dept(u)) = "computing science";

    exists exactly: denoted by exactly <n>, e.g. "the university has exactly one principal"

        constrain exactly 1 s in staff
            so that position(s) = "principal";

In terms of implementation, the quantifiers can be grouped into two categories: universal and numerical. The universal quantifiers, for all and not exists, are implemented by generating fragments that check that a particular update does not introduce an instance that violates the constraint. For constraints quantified by for all, this means checking that any newly introduced instance has the property described by the predicate part. For not exists constraints, it means checking that newly introduced instances do not satisfy the constraint's condition.

The second category contains all the numerical quantifiers and the there exists quantifier. Constraints quantified in this way are implemented by storing, within the constraint's metadata descriptor, a counter holding the number of instances that currently satisfy the constraint. Updates are examined to see how they will affect the value of the counter, and if they would take the counter out of its allowed range they are disallowed. Otherwise, the counter is increased or decreased as appropriate, and the update proceeds.

The there exists quantifier is included within this subgroup because it is more efficient to implement it as the equivalent exists at least 1 quantifier. It is easy enough to check whether an update will delete an instance satisfying the existentially quantified
condition, but it is less easy to make an efficient check that it will delete the only such instance. By implementing the quantifier as its numerical equivalent, we can use the constraint counter to tell us which updates will actually violate the constraint.

Figure 3.2: Functional components of the Daplex compiler (Daplex text → lexical analyser → list of tokens → parser → ICode → optimiser → ICode → code generator → Prolog clause; the figure also labels parse trees for actions)

3.3.1 The Internal Format of the Constraint Language

Figure 3.2 shows the main components of the Daplex compiler and the transformations that each program undergoes in its translation to Prolog. Programs are first translated into an intermediate format, called Intermediate Code or ICode. The ICode is optimised to produce an equivalent but more efficient ICode description of the program, which is finally translated into a piece of Prolog code with embedded calls to the data manipulation primitives.

The format of the ICode is based on the set notation called ZF-expressions [90], which describes a set as a collection of generators for variables and restrictions on their values. For example, the set of staff members working for the Russian Department can be described by the ZF-expression:

    [ s | s ← staff; d ← dept(s); name(d) = "russian" ]

which consists of a pattern describing the elements of the result set (in this case, the variable s), two generators (for the variables s and d) and a restriction. Similarly, the P/FDM ICode format consists of generators and restrictions, so that, for example, the Daplex set:

    s in staff such that name(dept(s)) = "russian"

has the following ICode representation:

    [ generate(staff, var(uevar1)),
      generate(dept, [staff], [var(uevar1)], dept, var(evar2)),
      restrict(name, [dept], [var(evar2)], var(evar3)),
      expression(=, var(evar3), string(russian)) ]

Constraints can be represented by an existing ICode construct: the restrict_subquery. In standard Daplex, the restriction subquery represents a quantified set expression, as in the following expression, which describes the set of people who have at least one child:

    p in person such that some p2 in person has parent(p2) = p

The ICode representation of this set expression is:

    [ generate(person, var(uevar1)),
      restrict_subquery(some(var(uevar2)),
          [ generate(person, var(uevar2)),
            generate(parent, [person], [var(uevar2)], person, var(evar3)),
            expression(=, var(evar3), var(uevar1)) ]) ]

Restriction subqueries have the general form:

    restrict_subquery(Quantifier, SetICode)

where Quantifier is a term describing the type of quantification and SetICode is a list of ICode constructs describing the set to be quantified over. This structure is equally suitable for the representation of constraints, so that, for example, constraint c2 can be represented by:

    restrict_subquery(all(var(evar1)),
        [ generate(student, var(evar1)),
          restrict(age, [student], [var(evar1)], var(evar2)),
          expression(>, var(evar2), 16) ])
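The generator-and-restriction reading of ICode can be made concrete with a small interpreter. This is a hedged sketch, not the P/FDM code generator, which compiles ICode into Prolog rather than interpreting it: entity/2 and fnval/3 are a hypothetical toy extent, variable bindings are held in a list of Name=Value pairs, comparison operators are written as quoted atoms, and each restrict_subquery is assumed to begin its SetICode with the generator for the quantified variable.

```prolog
% Hypothetical toy extent: entity(Class, Instance), fnval(Function, Args, Value).
entity(staff, st1).
entity(staff, st2).
entity(dept, d1).
entity(dept, d2).
entity(student, s1).
entity(student, s2).

fnval(dept, [st1], d1).
fnval(dept, [st2], d2).
fnval(name, [d1], russian).
fnval(name, [d2], french).
fnval(age, [s1], 18).
fnval(age, [s2], 15).

% eval_icode(+ICode, +Env0, -Env): one solution per satisfying set of bindings.
eval_icode([], Env, Env).
eval_icode([C|Cs], Env0, Env) :-
    eval_construct(C, Env0, Env1),
    eval_icode(Cs, Env1, Env).

% Generators and restrictions bind a new variable; expressions only test.
eval_construct(generate(Class, var(V)), Env, [V=I|Env]) :-
    entity(Class, I).
eval_construct(generate(F, _ArgTypes, Args, _ResType, var(V)), Env, [V=X|Env]) :-
    icode_values(Args, Env, Vals),
    fnval(F, Vals, X).
eval_construct(restrict(F, _ArgTypes, Args, var(V)), Env, [V=X|Env]) :-
    icode_values(Args, Env, Vals),
    fnval(F, Vals, X).
eval_construct(expression(Op, A, B), Env, Env) :-
    icode_value(A, Env, VA),
    icode_value(B, Env, VB),
    compare_values(Op, VA, VB).
eval_construct(restrict_subquery(Quant, SetICode), Env, Env) :-
    quantified(Quant, SetICode, Env).

% all: no generated element fails the rest; some: a solution exists; no: none.
quantified(all(_), [Gen|Rest], Env) :-
    \+ ( eval_construct(Gen, Env, Env1),
         \+ eval_icode(Rest, Env1, _) ).
quantified(some(_), SetICode, Env) :-
    \+ \+ eval_icode(SetICode, Env, _).
quantified(no(_), SetICode, Env) :-
    \+ eval_icode(SetICode, Env, _).

icode_values([], _, []).
icode_values([A|As], Env, [V|Vs]) :-
    icode_value(A, Env, V),
    icode_values(As, Env, Vs).

icode_value(var(V), Env, X) :- !, memberchk(V=X, Env).
icode_value(string(S), _, S) :- !.
icode_value(C, _, C).

compare_values('=', A, B)  :- A == B.
compare_values('>', A, B)  :- A > B.
compare_values('<', A, B)  :- A < B.
compare_values('=<', A, B) :- A =< B.
compare_values('>=', A, B) :- A >= B.

% The ZF reading: collect the values taken by the pattern variable.
zf_set(PatternVar, ICode, Set) :-
    findall(X, ( eval_icode(ICode, [], Env),
                 memberchk(PatternVar=X, Env) ), Set).
```

Run on the Russian Department ICode above, zf_set(uevar1, ..., Set) yields [st1], and the c2 subquery fails on this extent because student s2 is only 15.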

3.3.2 Generating Initialisation Code for Constraints

The use of an existing intermediate code construct (i.e. the restrict_subquery) to represent constraints means that we can reuse much of the existing code generator in order to produce the final Prolog code fragments. This is particularly true when generating the initialisation code that checks the entire database for violations of particular constraints, as we shall now describe.

Initialisation code is generated slightly differently for each of the two classes of constraints. For the universally-quantified constraints, we generate a fragment of code that looks for a single instance that violates the constraint, and then use the negation of this fragment as the body of the initialisation routine. For example, for constraint c2, which is quantified by for all, we first negate the comparison expression in the SetICode given in the ICode representation above, to give:

    generate(student, var(evar1)),
    restrict(age, [student], [var(evar1)], var(evar2)),
    expression(=<, var(evar2), 16)

and use the existing code generator to produce a Prolog fragment that succeeds when a student who is not older than the allowed age limit (i.e. aged 16 or under) is found:

    getentity(student, S),
    getfnval(age, [S], A),
    A =< 16

The body of the final initialisation routine, then, succeeds when this fragment fails to find such a student:

    c2_init(0) :-
        \+ ( getentity(student, S),
             getfnval(age, [S], A),
             A =< 16 ).

This procedure is the same for constraints quantified by not exists except that, in this case, we omit the initial negation of the intermediate code. The ICode for this type of constraint already expresses the property satisfied by invalid instances and does not require any modification. Here, for example, is the ICode representation of constraint c3, and the initialisation routine generated from it:

    restrict_subquery(no(var(uevar1)),
        [ generate(postgrad, var(uevar1)),
          restrict(age, [person], [var(uevar1)], var(evar2)),
          expression(<, var(evar2), 20) ])

    c3_init(0) :-
        \+ ( getentity(postgrad, P),
             getfnval(age, [P], A),
             A < 20 ).

When checking numerically-quantified constraints, on the other hand, we are interested in the number of times the predicate of the constraint is satisfied, rather than in whether a single violation exists, so our initialisation code must first count the number of instances that satisfy the constraint and then compare this total against the allowed limit. (Recall that we also require this routine to return this value so that it may be used to initialise the counter in the constraint descriptor.) The basic form of the initialisation routine for a numerically-quantified constraint is:

    <constraint_id>_init(NumInsts) :-
        findall(X, Body, Xs),
        count(Xs, NumInsts),
        NumInsts Operator Limit.

where

    Body is a Prolog fragment, generated directly from the unmodified ICode, that finds instances satisfying the constraint's condition;
    Operator is a comparison operator that implements the appropriate test on the number of instances, given the quantifier of the constraint being checked (i.e. "=<" for at most, "=" for exactly and ">=" for at least); and
    Limit is the allowed limit for the constraint counter, as given in the restriction subquery term.
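The general form of a numerical initialisation routine can be captured by a single parameterised predicate. The sketch below is illustrative, under these assumptions: staff_position/2 is a hypothetical toy extent standing in for the getentity/2 and getfnval/3 calls, the standard length/2 replaces count/2, =:= is used for the exactly test, and numeric_init/5 and limit_test/3 are names invented for the example.

```prolog
% Hypothetical toy extent: staff_position(StaffMember, Position).
staff_position(st1, professor).
staff_position(st2, lecturer).
staff_position(st3, professor).

% Operator for each cardinality quantifier, applied as "Count Operator Limit".
limit_test(at_most, Count, Limit)  :- Count =< Limit.
limit_test(exactly, Count, Limit)  :- Count =:= Limit.
limit_test(at_least, Count, Limit) :- Count >= Limit.

% Count the solutions of Body (one per instance X), test the count against
% Limit, and return it so that it can initialise the constraint counter.
numeric_init(Quant, Limit, X, Body, NumInsts) :-
    findall(X, Body, Xs),
    length(Xs, NumInsts),
    limit_test(Quant, NumInsts, Limit).

% c4: there are no more than 5 professors on the staff.
c4_init(NumInsts) :-
    numeric_init(at_most, 5, S, staff_position(S, professor), NumInsts).
```

On this extent c4_init/1 succeeds with NumInsts = 2, which would become the initial value of the c4 counter.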

For example, the initialisation routine for constraint c4, whose ICode representation is:

    restrict_subquery(most(var(evar1), 5),
        [ generate(staff, var(evar1)),
          generate(position, [staff], [var(evar1)], string, var(evar2)),
          expression(=, var(evar2), professor) ])

would be generated as follows:

    c4_init(NumInsts) :-
        findall(X, ( getentity(staff, X),
                     getfnval(position, [X], P),
                     P = professor ), Xs),
        count(Xs, NumInsts),
        NumInsts =< 5.

3.3.3 Generation of Individual Code Fragments for Constraints

In addition to generating Prolog routines to check the entire database for consistency, we must also generate fragments of code that check the constraint against each update that might possibly affect its validity. There are two basic steps to the generation process: first, we must identify the update events for which fragments must be generated; and, secondly, for each of these events we must simplify the ICode representation of the constraint so that it expresses a check against that particular update event. Each simplified piece of ICode is passed to the existing code generator for translation into a Prolog fragment, which is then given a unique functor and stored in the metadata. We will now describe these two steps in detail.

Generation of the Set of Events Requiring Fragments

In the current implementation, which deals only with constraints with a single quantifier, we have been able to adopt a very simple approach to the generation of potentially-violating events, one that produces a near-minimal set of events. As before, the two categories of constraints are dealt with separately.

(i) Generating events for universally-quantified constraints. The only updates which can affect constraints of this kind are those which introduce new data into the database, namely newentity, include, addfnval and updatefnval. Given the set of all classes involved in the constraint (Cs) and the set of all functions involved in it (Fs), the set of events that must be checked (Es) is:

    Es = ({newentity, include} × Cs) ∪ ({addfnval, updatefnval} × Fs)

where × represents the cross product operator.

In fact, we can reduce this set further for the class-based updates, because we interpret universally-quantified constraints as "weakly-translated" [105]. A weak-translation of a constraint is trivially satisfied if any of the attributes that it involves are undefined. Constraint c1, for example, requires only that all people whose age is known be aged between 0 and 200. Under the alternative semantics, i.e. a strong-translation, the constraint would require that, for all person instances, the age function is defined and that its value falls within the required range. The difference between these two semantics is best illustrated by their equivalent first order logic expressions (hence the terms weak- and strong-translation). A weak-translation of c1 to FOL gives:

    (∀p, a) person(p) ∧ age(p, a) ⇒ a > 0 ∧ a < 200

while the strong-translation gives:

    (∀p) person(p) ⇒ (∃a) age(p, a) ∧ a > 0 ∧ a < 200

In P/FDM, we adopt a weak-translation since it is the more flexible and the more practical of the two: flexible because it allows us to constrain both total and partial functions without changing their semantics (i.e. partial functions do not become total functions when constrained), and practical because it allows us to constrain classes and functions which are only partially defined. For example, under a strong-translation semantics, a constraint such as c1 could only be defined when the constrained attributes
of all the instances were defined, while in real-world situations it often happens that we do not have all the data that we require at data definition time.

Under our weak-translation semantics, then, we need only check updates for which there is a possibility that all the functions involved in the constraint will be defined. This will not be true in the case of a newentity/3 update, which creates a "bare" instance for which all non-key attributes are undefined. Constraint c1 illustrates this: since age/1 is not a key function, the creation of a new person instance produces an instance whose age is undefined, and if the age is undefined then the constraint is trivially satisfied. The only constraints, then, for which newentity events have to be checked are those which involve only the key functions of the class being updated. If the key of the person class is the sequence sname/1, cname/1, then the constraint:

    constrain no p in person
        to have cname(p) = "Ernest";

may be violated by a call to the newentity/3 primitive (e.g. newentity(person, [worthing, ernest], P)), and this event must be incorporated into the result set Es. Note that this also applies to classes which are not mentioned explicitly in a constraint but which appear in function composition chains, e.g. the classes dept and lecturer in the constraint:

    constrain each c in course
        so that faculty(organiser(c)) = faculty(teacher(c));

expressed against the schema shown in Figure 3.3. Newentity events are generated for this constraint in the following cases:

    event(newentity, dept)
        when both faculty(dept) and organiser_inv(dept) are part of the key of dept
    event(newentity, course)
        when both organiser(course) and teacher(course) are key functions on course
    event(newentity, lecturer)
        when both faculty(lecturer) and teacher_inv(lecturer) are key functions on lecturer
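The event set Es, refined by the newentity rule just described, can be computed with a few clauses. This is a hypothetical sketch: constraint_function/2, involves/2 and key/2 are invented encodings of the course constraint and of keys for the Figure 3.3 classes (only dept is given a key covering all its involved functions), update_event/1 and events/1 are illustrative names, and include events are left unrefined.

```prolog
:- use_module(library(lists)).   % member/2, subtract/3

% Hypothetical encoding of the constraint
%   constrain each c in course so that faculty(organiser(c)) = faculty(teacher(c));
% constraint_function(Function, ArgumentClass): stored functions it uses.
constraint_function(organiser, course).
constraint_function(teacher, course).
constraint_function(faculty, dept).
constraint_function(faculty, lecturer).

% involves(Class, Functions): functions (inverses included) that must be
% defined on a new instance of Class before the constraint can be violated.
involves(course, [organiser, teacher]).
involves(dept, [faculty, organiser_inv]).
involves(lecturer, [faculty, teacher_inv]).

% Hypothetical keys: only dept has all its involved functions in its key.
key(course, [title]).
key(dept, [faculty, organiser_inv]).
key(lecturer, [name]).

% Es = ({newentity, include} x Cs) U ({addfnval, updatefnval} x Fs),
% keeping newentity only when every involved function is a key function.
update_event(event(E, F, Class)) :-
    member(E, [addfnval, updatefnval]),
    constraint_function(F, Class).
update_event(event(include, Class)) :-
    involves(Class, _).
update_event(event(newentity, Class)) :-
    involves(Class, Fs),
    key(Class, Key),
    subtract(Fs, Key, []).

events(Es) :-
    findall(E, update_event(E), Es).
```

With these keys the sketch yields 12 events: 8 function events, 3 include events and a single newentity event, on dept.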

Figure 3.3: Schema for the event generation example (classes DEPT, COURSE and LECTURER; COURSE is linked to DEPT by organiser and to LECTURER by teacher; attributes name and faculty on DEPT, title on COURSE, and name and faculty on LECTURER)

We can place similar restrictions on the generation of include events, since this type of update can also create instances with undefined attributes. In this case, however, the functions that will be defined are not just the key functions, but all functions inherited from the existing hierarchy. Consider the staff-student hierarchy given in Figure 3.1, on which constraints c1 to c4 were defined. When, for example, a person instance with a known age is extended to membership of the postgrad class, the resulting postgrad instance will inherit the populated age attribute, and we must therefore check constraint c3, which involves this inherited attribute.

For each class involved in a given constraint, then, we generate an include event if and only if all the constrained attributes on that class are inherited attributes. Both these restrictions, i.e. for newentity and include events, can significantly reduce the number of events generated and can therefore also reduce the run-time overhead, by reducing the number of fragments that must be executed.

(ii) Generating events for numerically-quantified constraints. For numerically-quantified constraints, the situation is complicated by the need to maintain the constraint counters. In order to do this we must generate fragments for all updates that may alter the value of the constraint counter, regardless of whether that alteration may violate the constraint or not. Since any addition may potentially increase the number of times a constraint's condition is satisfied and any deletion may potentially decrease it, all events (subject to the same restrictions on newentity and include events as described above) must be generated. The only exceptions to this are constraints quantified by the exactly quantifier.
Although it is a numerical quantifier, we do not bother to maintain a constraint counter for constraints quantified by exactly, because any alteration to the number of times the predicate is satisfied will cause the constraint to be
violated (since we assume that the constraint is satisfied prior to the update). In this case, all update events are potential constraint violators.

Generation of Individual Code Fragments

Having identified the events that may potentially cause a violation of a particular constraint, it is necessary to generate the code fragments that will identify a definite violation. In order to generate code fragments to check specific updates against a constraint, we convert the ICode representation into a structure known as a constraint graph. The constraint graphs of P/FDM, which should not be confused with the constraint graphs of Urban and Desiderio [106] (groupings of several constraints indicating dependencies between them), are representations of the predicates of individual constraints. They are similar in form to the query graphs produced by the Daplex query optimiser when considering different reordering strategies for programs [59], and consist of a series of nodes representing the classes or scalar types involved in a chain of composed function applications, linked by the functions which navigate between them. Figure 3.4 shows a constraint (c5), its ICode representation and its constraint graph. For the simple class of constraints currently implemented, constraint graphs take the form of two subgraphs and a comparison operator. The Prolog representation of the graph given in Figure 3.4, then, is:

    cgraph([class(postgrad, var(uevar1)), function(research_area),
            class(string, var(evar2))],
           expression(=, var(evar2), var(evar4)),
           [class(postgrad, var(uevar1)), function(supervisor),
            class(staff, var(evar3)), function(research_area),
            class(string, var(evar4))])

The two subgraphs are referred to as the left hand side graph (LHS) and the right hand side graph (RHS) respectively. The update for which the fragment is to be generated is itself converted into a subgraph, which is then compared with both the constraint subgraphs, looking for matches.
Each match represents a potential violation of the constraint by the update, and results in the generation of a separate code fragment.

Figure 3.4: Three representations of constraint c5

    (a) Daplex representation:

        constrain each p in postgrad
            so that research_area(p) = research_area(supervisor(p));

    (b) Intermediate code representation:

        restrict_subquery(all(var(uevar1)),
            [ generate(postgrad, var(uevar1)),
              restrict(research_area, [postgrad], [var(uevar1)], var(evar2)),
              generate(supervisor, [postgrad], [var(uevar1)], staff, var(evar3)),
              restrict(research_area, [staff], [var(evar3)], var(evar4)),
              expression(=, var(evar2), var(evar4)) ])

    (c) Constraint graph representation:

        P (postgrad) --research_area--> RAP (string)  =  RAS (string) <--research_area-- S (staff) <--supervisor-- P (postgrad)

When a match has been found, the graph in which it occurs is split into two further subgraphs, known as the forward graph and the reverse graph. For example, the update event event(addfnval, research_area, staff) is translated into the subgraph:

    [class(staff, VarS), fn(research_area), class(string, VarRAS)]

and is then compared with the two subgraphs representing constraint c5. In this case there is only one possible match, with the RHS graph, and this is split at the point of the match to give the forward and reverse graphs:

    Forward graph:
        [class(string, var(evar4))]
    Reverse graph:
        [class(postgrad, var(uevar1)), fn(supervisor),
         class(staff, var(evar3))]

with VarS = var(evar3) and VarRAS = var(evar4).
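Matching an update subgraph against the constraint subgraphs and splitting at the match can be sketched with append/3. In this hypothetical sketch graphs are lists of class(Type, Var) and fn(Function) nodes (fn/1 is used throughout, where the cgraph term above writes function/1), an update subgraph is assumed to be a single function link, and c5_graph/1, match_split/4 and match_cgraph/6 are illustrative names.

```prolog
:- use_module(library(lists)).   % append/3

% Constraint graph for c5: cgraph(LHS, Comparison, RHS).
c5_graph(cgraph([class(postgrad, var(uevar1)), fn(research_area),
                 class(string, var(evar2))],
                expression('=', var(evar2), var(evar4)),
                [class(postgrad, var(uevar1)), fn(supervisor),
                 class(staff, var(evar3)), fn(research_area),
                 class(string, var(evar4))])).

% match_split(+UpdateGraph, +Graph, -Reverse, -Forward): find the update's
% function link inside Graph and split Graph at that link.
match_split([class(T1, V1), fn(F), class(T2, V2)], Graph, Reverse, Forward) :-
    append(Prefix, [class(T1, V1), fn(F), class(T2, V2)|Rest], Graph),
    append(Prefix, [class(T1, V1)], Reverse),
    Forward = [class(T2, V2)|Rest].

% Try both sides of the constraint graph; Other is the unmatched subgraph.
match_cgraph(Update, cgraph(LHS, _, RHS), Side, Reverse, Forward, Other) :-
    (   Side = lhs, match_split(Update, LHS, Reverse, Forward), Other = RHS
    ;   Side = rhs, match_split(Update, RHS, Reverse, Forward), Other = LHS
    ).
```

Matching the addfnval update on research_area(staff) finds only the RHS, binding the update's variables to var(evar3) and var(evar4) and producing the forward and reverse graphs listed above.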

We now use our three subgraphs to generate the Prolog fragment. Beginning with the two graphs involved in the match, we use the reverse graph to generate the code that instantiates the "anchor" variable (var(uevar1)/P), by reversing it so that it begins with the variable for which the update event has supplied a value (var(evar3)/S):

    [class(staff, var(evar3)), fn(supervisor_inv),
     class(postgrad, var(uevar1))]

(Notice that relationship functions are converted to their inverses when constraint graphs are reversed.) This graph can now be translated back into ICode format:

    ICode1 = [generate(supervisor_inv, [staff], [var(evar3)],
                       postgrad, var(uevar1))]

The forward graph does not need to be reversed, as it already begins with a known variable (var(evar4)/RAS), and it can be converted to ICode without further modification. In our example, the forward graph consists of only a single node, and therefore produces no ICode construct; it simply represents the fact that its variable is instantiated by the arguments to the update event.

    ICode2 = []

The unmatched constraint graph, LHS, can also be converted unmodified back to ICode:

    ICode3 = [restrict(research_area, [postgrad], [var(uevar1)],
                       var(evar2))]

Finally, these three fragments of ICode are glued together (in the order in which they were generated), combined with the comparison expression (negated for for all constraints) and translated into Prolog using the existing code generator:

    getfnval(supervisor_inv, [S], P),
    getfnval(research_area, [P], RAP),
    RAP \== RAS.

Figure 3.5 illustrates this process diagrammatically.

Figure 3.5: Reformulation of example constraint graph for code generation

    (a) Match the update graph
        Initial graphs:  P --res_area--> RAP  =  RAS <--res_area-- S <--sup-- P
        Update graph:    S --res_area--> RAS
    (b) Split the graphs at the matching function link
        Reverse graph:   P --sup--> S   (reversed to)   S --sup_inv--> P
        Forward graph:   RAS
        LHS graph:       P --res_area--> RAP
    (c) Recombine the subgraphs to form the final graph
        Final graph:     S --sup_inv--> P --res_area--> RAP  =  RAS
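The reversal and recombination steps of Figure 3.5 can likewise be sketched in Prolog. The sketch assumes that the inverse of a relationship F is named F_inv (as with supervisor_inv), that a link ending in a scalar type (string, integer or float here) becomes a restrict construct while any other link becomes a generate construct, and that the negation of = is written \= in ICode; these conventions and all the predicate names are assumptions made for illustration, not the P/FDM code generator.

```prolog
:- use_module(library(lists)).   % reverse/2, append/2
:- use_module(library(apply)).   % maplist/3

% Assumed naming convention for inverses: supervisor <-> supervisor_inv.
inverse_name(F, G) :- atom_concat(G, '_inv', F), !.
inverse_name(F, G) :- atom_concat(F, '_inv', G).

invert_node(fn(F), fn(G)) :- !, inverse_name(F, G).
invert_node(Node, Node).

% Reverse a graph, converting each function link to its inverse.
reverse_graph(Graph, Reversed) :-
    reverse(Graph, R0),
    maplist(invert_node, R0, Reversed).

scalar_type(string).
scalar_type(integer).
scalar_type(float).

% A single node yields no ICode; each link yields restrict or generate.
graph_icode([_], []).
graph_icode([class(T1, V1), fn(F), class(T2, V2)|Rest], [C|Cs]) :-
    (   scalar_type(T2)
    ->  C = restrict(F, [T1], [V1], V2)
    ;   C = generate(F, [T1], [V1], T2, V2)
    ),
    graph_icode([class(T2, V2)|Rest], Cs).

% Negated comparisons for for-all constraints (ICode spelling assumed).
negate_op('=', '\\=').
negate_op('>', '=<').
negate_op('<', '>=').
negate_op('=<', '>').
negate_op('>=', '<').

% Glue ICode1 (reversed reverse graph), ICode2 (forward graph) and
% ICode3 (unmatched graph) together with the negated comparison.
recombine(Reverse, Forward, Other, expression(Op, A, B), ICode) :-
    reverse_graph(Reverse, Rev),
    graph_icode(Rev, ICode1),
    graph_icode(Forward, ICode2),
    graph_icode(Other, ICode3),
    negate_op(Op, NegOp),
    append([ICode1, ICode2, ICode3, [expression(NegOp, A, B)]], ICode).
```

Applied to the c5 example, recombine/5 produces the generate(supervisor_inv, ...) and restrict(research_area, ...) constructs given as ICode1 and ICode3 in the text, followed by the negated comparison.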

For universally-quantified constraints, the fragment of code generated by this process is sufficient to express the constraint check, i.e. it will succeed if the constraint is violated and fail if it is not. It will also check exactly-quantified constraints, since it will succeed if the update will change the number of times the constraint's predicate is satisfied, and will fail otherwise. The final step in the process is to turn the code fragment into a Prolog clause, with arguments containing the details of the update event. We also add a call to a routine that displays an error message when the constraint is violated; it is easier to include this call here than to try to pass details of the violated constraint back to the check_constraints/1 routine. The final clause generated to check constraint c5 for this update event, then, is:

    c5_addfnval1([S], RAS) :-
        getfnval(supervisor_inv, [S], P),
        getfnval(research_area, [P], RAP),
        RAS \== RAP,
        constraint_error(c5, addfnval(research_area, [S], RAS)).

The at least and at most quantifiers require more careful handling. For these numerically-quantified constraints, the Prolog fragment that is generated from the intermediate code succeeds when the update will cause a change in the number of times the constraint's predicate is satisfied. The constraint itself is not violated, however, unless this change takes the counter out of its legal range, so we must augment the Prolog fragment with an appropriate check for this. If the constraint is violated, we must display an error message; if it is not, then we must adjust the value of the constraint counter accordingly.
Updates which create new data will require the counter to be incremented, and updates which delete data will require it to be decremented.

The general form of the Prolog generated for updates which may violate this kind of constraint, then, is:

    Body = ( ConstraintProlog,
             get_constraint_counter(CId, CounterValue),
             (   CounterValue = Limit
             ->  constraint_error(CId, ...)
             ;   increment/decrement_constraint_counter(CId),
                 fail
             ) )

where ConstraintProlog is the code fragment generated from the constraint ICode.

For those updates which can never violate the constraint, such as additions for at least constraints and deletions for at most constraints, we do not need to check the value of the constraint counter, although we may need to adjust it. The general form in this case is:

    Body = ( ConstraintProlog,
             increment/decrement_constraint_counter(CId),
             fail )

For example, constraint c4 can only be violated by an addfnval-type update to the position(staff) function, but its counter may be altered by deletefnval events on position(staff) or deletentity events on staff. The following three fragments, therefore, are generated to check this constraint:

    c4_addfnval1([S], P) :-
        P = professor,
        get_constraint_counter(c4, Counter),
        (   Counter = 5
        ->  constraint_error(c4, addfnval(position, [S], P))
        ;   increment_constraint_counter(c4),
            fail
        ).

    c4_deletentity([S]) :-
        getfnval(position, [S], P),
        P = professor,
        decrement_constraint_counter(c4),
        fail.

    c4_deletefnval([S], P) :-
        P = professor,
        decrement_constraint_counter(c4),
        fail.
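The c4 fragments can be exercised against a toy database to show how the counter and the negated call interact. In this sketch the counter lives in a dynamic fact constraint_counter/2, and get_constraint_counter/2, increment_constraint_counter/1, decrement_constraint_counter/1, constraint_error/2 and getfnval/3 are minimal stand-ins written for illustration rather than P/FDM's definitions; the starting count of 4 and the staff_position/2 facts are made up, and =:= replaces = in the limit test. An update is allowed when \+ Fragment succeeds.

```prolog
:- dynamic constraint_counter/2.

% Hypothetical state: four professors already recorded for c4.
constraint_counter(c4, 4).

% Hypothetical toy extent standing in for stored function values.
staff_position(st9, professor).

getfnval(position, [S], P) :- staff_position(S, P).

get_constraint_counter(CId, N) :- constraint_counter(CId, N).

increment_constraint_counter(CId) :-
    retract(constraint_counter(CId, N)),
    N1 is N + 1,
    assertz(constraint_counter(CId, N1)).

decrement_constraint_counter(CId) :-
    retract(constraint_counter(CId, N)),
    N1 is N - 1,
    assertz(constraint_counter(CId, N1)).

constraint_error(CId, Update) :-
    format(user_error, "Constraint ~w violated by ~q~n", [CId, Update]).

% The three c4 fragments from the text; each succeeds only on a violation.
c4_addfnval1([S], P) :-
    P = professor,
    get_constraint_counter(c4, Counter),
    (   Counter =:= 5
    ->  constraint_error(c4, addfnval(position, [S], P))
    ;   increment_constraint_counter(c4),
        fail
    ).

c4_deletentity([S]) :-
    getfnval(position, [S], P),
    P = professor,
    decrement_constraint_counter(c4),
    fail.

c4_deletefnval([S], P) :-
    P = professor,
    decrement_constraint_counter(c4),
    fail.
```

A permitted addition leaves the counter incremented even though its fragment fails, because database changes made by assertz/1 and retract/1 are not undone by backtracking or by \+. This is what allows the fragments to end in fail while still maintaining the counter.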

3.4 Constraints in the Protein Database

In the final section of this chapter, we show how the language described above can be used to define integrity constraints on the protein database, and thereby illustrate something of its expressive power. These constraints fall into three basic categories:

    • constraints which reduce the allowable domain of an attribute from that specified by its type,
    • constraints which guarantee the consistency of duplicated data, and
    • constraints which describe biochemical "rules" about protein structure.

There are many constraints on the protein data, especially of the first two types, and we give only a representative selection here. All of the constraints identified can be expressed in the full constraint language, although not all can be expressed in the subset of the language that is currently implemented. In what follows, those constraints which can be supported by the current system are given in the teletype font, whereas those which cannot are given in italics.

There are several attributes in the protein database schema that store numerical properties of the data; for example, the molecular_weight/1 and resolution/1 attributes of the class protein, and the molwt/1 and accessible_area/1 attributes of the class chain. All these attributes are defined as either integers or floats, whereas in fact they have much smaller domains, since none of them can take a negative value. We can use semantic constraints to reduce the domains of these attributes as follows:

    constrain each p in protein
        so that molecular_weight(p) > 0;
    constrain each p in protein
        so that resolution(p) > 0;
    constrain each c in chain
        so that molwt(c) > 0;

We can further constrain the molecular weight of proteins since we know that this value must be either equal to or greater than the sum of the molecular weights of its constituent chains:

    constrain each p in protein
        so that molecular_weight(p) >=
                sum(molwt(component_protein_inv(p) as chain));

We cannot say that the protein's molecular weight should always equal the sum of the weights of its chains, as it may contain other chemical groups (i.e. ligands) for which the molecular weight is not recorded.

For some attributes we can specify an upper as well as a lower bound. The pos/1 function defined on residue gives us an example of this kind of constraint. This function gives the position of a particular residue along a chain, with the first residue being at position 1, and the remainder being numbered consecutively according to their order of occurrence in the chain. The num_residues(chain) attribute stores a count of the residues within each chain, and therefore provides us with an upper bound for the pos(residue) function:

    constrain each r in residue
        so that pos(r) >= 1 and
                pos(r) =< num_residues(has_component(r) as chain);

For extra security, we can make sure that the number of residues per chain is recorded accurately by the following constraint:

    constrain each c in chain
        exactly num_residues(c) r in residue
        to have has_component(r) as chain = c;

Several of the relationship functions in the protein database represent symmetrical relationships, and we can enforce this essentially structural property using integrity constraints. The parallel/1 function, for example, defined on the class strand relates particular strand instances to those of their neighbouring strands which run in a parallel direction to them. Similarly, antiparallel/1 is a function relating neighbouring strands that are antiparallel to each other. The symmetry of both these relationships can be expressed as:

    constrain each s in strand
        so that s in parallel(parallel(s));

    constrain each s in strand
        so that s in antiparallel(antiparallel(s));

We can use similar constraints to express the fact that no strand is both parallel and antiparallel to another strand³:

    constrain no s in strand
        to have s in parallel(antiparallel(s));

    constrain no s in strand
        to have s in antiparallel(parallel(s));

and that every strand is either parallel or antiparallel to at least one other strand (i.e. strands cannot exist in isolation):

    constrain each s1 in strand
        so that some s2 in strand has
            parallel(s1) = s2 or antiparallel(s1) = s2;

The secondary structure hierarchy and its related functions offer scope for several interesting integrity constraints. We can, for example, state that the first_structure of a particular chain is the secondary structure element that is preceded by no other element in that chain:

    constrain each s in structure such that
        no s2 in structure has follows(s2) = s
        to have first_structure(structure_chain(s)) = s;

Similarly, we can express the fact that the last structure is that which has no succeeding element as the constraint⁴:

    constrain each s in structure such that
        no s2 in structure has follows_inv(s2) = s
        to have end(s) = num_residues(structure_chain(s));

³ Notice that this formulation of the constraints relies on the knowledge that these relationships are both symmetrical.
⁴ Since there is no explicit record of the last structure of a chain, we identify it as the secondary structure element which ends with the last residue.
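These structural constraints on strands can be checked mechanically over an in-memory copy of the two relationships. In the Python sketch below, which illustrates the constraints under assumed data structures rather than P/FDM's implementation, each multi-valued function is a dictionary from a strand to the set of its neighbours, and applying a function to a set of arguments returns the union of the results. Symmetry is tested in its direct form (every neighbour points back), the exclusivity tests follow the two "constrain no" constraints literally, and the last test corresponds to the no-isolated-strand constraint. The function names apply_rel and strand_violations are hypothetical.

```python
from typing import Dict, Iterable, List, Set

Relation = Dict[int, Set[int]]  # strand -> set of neighbouring strands


def apply_rel(rel: Relation, args: Iterable[int]) -> Set[int]:
    """Apply a multi-valued function to a set of arguments (union of the results)."""
    out: Set[int] = set()
    for a in args:
        out |= rel.get(a, set())
    return out


def strand_violations(strands: Iterable[int], parallel: Relation,
                      antiparallel: Relation) -> List[str]:
    errors = []
    for s in strands:
        # Symmetry: every neighbour must record s as its own neighbour.
        for name, rel in (("parallel", parallel), ("antiparallel", antiparallel)):
            for t in rel.get(s, set()):
                if s not in rel.get(t, set()):
                    errors.append(f"{name}({s}) = {t} has no inverse")
        # No s with s in parallel(antiparallel(s)) or s in antiparallel(parallel(s)).
        both = (s in apply_rel(parallel, antiparallel.get(s, set()))
                or s in apply_rel(antiparallel, parallel.get(s, set())))
        if both:
            errors.append(f"strand {s} is both parallel and antiparallel to a neighbour")
        # Every strand has at least one parallel or antiparallel neighbour.
        if not parallel.get(s) and not antiparallel.get(s):
            errors.append(f"strand {s} is isolated")
    return errors
```

For example, with strands 7 and 23 parallel to each other and strand 9 antiparallel to strand 23 (recorded in both directions), no violations are reported; recording only parallel(strand(7)) = strand(23) produces both a symmetry violation and an isolated strand 23.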

A further constraint on secondary structure is that there should be no gaps or overlaps in the assignment between the first structure and the last:

    constrain each s in structure
        so that end(s) = start(follows(s)) - 1;

    constrain each s in structure
        so that start(s) = end(follows_inv(s)) + 1;

Both these constraints illustrate one advantage of a weak translation in allowing a more natural and concise formulation of constraints. With a strong translation, we would have to state explicitly that these constraints are not applicable to the first and last structures, which have no preceding and no succeeding element respectively:

    constrain each s in structure such that
        some s2 in structure has follows_inv(s) = s2
        so that start(s) = end(follows_inv(s)) + 1;

A weak translation, on the other hand, allows this kind of information to be specified implicitly, resulting in a more concise constraint formulation.

The protein database contains several examples of constraints that maintain the consistency of duplicated data. This usually takes the form of a single function that abbreviates a relationship between two classes formed by a sequence of other relationships. We give a couple of examples here to illustrate the general form that this type of constraint takes. The first maintains the consistency of the residue_structure(residue) function, which materialises the relationship between residues and the element of secondary structure to which they have been assigned:

    constrain each r in residue
        so that pos(r) >= start(residue_structure(r)) and
                pos(r) =< end(residue_structure(r));

The second maintains consistency of the cycle of relationships linking β-sheets to the protein in which they occur:

    constrain each s in sheet
        so that sheet_protein(s) in
            component_protein(structure_chain(strand_sheet_inv(s)));
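The difference between the weak and strong translations can be made concrete with the first no-gap constraint. In the hypothetical Python sketch below, structures map a name to a (start, end) pair and follows maps a name to its successor, being absent for the last structure of a chain. Under the weak reading the constraint is simply not applicable when follows(s) is undefined; the strong reading is modelled, as an assumption, by counting the comparison with an undefined value as false, which is why the strong formulation needs the explicit guard shown above. This illustrates the semantics only; it is not the P/FDM translator.

```python
from typing import Dict, List, Optional, Tuple

Structs = Dict[str, Tuple[int, int]]  # structure name -> (start, end)


def no_gap_violations(structs: Structs, follows: Dict[str, str],
                      weak: bool = True) -> List[str]:
    """Check end(s) = start(follows(s)) - 1 for every structure s."""
    bad = []
    for s, (start, end) in structs.items():
        nxt: Optional[str] = follows.get(s)
        if nxt is None:
            # Weak translation: the constraint does not apply to the last structure.
            # Strong translation (modelled here): the undefined comparison is false.
            if not weak:
                bad.append(s)
            continue
        if end != structs[nxt][0] - 1:
            bad.append(s)
    return bad


# The chain of Figure 4.1(b): helix h2, loop l25, helix h20.
structs = {"h2": (31, 45), "l25": (46, 46), "h20": (47, 58)}
follows = {"h2": "l25", "l25": "h20"}
```

With the weak reading this chain satisfies the constraint, while the unguarded strong reading wrongly flags h20, the last structure.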

We can use similar constraints to enforce the integrity of the two index functions absolutepos(chain, integer) and res_by_name(chain, string):

    constrain each r in residue
        so that absolutepos(has_component(r) as chain, pos(r)) = r;

    constrain each r in residue
        so that res_by_name(has_component(r) as chain, name(r)) = r;

The importance of the kinds of constraint illustrated here must not be underestimated. Mistakes can and do find their way into any sizeable database, especially when, as in the case of the protein database, the data passes through several hands and formats before it enters the database. An error recently located in the protein database, for example, caused several of the largest proteins stored to have apparent molecular weights smaller than that of the smallest protein. The problem was caused by the leftmost digit of the molecular weight field being truncated by the program that prepared the data for input into P/FDM: an error that a simple constraint such as that given earlier on the molecular weight of proteins could have trapped immediately.

Finally, we consider the possibilities for using integrity constraints to express some of the biochemical semantics of the protein structure domain. While there are many potential constraints on the structure of proteins, not all of these are suitable for use as integrity constraints. It would not be practical, for example, to check all the possible constraints on the distances between bonded atoms, or the values of all the φ and ψ torsion angles along a chain, every time a new atom instance was created. Nor is it clear that such checks would be useful, given that atoms are rarely manipulated individually. A better way to check this kind of low-level constraint is to use one of the special-purpose programs that have been developed for identifying potential anomalies in protein structure data.
Procheck [70], for example, is a suite of programs that analyses the structure of an individual protein (input as a file in the standard Brookhaven file format [8]) and gives some estimate of its stereochemical quality. The programs also produce several graphical representations of the protein which can highlight "problem regions" within the structure. Programs like this can be used, therefore, to complement an integrity constraint mechanism by taking on the responsibility for checking the important atom- and residue-level constraints, which can involve a significant amount of

computation and would have a serious effect on system performance if checked with integrity constraints. The DBMS is then left with the much simpler task of enforcing the kind of constraints specified above, and of any less complex or higher-level biochemical constraints.

The constraints that describe the semantics of disulphide bridges are good examples of the latter type of constraint:

1. a disulphide bridge forms a symmetric relationship between two sub-components:

       constrain each sc in sub_component
           so that disulphide(disulphide(sc)) = sc;

2. the only type of natural residue that can participate in disulphide bridges is cysteine:

       constrain each r in residue such that
           some sc in sub_component has disulphide(sc) = r
           so that name(r) = "cys";

   We cannot express this constraint over all sub-components, because there is a possibility that a synthetic group may be able to form a disulphide bridge with some other sub-component. (Synthetic groups are modelled as sub_component instances that are not also instances of residue.)

3. if two residues are joined by a disulphide bridge, their sulphur atoms must be close enough to be covalently bonded:

       constrain each r in residue
           so that distance(atom(r, "sg"), atom(disulphide(r), "sg")) < 3.7;

   where 3.7 Å is the sum of the van der Waals radii of two sulphur atoms (each given by Pauling [89] as 1.85 Å). In fact, the van der Waals radius is the radius of an unbound atom, and covalently bonded atoms will in reality have a much smaller radius (called the covalent radius). Pauling, for example, gives the value of the single bond covalent radius of sulphur atoms as 1.04 Å. However, this value is an average measure for the bound radius, whereas we require the maximum distance

   that might separate two bound sulphur atoms, which is given by the sum of the van der Waals radii.

   As in the case of the previous constraint, the scope of this constraint is limited by the possibility of synthetic groups being involved in the disulphide, whose sulphur atom may have some name other than "sg" (the standard name for the sulphur atom of cysteine).

All of these constraints are simple to check, with only the final constraint involving any computation (and even this would occur only when the residue involved in the update was actually involved in a disulphide bridge).

It is possible to give similar constraints for the non-covalent interactions common in proteins (hydrogen bonds and salt bridges), although they are not so strictly constrained as the disulphide bonds and therefore result in more complex constraints. A salt bridge, for example, is an interaction between charged sub-components, and we could therefore specify the following constraints on natural residues:

    constrain each sb in salt_bridge such that pos_source(sb) is a residue
        so that name(pos_source(sb)) in {"lys", "his", "arg"};

    constrain each sb in salt_bridge such that neg_source(sb) is a residue
        so that name(neg_source(sb)) in {"asp", "glu"};

Unfortunately, there is an exception to these constraints in that the ends of protein chains also carry charge, and may therefore form chemically legitimate salt bridges that violate them. It is possible to avoid some of the problems caused by "exceptions" to constraints by specifying accurately the conditions under which each constraint applies, as we did with the constraints on disulphide bridges. For instance, we could modify the above constraints on salt bridges so that they applied only when the source being constrained was not an end-residue.

Another way to deal with exceptions is to weaken the constraint in some way, as is shown by the following example involving hydrogen bonds in α-helices.
The classic α-helix conformation is stabilised along its length by hydrogen bonds, linking each residue to the residue three places along the chain. We can express this information as a constraint on α-helices:

    constrain each a in alpha
        each r in residue_structure_inv(a) such that
            pos(r) in {start(a) to end(a) - 3}
        so that some h in hbond has
            acceptor(h) = r and
            pos(donor(h)) = pos(r) + 3 and
            acceptor_atom(h) = "o" and
            donor_atom(h) = "n";

The conformation usually adopted by helical sections of natural proteins, however, differs slightly from the classical α-helix conformation in a way that allows the carbonyl oxygen of each residue to also form hydrogen bonds with donors from outwith the helix [26]. In order to allow this type of helix to be stored in the database, we must modify our original, strong constraint to the following, weaker version:

    constrain each a in alpha
        each h in hbond such that
            residue_structure(acceptor(h)) = a and
            residue_structure(donor(h)) = a and
            acceptor_atom(h) = "o" and
            donor_atom(h) = "n"
        so that pos(donor(h)) = pos(acceptor(h)) + 3;

Where our first constraint specified that all α-helices have a full set of hydrogen bonds stabilising their conformation, our second version requires only that the backbone hydrogen bonds which exist within the helix link the appropriate residues.

As can be seen from the above examples, the constraints that are extracted from a complex application domain are themselves complex and difficult to specify correctly. In such applications, we believe that flexibility of access to constraint metadata, and in particular the ability to delete and modify constraints in a working database, is vital if application developers are to be able to make confident use of an integrity constraint mechanism. Our architecture gives us this flexibility. However, it is still important to monitor the effects of the integrity constraints on system performance and to consider other approaches to the maintenance of consistency (e.g. using programs like Procheck) if execution times are found to suffer too badly.
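The contrast between the strong and weak helix constraints can be sketched as two predicates. The Python below is illustrative only: hydrogen bonds are assumed to be tuples (acceptor position, acceptor atom, donor position, donor atom), helix membership is assumed to follow from the helix's start and end (which the residue_structure consistency constraint given earlier guarantees), the offset of 3 follows the constraints as written above, and the names strong_ok and weak_ok are hypothetical.

```python
from typing import Iterable, Tuple

HBond = Tuple[int, str, int, str]  # (acceptor pos, acceptor atom, donor pos, donor atom)
OFFSET = 3  # residue offset used in the two constraints above


def strong_ok(start: int, end: int, hbonds: Iterable[HBond]) -> bool:
    """Strong version: every residue from start to end - OFFSET accepts an
    O...N bond from the residue OFFSET places further along."""
    bonds = set(hbonds)
    return all((r, "o", r + OFFSET, "n") in bonds
               for r in range(start, end - OFFSET + 1))


def weak_ok(start: int, end: int, hbonds: Iterable[HBond]) -> bool:
    """Weak version: every O...N bond lying wholly inside the helix links
    residues OFFSET apart; bonds to donors outwith the helix are ignored."""
    for acc, acc_atom, don, don_atom in hbonds:
        inside = start <= acc <= end and start <= don <= end
        if inside and acc_atom == "o" and don_atom == "n" and don != acc + OFFSET:
            return False
    return True


# Hypothetical helix spanning residues 10-16 with the full set of bonds.
full = [(r, "o", r + OFFSET, "n") for r in range(10, 14)]
# Residue 12 bonds to a donor outside the helix instead.
partial = [b for b in full if b[0] != 12] + [(12, "o", 40, "n")]
```

The helix described by partial fails the strong constraint but satisfies the weak one, which is exactly the relaxation needed for helices in natural proteins.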

Chapter 4

Constraints and Transactions in P/FDM

While an integrity constraint mechanism such as that described in the previous chapter can protect users from making illegal updates to their databases, it can also under certain circumstances prevent quite legitimate updates from taking place. This problem arises when the user wishes to make a sequence of updates (called a composite update) that together represent a legal alteration of the data, but which individually violate one or more of the integrity constraints. For example, consider an update recording the information that strand(7) and strand(23) are parallel to each other. The complete update requires two calls to addfnval/3, each creating the relationship in a particular direction:

    addfnval(parallel, [strand(7)], strand(23)),
    addfnval(parallel, [strand(23)], strand(7)).

The combined effect of these two updates is a legal change to the database; the intermediate state, however, which is created when only one of the updates has been made, violates the symmetry constraint on the parallel relationship, given in Section 3.4:

    constrain each s in strand
        so that s in parallel(parallel(s));

For some composite updates, it may be possible to avoid this type of problem by a careful ordering of the individual update operations, but in general, as in our example

here, there may be no ordering which does not involve the creation of some invalid intermediate state. One solution is to provide some mechanism whereby the DBMS allows the user to violate the constraints temporarily, on the understanding that integrity will be restored at the end of the composite update. A transaction mechanism is an ideal basis for providing this kind of behaviour: the transaction boundaries clearly delimit the period when constraints can be violated, and the atomicity property of transactions can be exploited to ensure that integrity is not jeopardised in the event of the user failing to keep to her/his side of the agreement.

We have implemented a simple transaction mechanism in P/FDM that allows the suspension of integrity constraint checking during composite updates. The implementation is based on the idea of a differential file, whereby the updates that are made during a transaction are represented by the set of data that they have added to the initial (i.e. pre-transaction) database, and the set of data that has been deleted. The state of the transaction at any point within the transaction is given via a view onto these sets and the initial database state. The contents of these two sets are managed so that, at any one time, they represent the minimal set of updates required to duplicate the effects of the transaction so far. This is similar to the technique of consolidated logging [15] in which only "effective updates" (i.e. those which result in some definite change to the initial database state) are recorded in the transaction log. Consolidated logging is a useful technique for transactions, since it improves the efficiency of the commit process by removing the need to commit redundant updates.
In our implementation, the minimality of the differential sets not only improves the efficiency of the commitment and constraint checking processes, but it also has the much more important effect of removing the possibility of ambiguity in the interpretation of the transaction data¹.

In keeping with the type of application for which P/FDM is designed, we have taken a flexible approach to the specification of transactions. In conventional database systems, transactions are specified by the database designer using some special-purpose language, and are presented to the user as a set of "safe" update primitives. While this may be suitable for traditional transaction processing applications, the users of design applications need more flexibility and control in updating their databases than

¹ Ambiguities arise in non-minimal representations because the changes to the original database are recorded as sets, i.e. the ordering information is not preserved.

the predefined transaction approach will allow [85]. In order to provide this flexibility in P/FDM, we consider a transaction to be specified at run-time, by the user, and to consist of whatever update operations take place between the start and the end of the transaction². This gives the user the freedom to use their own judgement as to the most appropriate update action that is to be taken at any point in the transaction. In other words, each transaction is tailor-made to suit the precise modelling task that is in hand.

The implementation of the transaction mechanism consists of the definition of several new primitives for manipulating transactions, and the provision of new internal definitions for the data manipulation primitives that describe the behaviour of a new module type called transaction. These definitions maintain the two sets of data representing the transaction updates and provide a view onto them for data retrieval. The differential sets are actually stored as two temporary modules: the add-module and the delete-module. This approach simplifies the implementation of the transaction mechanism by allowing us to reuse the existing temporary primitives to handle the data storage.

The implementation also provides a very cheap mechanism for aborting transactions, which means that it is also suitable for handling hypothetical transactions [44]. A hypothetical transaction is one in which the user makes an experimental update (or set of updates) in order to answer "what if?"-type queries, but which is not intended as a permanent change to the database.
The existence of an inexpensive abort primitive encourages users to experiment with more complex updates, without having to worry about their underlying database being corrupted.

The remainder of this chapter is organised as follows: Section 4.1 gives an overview of P/FDM's transaction mechanism by describing the processing of a simple example transaction; Section 4.2 describes the new primitive definitions that are required to implement it; Section 4.3 discusses the details of the commitment process, including the checking of integrity constraints; and finally Section 4.4 summarises approaches to the checking of integrity constraints within transactions taken by other systems.

² This style of transaction is sometimes referred to as a user-controlled transaction, as opposed to the more traditional programmed style of transaction [62].
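The minimal differential sets described at the start of this chapter can be sketched in a few lines. The Python below is a hypothetical model, not P/FDM code: facts are hashable tuples, the add- and delete-modules are plain sets, and the class name DifferentialFile is illustrative. Two invariants hold after every operation: the add-set contains nothing already in the initial database, and the delete-set contains only facts from it. An update that undoes an earlier one therefore cancels it instead of being recorded, which is what removes the ambiguity mentioned in footnote 1.

```python
from typing import Iterable, Set, Tuple

Fact = Tuple  # hypothetical encoding, e.g. ("end", 198, 58) for end(structure(198)) = 58


class DifferentialFile:
    """Hypothetical model of an add-module and delete-module kept minimal."""

    def __init__(self, initial: Iterable[Fact]):
        self.initial = frozenset(initial)   # pre-transaction database state
        self.added: Set[Fact] = set()       # contents of the add-module
        self.deleted: Set[Fact] = set()     # contents of the delete-module

    def add(self, fact: Fact) -> None:
        if fact in self.deleted:
            self.deleted.discard(fact)      # re-adding cancels an earlier deletion
        elif fact not in self.initial:
            self.added.add(fact)            # an effective addition
        # else: the fact is already in the initial state, nothing to record

    def delete(self, fact: Fact) -> None:
        if fact in self.added:
            self.added.discard(fact)        # deleting cancels an earlier addition
        elif fact in self.initial:
            self.deleted.add(fact)          # an effective deletion
        # else: the fact does not exist, nothing to record

    def view(self) -> Set[Fact]:
        """State within the transaction: added | (initial - deleted)."""
        return self.added | (self.initial - self.deleted)

    def is_minimal(self) -> bool:
        return not (self.added & self.initial) and self.deleted <= self.initial


# Changing the end of helix h2 from 58 to 45, as a deletion followed by an addition.
d = DifferentialFile({("end", 198, 58)})
d.delete(("end", 198, 58))
d.add(("end", 198, 45))
```

After these two calls the view holds only the new end value; undoing both updates would leave both sets empty, so nothing redundant reaches the commit or constraint checking stages.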

[Figure 4.1: Modifying the structure of a chain (a) before the update, and (b) after the update. In (a), helix 'h2' occupies residues 31-58; in (b), helix 'h2' occupies residues 31-45, loop 'l25' residue 46, and helix 'h20' residues 47-58.]

4.1 An Overview of the Transaction Mechanism

We will illustrate the overall behaviour of our transaction mechanism by considering the progress of a simple transaction in the protein database, from the point of view of both the user and the DBMS. Suppose that a section of protein chain is to have its secondary structure assignment adjusted (to account for some mistake in the original assignment, say) so that one of the helical sections is split into two helices divided by a short section of random coil (see Figure 4.1). In the process of making this composite update, several of the constraints on the secondary structure classes will be violated, so it must be performed within a transaction. The user begins by asking the DBMS to initiate a new transaction:

    | ?- begin_transaction.

This primitive creates a special metadata marker, the presence of which informs the rest of the system that the special transaction behaviour is required, and that integrity constraint checking should be suspended. It then creates two empty temporary modules which are to serve as the add- and delete-modules for the transaction (i.e. they will contain the details of the updates that are made during the transaction). These modules are "empty" in the sense that no classes or functions have yet been defined within them.

The first stage of the composite update is the creation of the new loop and alpha instances. For each update, we give both the Daplex and Prolog versions since, while the Daplex is more readable, the Prolog versions show the exact updates which are occurring more clearly.

Daplex program to create the new loop instance:

    create a new l in loop with key = ('oval', ' ', 'l25'),
        let start(l) = 46,
        let end(l) = 46;

Prolog program to create the new loop instance:

    | ?- newentity(loop, [oval, ' ', l25], Loop),
         addfnval(start, [Loop], 46),
         addfnval(end, [Loop], 46).

Daplex program to create the new alpha helix instance:

    create a new a in alpha with key = ('oval', ' ', 'h20'),
        let start(a) = 47,
        let end(a) = 58;

Prolog program to create the new alpha helix instance:

    | ?- newentity(alpha, [oval, ' ', h20], Helix),
         addfnval(start, [Helix], 47),
         addfnval(end, [Helix], 58).

Since we are within a transaction, these updates must be diverted away from the "real" module (pdb1 in this example) and into the add-module. In order to do this, the DBMS creates an extension of each class affected by the update within the add-module, and then uses the temporary version of the newentity primitive to make the actual update within this extension. Notice that the DBMS must also ensure that all propagated updates, which here include the definition of key functions and the creation of super-instances, are also made within the transaction module. However, in most cases, we can rely on the temporary primitive to make these propagated updates for us. The state of the transaction modules at this stage in the update is shown in Figure 4.2. We have made no deletions as yet, so the delete-module is empty, whereas the add-module contains the

[Figure 4.2: The state of the transaction modules after entity creation. Panels: the add-module, holding the new instances 400 (loop, structure_name = l25, start = 46, end = 46) and 401 (alpha helix, structure_name = h20, start = 47, end = 58); the delete-module (empty); and a fragment of module pdb1 in the clause base/database, showing chain(87) with num_residues = 384.]

two new instances, both of which are linked by the structure_chain function to the existing chain(87) instance in the underlying, disk-based module.

The next stage of the update shortens the original helix so that it ends immediately before the new loop section begins. This will resatisfy the constraint that no secondary structure elements may overlap, which had been violated by the above updates.

Daplex program to shorten helix "h2":

    let end(the a in alpha such that
            protein_code(component_protein(structure_chain(a))) = 'oval' and
            component_id(structure_chain(a)) = ' ' and
            structure_name(a) = 'h2') = 45;

Prolog program to shorten helix "h2":

    | ?- getentity(alpha, [oval, ' ', h2], Helix),
         updatefnval(end, [Helix], 58, 45).

The user first retrieves the existing alpha instance, using its key values, and then alters the value of the end function. Although the retrieval primitives do not themselves make any updates, they are responsible for providing the view onto the special temporary modules, and therefore also have special transaction definitions. Within a transaction, the extent of a class $C$ is given by:

    $\mathit{Add}_C \cup (C - \mathit{Del}_C)$

where $\mathit{Add}_C$ is the extension of $C$ within the add-module, and $\mathit{Del}_C$ is the extension of $C$ within the delete-module. Function values are handled in the same way. The transaction definition of the getentity/3 primitive, then, first searches the add-module for an instance with the required key, and then searches the underlying module, filtering out any matching instances that are also present in the delete-module.

Function value updates are treated as a deletion followed by an addition, so that the above call to updatefnval/4 is equivalent to:

    | ?- deletefnval(end, [Helix], 58),
         addfnval(end, [Helix], 45).

as far as the behaviour of the transaction mechanism is concerned. Function deletion is handled by adding the old function value to the end(delete_structure) function in the delete-module, and function addition by adding the new function value to end(add_structure) in the add-module. If we were to retrieve the value of the end function for this helix now, its result would be given by:

    $\mathit{end}(\mathit{Add}_{\mathit{Helix}}) \cup (\mathit{end}(\mathit{Helix}) - \mathit{end}(\mathit{Del}_{\mathit{Helix}})) = \{45\} \cup (\{58\} - \{58\}) = \{45\}$

Figure 4.3 illustrates the contents of the transaction modules affected by this update.

At this point, the composite update is complete, and the user asks the DBMS to commit the changes and resume integrity checking:

    | ?- commit_transaction.

Before this can happen, however, the DBMS checks that no integrity constraints have been violated by the transaction. In this case, the constraint on the residue_structure function:

    constrain each r in residue
        so that pos(r) >= start(residue_structure(r)) and
                pos(r) =< end(residue_structure(r));

is violated by the shortened helix. When the DBMS detects this violation, it leaves the current transaction active and reports the constraints that have been violated. The user now has two courses of action. Either the changes made so far can be abandoned:

    | ?- abort_transaction.

in which case the DBMS simply closes down the two transaction modules and removes the transaction metadata marker, leaving the database in exactly the same state as before the transaction was begun.

[Figure 4.3: The contents of the transaction modules after function update. Panels: a fragment of the add-module, holding end(structure(198)) = 45; the delete-module, holding end(structure(198)) = 58; and a fragment of module pdb1 in the clause base/database, showing structure(198)/helix(198) with structure_name = h2, start = 31, end = 58.]

Alternatively, the user might attempt to restore the integrity of the database by making further updates. The appropriate action in this case is to delete the erroneous relationships between the shortened helix and its original residues:

Daplex program to delete the residue_structure relationships:

    for the s in structure such that
        protein_code(component_protein(structure_chain(s))) = 'oval' and
        component_id(structure_chain(s)) = ' ' and
        structure_name(s) = 'h2'
    for each r in residue_structure_inv(s) such that
        pos(r) in 46 to 58
    exclude s from residue_structure(r);

Prolog program to delete the residue_structure relationships:

    | ?- getentity(structure, [oval, ' ', h2], Helix1),
         (   between(46, 58, Pos),
             getentity(residue, [oval, ' ', Pos], Res),
             deletefnval(residue_structure, [Res], Helix1),
             fail
         ;   true
         ).

Daplex program to set the correct residue_structure for the new loop:

    let residue_structure(the r in residue such that
            protein_code(component_protein(has_component(r))) = 'oval' and
            component_id(has_component(r)) = ' ' and
            pos(r) = 46) =
        the s in structure such that
            protein_code(component_protein(structure_chain(s))) = 'oval' and
            component_id(structure_chain(s)) = ' ' and
            structure_name(s) = 'l25';

Prolog program to set the correct residue_structure for the new loop:

    | ?- getentity(structure, [oval, ' ', l25], Loop),
         getentity(residue, [oval, ' ', 46], Res),
         addfnval(residue_structure, [Res], Loop).

Daplex program to set the correct residue_structure values for the new helix:

    for the s in structure such that
        protein_code(component_protein(structure_chain(s))) = 'oval' and
        component_id(structure_chain(s)) = ' ' and
        structure_name(s) = 'h20'
    for each r in residue such that
        protein_code(component_protein(has_component(r))) = 'oval' and
        component_id(has_component(r)) = ' ' and
        pos(r) in 47 to 58
    let residue_structure(r) = s;

Prolog program to set the correct residue_structure values for the new helix:

    | ?- getentity(structure, [oval, ' ', h20], Helix2),
         (   between(47, 58, Pos),
             getentity(residue, [oval, ' ', Pos], Res),
             addfnval(residue_structure, [Res], Helix2),
             fail
         ;   true
         ).

The user now requests a second attempt to commit the transaction and, since all constraints are now satisfied, this time the commit is successful. The DBMS copies all the changes across to the underlying disk-based modules and closes down the two transaction modules so that "normal" behaviour can resume.

The advantage of this approach to transaction management for P/FDM is that it is very well integrated with the existing system. Since the transaction mechanism is based upon the behaviour of the standard data manipulation primitives, it is automatically accessible to users of any of the higher-level interfaces to P/FDM. Any of the updates described above, for instance, could have been specified in Daplex rather than Prolog, with exactly the same results.

The use of the existing temporary storage format brings other advantages. The reuse of existing routines that this allows means that the definitions of the transaction internal primitives are much simpler, more reliable and easier to maintain. It also means that the DBMS can interrogate the current state of a transaction at any time using the ordinary data retrieval primitives, which is useful both for constraint checking and for committing updates.
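The protocol walked through in this section (checking suspended inside the transaction, all constraints checked at commit, and the transaction left open if any fail) can be summarised in a small model. The Python below is a hypothetical sketch: constraints are (name, predicate) pairs evaluated over the whole transaction view, which is far cruder than the checking P/FDM performs at commit time (Section 4.3), and TransactionManager and its methods are illustrative names, not P/FDM primitives. The example replays the parallel-strand update from the start of the chapter.

```python
from typing import Callable, FrozenSet, List, Optional, Set, Tuple

Fact = Tuple[str, int, int]  # (function, argument, value)
Constraint = Tuple[str, Callable[[FrozenSet[Fact]], bool]]


class TransactionManager:
    """Hypothetical model: one active transaction, constraint checking deferred to commit."""

    def __init__(self, facts: Set[Fact], constraints: List[Constraint]):
        self.db: FrozenSet[Fact] = frozenset(facts)                # committed state
        self.constraints = constraints
        self.diff: Optional[Tuple[Set[Fact], Set[Fact]]] = None   # (add-set, delete-set)

    def begin(self) -> None:
        if self.diff is not None:
            raise RuntimeError("There is already an active transaction")
        self.diff = (set(), set())

    def view(self) -> FrozenSet[Fact]:
        added, deleted = self.diff
        return frozenset(added | (self.db - deleted))

    def add(self, fact: Fact) -> None:
        added, deleted = self.diff
        if fact in deleted:
            deleted.discard(fact)
        elif fact not in self.db:
            added.add(fact)

    def delete(self, fact: Fact) -> None:
        added, deleted = self.diff
        if fact in added:
            added.discard(fact)
        elif fact in self.db:
            deleted.add(fact)

    def commit(self) -> List[str]:
        """Check every constraint against the view; commit only if all hold."""
        state = self.view()
        violated = [name for name, holds in self.constraints if not holds(state)]
        if not violated:
            self.db, self.diff = state, None  # copy changes across, close the transaction
        return violated                       # non-empty: the transaction stays active

    def abort(self) -> None:
        self.diff = None                      # discard both differential sets


def parallel_symmetric(facts: FrozenSet[Fact]) -> bool:
    pairs = {(a, b) for fn, a, b in facts if fn == "parallel"}
    return all((b, a) in pairs for a, b in pairs)


tm = TransactionManager(set(), [("parallel symmetry", parallel_symmetric)])
tm.begin()
tm.add(("parallel", 7, 23))
first = tm.commit()   # intermediate state rejected; the transaction stays open
tm.add(("parallel", 23, 7))
second = tm.commit()  # both directions present, so the commit succeeds
```

The first commit reports the violated symmetry constraint and leaves the user inside the transaction, free either to add the missing direction or to abort, just as in the helix example above.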

4.2 The Transaction Mechanism

Like the implementation of the integrity constraints mechanism described in Chapter 3, the introduction of transactions into P/FDM requires the definition of several new primitives for manipulating the new data model concept. Unlike the constraint implementation, however, the transaction mechanism also requires the creation of a new module type (called transaction) and must therefore provide new internal definitions for all the existing data manipulation primitives, describing their behaviour within a transaction.

4.2.1 The Transaction Primitives

In P/FDM, transactions are viewed as a behavioural data model concept rather similar to the concept of modules. Like modules, transactions may be opened and closed, and their purpose is to encapsulate an ad hoc sequence of data manipulation operations within a particular semantics. Unlike modules, however, which are intended to be accessed simultaneously, our current implementation allows only one transaction to be active at any one time.

Apart from the processes of constraint checking and update commitment, the three transaction primitives can be defined succinctly in terms of the existing module management primitives. The internal definition of the begin_transaction primitive, for example, is:

    % internal_begin_transaction(transaction)
    internal_begin_transaction(transaction) :-
        % Check that there are no other transactions active.
        (   transaction_active(OtherType, _, _)                        % 1
        ->  % If there are then print an error message
            error('There is already an active transaction, of type',
                  OtherType)
        ;

CHAPTER 4. CONSTRAINTS AND TRANSACTIONS IN P/FDM 102% else start up a new transaction% Create the two (in-memory) transaction modulesnew transaction module names(AddMod, DelMod),new module(mdesc(AddMod, temporary, temporary, 0, 0, 0)),% 2new module(mdesc(DelMod, temporary, temporary, 0, 0, 0)),% 3% Record the transaction module namesassert(transaction active(transaction, AddMod, DelMod))% 4).The primitive begins by checking that no other transaction is active, by searching themetadata for the special transaction marker transaction active/3 (line 1). This termis the only new piece of metadata required for transactions, and it has the form:transaction active(TransactionType, AddModuleName, DelModuleName).whereTransactionType describes the transaction semantics required for the newtransaction. In the current implementation only one semantics is avail-able and this argument must always take the value transaction.AddModuleName is the name of the add-module for the currently active trans-action, andDelModuleName is the name of the corresponding delete-module.If no other transaction is active then we can create the two temporary modules forthe current transaction (lines 2 and 3), using the standard primitive for creating newmodules (new module/1 | see Appendix A). The name of the transaction modules areformed by adding an integer su�x to the atoms add module and del module to createtwo unique atoms. Even though we expect there to be only one active transaction atany one time, we generate new transaction module names for each transaction. Thismeans that we can be sure that we are starting with \clean" transaction modules eachtime, and the user is in some measure protected from system errors which do not clearout transaction data completely at commit- or abort-time. Finally, once the transaction

CHAPTER 4. CONSTRAINTS AND TRANSACTIONS IN P/FDM 103modules have been created, we put a new transaction active term into the metadata,recording the names that have been generated for them (line 4).The de�nition of the abort transaction/0 primitive is even more straightfor-ward, requiring only that the two transaction modules be closed down (using the stan-dard close module/1 primitive) and that the transaction metadata marker be removed:% internal abort transaction(transaction, +AddMod, +DelMod)internal abort transaction(transaction, AddMod, DelMod) :-% Close the two transaction modulesclose module(AddMod),close module(DelMod),% Remove the transaction descriptorretract(transaction active(transaction, AddMod, DelMod)).The commit transaction/0 primitive, is the most complicated of the three trans-action primitives. It behaves like abort transaction/0 except that, in addition toclearing out the transaction information, it also checks that all the integrity constraintsare still satis�ed and copies the updates described by the transaction modules to the un-derlying disk-based modules. These two processes are described in detail in Section 4.3.4.2.2 Data Manipulation Under TransactionsIn order to provide the correct update and retrieval behaviour within transactions, wemust provide a new internal de�nition for each of the data manipulation primitives, evenif they are not directly involved in the special transaction behaviour. 
In most cases,however, as with the transaction manipulation primitives described above, we can giveconcise de�nitions by reusing the existing temporary primitives to handle the actualdata manipulation.Retrieving Data Under TransactionsAs we saw earlier, the true extent of a class (or function) that has been updated within atransaction is given by its extent immediately prior to the transaction, plus the instancesthat have been created in its equivalent class (or function) in the add-module (the add-class), and less the instances in its equivalent class (or function) in the delete-module

CHAPTER 4. CONSTRAINTS AND TRANSACTIONS IN P/FDM 104(the delete-class). We can provide the user with this view onto the transaction modulesusing the following algorithm:1. First retrieve all the instances (or function values) from the appropriate add-class(add-function),2. Next retrieve all instances (or function values) from the original class (function),ignoring any which are also present in the delete-class (delete-function).The following (slightly simpli�ed) internal de�nition for the getentity/2 retrieval prim-itive illustrates how this algorithm can be expressed in terms of existing primitives:% internal getentity(transaction, +EDesc, +EName, ?Inst)internal getentity(transaction, EDesc, EName, Inst) :-% Construct the metadata descriptors for the classesEDesc = edesc(EName, , , , , , UndMod),AddEDesc = edesc(AddEName, , , , , , AddMod),DelEDesc = edesc(DelEName, , , , , , DelMod),% Find the names of the transaction classestransaction active(transaction, AddMod, DelMod),transaction name(AddMod, EName, AddEName),transaction name(DelMod, EName, DelEName),( % Retrieve from the add-classAddEDesc, % 1internal getentity(temporary, AddEDesc, AddEName, Inst); % Retrieve from the underlying classfind true module type(UndMod, TrueModType),internal getentity(TrueModType, EDesc, EName, Inst),% Check that this instance has not been deleted(DelEDesc, % 2internal getentity(temporary, DelEDesc, DelEName, Inst) ->fail; true)).
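As a rough illustration of the two-step retrieval algorithm, the following Python sketch models each module as a mapping from class names to sets of instance identifiers. It is not the P/FDM code: the class `TransactionView` and its attribute names are hypothetical, and descriptors, storage types and keys are ignored. A class that the transaction has not touched simply has no entry in the add- or delete-module, which loosely mirrors the descriptor "guards" in the Prolog definition.

```python
from typing import Dict, Iterator, Set


class TransactionView:
    """Hypothetical model of the retrieval view seen inside a transaction.

    Each module maps a class name to the set of instance identifiers
    stored for that class in the module.
    """

    def __init__(self, disk: Dict[str, Set[str]]) -> None:
        self.disk = disk                        # underlying (disk-based) module
        self.add: Dict[str, Set[str]] = {}      # add-module
        self.delete: Dict[str, Set[str]] = {}   # delete-module

    def getentity(self, cls: str) -> Iterator[str]:
        # Step 1: instances created during the transaction (the add-class).
        yield from sorted(self.add.get(cls, set()))
        # Step 2: original instances, skipping any recorded in the delete-class.
        deleted = self.delete.get(cls, set())
        for inst in sorted(self.disk.get(cls, set())):
            if inst not in deleted:
                yield inst


view = TransactionView({"protein": {"protein(1)", "protein(2)"}})
view.add["protein"] = {"protein(3)"}       # created during the transaction
view.delete["protein"] = {"protein(1)"}    # deleted during the transaction
print(list(view.getentity("protein")))     # ['protein(3)', 'protein(2)']
```

The underlying module is never modified by this view; only the two transaction dictionaries change as updates are made.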

There are three points to notice about the internal definition of getentity/2 given above. Firstly, the names of the add- and delete-classes corresponding to a class called EName are generated by prefixing the name of the appropriate transaction module to EName. For example, if the current transaction modules are add_module6 and delete_module6 then the transaction classes associated with the protein class will be called add_module6_protein and delete_module6_protein. Although the class names generated in this way may seem unwieldy, it should be remembered that they are internal names only and should never be seen by the user under normal circumstances.

Secondly, this internal primitive is defined wholly in terms of the internal definitions for existing storage types and therefore makes no assumptions about any particular storage schema. Not only does this simplify the internal definition of the primitive, but it also significantly improves its maintainability, since it is insulated from any changes to the temporary module storage format.

Finally, notice that each retrieval from either of the transaction modules is "guarded" by a search for the metadata descriptor of the appropriate transaction class (lines 1 and 2). While the primary purpose of this is to build a completely instantiated descriptor for use by the internal retrieval primitives, it also has the added benefit of preventing retrievals from the transaction classes when no update to the associated disk class has yet been made. Thus the overhead of data retrieval under transactions is considerably reduced for classes or functions which are unaffected by the transaction updates.

Creating Instances and Function Values

The creation of a new instance or function value within a transaction is represented by the creation of a corresponding instance or function value within the currently active add-module. The job of the transaction definitions of the creation primitives (newentity/3 and addfnval/3), then, is to divert update requests to this module. As with the retrieval primitives, these definitions are considerably simplified by the reuse of the temporary internal primitives. Here, for example, is the transaction definition of newentity/3:

    % internal_newentity(transaction, +EDesc, +EName, +KeyList, -Inst)
    internal_newentity(transaction, EDesc, EName, KeyList, Inst) :-
        EDesc = edesc(EName, _, _, _, _, _, _),
        AddDesc = edesc(AddName, _, _, _, _, _, AddMod),
        % Fetch the details of the current transaction module
        transaction_active(transaction, AddMod, _),
        transaction_name(AddMod, EName, AddName),
        % Make sure that all the necessary descriptors have been
        % copied, ready to receive the new object.
        create_newentity_descriptors(AddMod, EDesc, AddDesc),           % 1
        % Create the instance in the add-module
        internal_newentity(temporary, AddDesc, AddName, KeyList, Inst).  % 2

Notice that before we can create the instance (using the temporary internal definition of newentity/3, line 2) we must ensure that the add-class has been defined within the add-module (line 1). If it has not, then we must define it by creating an appropriate metadata descriptor. This descriptor is, for the most part, a copy of the corresponding descriptor in the underlying module except that:

- the class name is the newly-formed add-class name;
- the name of the superclass is also the name of an add-class;
- the module of definition is the add-module.

Thus, if we create a new instance of the class chain with the descriptor:

    edesc(chain, protein_component, string,
          [foreign(component_protein), component_id], 204, 223, pdb1).

within a transaction whose add-module is called add_module6, the following descriptor will be created:

    edesc(add_module6_chain, add_module6_protein_component, string,
          [foreign(component_protein), component_id], 204, 223,
          add_module6).
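The three renaming rules listed above can be sketched in Python by modelling an edesc term as a named tuple. The field names are hypothetical: the text identifies only the class name, the superclass name and the module of definition, so every other argument (including 204 and 223, whose meaning is not stated here) is simply carried across unchanged. `transaction_name` follows the prefixing convention described for getentity/2, and a class with no superclass is assumed, for illustration only, to be represented by `None`.

```python
from typing import NamedTuple, Optional, Tuple


class EDesc(NamedTuple):
    """Hypothetical stand-in for an edesc/7 metadata descriptor."""
    name: str                   # class name
    superclass: Optional[str]   # immediate superclass (None assumed for a root)
    arg3: str                   # remaining arguments: copied unchanged
    arg4: Tuple[str, ...]
    arg5: int
    arg6: int
    module: str                 # module of definition


def transaction_name(module: str, name: str) -> str:
    # Prefix the transaction module name, e.g. add_module6 + chain.
    return f"{module}_{name}"


def add_module_descriptor(desc: EDesc, add_mod: str) -> EDesc:
    """Copy desc into the add-module: rename the class and its superclass,
    and record the add-module as the module of definition."""
    superclass = desc.superclass
    if superclass is not None:
        superclass = transaction_name(add_mod, superclass)
    return desc._replace(name=transaction_name(add_mod, desc.name),
                         superclass=superclass,
                         module=add_mod)


chain = EDesc("chain", "protein_component", "string",
              ("foreign(component_protein)", "component_id"), 204, 223, "pdb1")
print(add_module_descriptor(chain, "add_module6"))
```

Using `_replace` keeps the sketch faithful to the "for the most part, a copy" description: only the three named fields differ from the original descriptor.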

In addition to copying the descriptor for the class itself, we must also copy the descriptors for its superclasses, its key functions and any classes on which it is key-dependent; in short, all the classes and functions to which the creation will be propagated. Thus, the schemas of the transaction modules are continually being extended as updates to new classes and functions occur.

The behaviour of addfnval/3 under transactions is much the same as that of newentity/3, except in one particular respect. In order to provide an unambiguous representation of a transaction, it is important that the set of data stored within the transaction modules represents the minimal set of updates required to duplicate its effects. For example, if the underlying module contains the following function mapping immediately prior to the transaction:

    res_by_name(chain(27), pro) -> {residue(1837), residue(1836), residue(1818)}

and the add- and delete-modules both contain the mapping:

    res_by_name(chain(27), pro) -> {residue(1836)}

then it is unclear whether the result of the transaction should be to leave the function's result unchanged, or whether residue(1836) should be deleted. The correct interpretation depends on which of the two updates recorded in the transaction modules was made last. A minimal representation of either of the possible interpretations, however, would record the update in at most one of the transaction modules, and would therefore be completely unambiguous.

We can ensure that we store only the minimal representation of an update by the simple strategy of always attempting to represent it by deleting data from the transaction modules, before we resort to adding any new data. For example, consider the res_by_name mapping given above. If the user deletes residue(1836) from the result set, then this mapping is added to the delete-module:

    res_by_name(chain(27), pro) -> {residue(1836)}

If the user subsequently replaces this residue in the mapping, then a minimal representation of this sequence of updates can be obtained by deleting the mapping from the delete-module, rather than adding it to the add-module.

Before we add any data to the add-module, then, we must first check that we could not have achieved the same effect by deleting some data from the delete-module; and before we add any data to the delete-module, we must check whether a deletion from the add-module would have been sufficient. The (simplified) definition of the transaction version of addfnval/3 illustrates how we can implement this behaviour for function value additions:

    % internal_addfnval(+MType, +FDesc, +Function, +Arguments, +Result)
    internal_addfnval(transaction, FDesc, FName, ArgList, Result) :-
        FDesc = fdesc(FName, [FirstArg|_], _, _, _, _, _, _),
        AddFDesc = fdesc(FName, [AddFArg|_], _, _, _, _, _, AddMod),
        DelFDesc = fdesc(FName, [DelFArg|_], _, _, _, _, _, DelMod),
        % Fetch the details of the current transaction
        transaction_active(transaction, AddMod, DelMod),
        transaction_name(AddMod, FirstArg, AddFArg),
        transaction_name(DelMod, FirstArg, DelFArg),
        % If the result exists in the delete-module then ...
        (   DelFDesc,
            internal_getfnval(temporary, DelFDesc, FName, ArgList, Result)
        ->  % ... 'undelete' it
            internal_deletefnval(temporary, DelFDesc, FName, ArgList, Result)
        ;   % Otherwise, add the function value to the add-module.
            % Make sure that all the necessary descriptors have been
            % copied, ready to receive the new function values.
            FDesc,
            create_addfnval_descriptors(AddMod, FDesc, AddFDesc),
            % Finally, add the new function value in the add-module.
            internal_addfnval(temporary, AddFDesc, FName, ArgList, Result)
        ).

Why does the newentity/3 internal definition not concern itself with the avoidance of ambiguities? The reason for the more straightforward definition of this primitive is that we know that it can never create an instance which might be present in the delete-module, because new instances are always created with a completely new identifier, even when they have the same key as some deleted instance.
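The "delete before add" strategy can be illustrated with a small Python model in which a function value is a (function, arguments, result) triple. The class `FnTransaction` is hypothetical and omits descriptors, inverse functions and constraint checks; it shows only how each update first tries to cancel a matching entry in the opposite transaction module, so that the res_by_name sequence described above leaves both modules empty.

```python
from typing import Set, Tuple

Fact = Tuple[str, Tuple[str, ...], str]   # (function, arguments, result)


class FnTransaction:
    """Hypothetical model of function-value updates inside a transaction."""

    def __init__(self, disk: Set[Fact]) -> None:
        self.disk = disk                  # underlying module (never modified here)
        self.add: Set[Fact] = set()       # add-module
        self.delete: Set[Fact] = set()    # delete-module

    def addfnval(self, fname: str, args: Tuple[str, ...], result: str) -> None:
        fact = (fname, args, result)
        if fact in self.delete:
            self.delete.discard(fact)     # 'undelete' rather than re-add
        else:
            self.add.add(fact)

    def deletefnval(self, fname: str, args: Tuple[str, ...], result: str) -> None:
        fact = (fname, args, result)
        if fact in self.add:
            self.add.discard(fact)        # cancel an addition made in this transaction
        else:
            self.delete.add(fact)

    def getfnval(self, fname: str, args: Tuple[str, ...]) -> Set[str]:
        # Current result set: (underlying + added) minus deleted.
        return {r for (f, a, r) in (self.disk | self.add) - self.delete
                if f == fname and a == args}


args = ("chain(27)", "pro")
t = FnTransaction({("res_by_name", args, r)
                   for r in ("residue(1837)", "residue(1836)", "residue(1818)")})
t.deletefnval("res_by_name", args, "residue(1836)")
t.addfnval("res_by_name", args, "residue(1836)")
print(t.add, t.delete)   # set() set()
```

Because each operation checks the opposite module first, a fact can never sit in both modules at once, which is exactly the ambiguity the minimal representation is meant to rule out.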

Deleting Instances and Function Values

The deletion primitives, deletentity/2 and deletefnval/3, are the most complex of all the primitives to handle under transactions, because of the complexity of the propagations that each of them requires. This problem is compounded by the fact that deletions are actually represented by creations within a transaction, and must therefore be implemented by an internal definition that has the propagation semantics of deletion but the update effects of creation. Since the temporary creation primitives all have the propagation semantics of creation, we cannot use them to simplify the definitions of the deletion primitives as completely as we have been able to do for the other primitives. As an added complication, we must also remember to store the minimal representation for each update by deleting data from the add-module wherever possible. For this operation, however, we are able to use the temporary primitives, since we require a deletion update operation with the semantics of deletion.

The same basic structure is used for both deletion primitives, and we illustrate it here by giving the top-level definition of the deletentity/2 primitive:

    % internal_deletentity(transaction, +EDesc, +EName, +EInst)
    internal_deletentity(transaction, EDesc, EName, EInst) :-
        transaction_active(transaction, AddMod, DelMod),
        % Build the list of objects that must be deleted as a
        % result of the deletion of EInst.
        things_to_be_deleted(EDesc, EInst, ThingsToDelete),
        % Split the list into things which are in the add-module
        % and which are in the underlying modules.
        split_things_to_delete(ThingsToDelete, AddMod, AddThings,
                               UndThings),
        % Delete those in the add-module.
        delete_from_add_module(AddMod, AddThings),
        % And add the remainder to the delete-module.
        add_to_deletion_module(DelMod, UndThings).

The primitive begins by building a list (ThingsToDelete) of all the individual instances and function mappings that must be deleted as a result of the deletion of the given instance (according to the propagation rules described in Chapter 2). The routine which implements this process (things_to_be_deleted/3) is defined in terms of the top-level data retrieval primitives (i.e. getentity/2 rather than internal_getentity/4) so that the propagation occurs in the context of the updated database.

Once the list has been built, it is divided into two sublists, one containing all the items marked for deletion that exist in the add-module, and the other containing all the items that exist in the underlying modules. The actual update is now carried out by deleting all the add-module elements, using the standard temporary deletion primitives, and by creating the remaining elements in the delete-module, using special versions of the temporary creation primitives that perform no propagations. These special internal primitives also maintain the delete-module schema by copying the relevant descriptors wherever necessary. Another important modification to the standard behaviour for the creation of entity instances is that each delete-module instance is given the same identifier as the instance whose deletion it represents, rather than being allocated its own unique identifier.

Modifying Instance Hierarchies

The include/3 primitive allows the class membership of a particular instance to be extended, while retaining its existing structure. Under ordinary circumstances, this is a simple operation which can be viewed as a restricted version of the newentity/3 primitive. Within a transaction, however, we have to be able to deal with partially deleted or extended hierarchies whose component instances are spread throughout the underlying and the transaction modules.
Figure 4.4, for example, illustrates three possible representations of the same instance hierarchy within a transaction.

    Figure 4.4: Three representations of an instance hierarchy
      (a) Hierarchy defined by existing data.
          Disk module: structure(1), helix(1).
      (b) Partially deleted hierarchy.
          Disk module: structure(1), helix(1), alpha(1).  Delete-module: alpha(1).
      (c) Partially extended hierarchy.
          Disk module: structure(1).  Add-module: add_helix(1).

The (simplified) definition of internal_include/5 for transactions illustrates how all these types of hierarchy can be extended by a single algorithm:

    % internal_include(+ModuleType, +Desc, +Class, +FromInst, -Inst)
    internal_include(transaction, Desc, ToClass, FromInst, ToInst) :-
        Desc = edesc(ToClass, _, _, _, _, _, _),
        AddDesc = edesc(AddClass, _, _, _, _, _, AddMod),
        functor(FromInst, FromClass, 1),
        transaction_active(transaction, AddMod, DelMod),
        transaction_name(AddMod, ToClass, AddClass),
        % Make sure that all the necessary descriptors have been
        % copied, ready to receive the new object.
        create_newentity_descriptors(AddMod, Desc, AddDesc),
        % Find the "lowest" existing instance in the branch of the
        % hierarchy that includes ToClass
        derive_key(FromClass, FromInst, HierarchyKey),
        lowest_instance(ToClass, HierarchyKey, LowestClass, LowestInst),
        % Recreate any instances below this that have been deleted
        % during the transaction
        lowest_deleted_instance(ToClass, HierarchyKey, LowestDelClass,
                                LowestDelInst),
        reincarnate_deleted_instances(DelMod, LowestClass, LowestInst,
                                      LowestDelClass, LowestDelInst),
        % Finally, create the remaining instances in the add-module
        (   sub_super_class(ToClass, LowestDelClass)
        ->  internal_include(temporary, AddDesc, AddClass,
                             LowestDelInst, ToInst)
        ;   ToInst = LowestDelInst
        ).

The approach taken here is to locate the lowest "existing" instance in the hierarchy by searching upwards from the destination class (ToClass), using the top-level getentity/2 primitive. We then check to see whether the hierarchy originally extended further than this point by searching for any lower instances in the delete-module. If any such deleted instances exist then they are "reincarnated" (i.e. deleted from the delete-module) and the lowest of them becomes the new lowest existing instance. Finally, the hierarchy is extended from this point downwards to ToClass by creating the relevant instances in the add-module. Since this final part of the process involves instance creation with the propagation semantics of creation, we can use a single call to the temporary version of include/3 to create all the necessary intermediate instances.

The database state shown in Figure 4.5(a) illustrates this behaviour for the update:

    | ?- include(alpha, helix(12), Alpha).

Initially, the lowest existing instance in this case is structure(12), since the lower helix instance has been deleted during the current transaction. Beneath this, we find that helix(12) is the lowest deleted instance and it is therefore deleted from the delete-module. Finally, we extend the hierarchy to the destination class by creating an instance with the identifier add_alpha(12) in the add-module. The results of this update are shown in Figure 4.5(b).

    Figure 4.5: Instance inclusion example (a) before the update, and (b) after the update
      (a) Disk module: structure(12), helix(12).  Delete-module: helix(12).
      (b) Disk module: structure(12), helix(12).  Add-module: alpha(12).

In this example, each stage of the update identified a new lowest instance. In general, however, we must allow either of the final two stages of the process to produce no new instances. For example, the corresponding inclusion of helix(1) into the class alpha, applied to the databases shown in Figure 4.4 (a) and (c), would involve no deleted instances, and therefore the second stage (i.e. the reincarnation of deleted instances) would not identify a new lowest instance. In the database shown in Figure 4.4 (b), on the other hand, no new instances need to be created after the deleted alpha(1) instance has been reincarnated.

Updating Function Values

The only primitive that remains to be considered is updatefnval/4. For optional functions (i.e. non-key functions) we can achieve the effects of the update by a combination of the deletefnval/3 and addfnval/3 primitives as follows:

    internal_deletefnval(Function, Arguments, OldResult),
    internal_addfnval(Function, Arguments, NewResult).

In general, this sequence cannot be considered to be equivalent to the corresponding function update, because the timing of the constraint checking is different. With updatefnval/4, all constraints, whether on the deletion of the OldResult or the addition of the NewResult, are checked before any updates are made. However, with the deletefnval/3, addfnval/3 sequence the constraints on the function value addition are not checked until after the first update has taken place, by which time the function deletion has already successfully completed.
Within a transaction, however, constraint checking is always delayed until commit-time, and this sequence of updates can safely be used to implement the transaction version of the updatefnval/4 primitive.

In addition to the propagation to key-dependent instances, updates to key functions also involve an added propagation to the key index structures. As we saw in Chapter 2, the exact format of these indexes depends on the storage schema in which they are implemented, but within the transaction primitives we are not allowed to make any assumptions about the types of the underlying modules being updated. The only modules in which we are allowed to make any updates are the special transaction modules. Rather than updating the true index structures, then, we add a new key-index term to the add-module to represent the updated key. Any subsequent updates can be made directly to the new structure, so that there is only ever one key-index term present in the add-module for each instance. For example, Figure 4.6 shows the contents of the transaction modules after the update:

    | ?- updatefnval(structure_name, [structure(8)], 'h2', 'h20').

    Figure 4.6: Contents of the transaction modules after key function update
      Disk module:   structure(8)
                     structure_name(structure(8)) = 'h2'
                     key(structure(8)) = [p1cn1, 'A', h2]
      Add-module:    structure_name(structure(8)) = 'h20'
                     key(structure(8)) = [p1cn1, 'A', h20]
      Delete-module: structure_name(structure(8)) = 'h2'

Of course, it is necessary for the other primitives to be aware of the existence of such key-index terms and to ensure that no ambiguities arise. For example, if an instance with an updated key is deleted, then its new key-index term in the add-module must also be deleted. However, the two retrieval primitives that depend upon key information (derive_key/3 and getentity/3) can both operate unmodified with this form of updated key.

4.3 The Commitment Process

The commitment process for transactions under P/FDM consists of two phases: constraint checking, and the copying of the changes made during the transaction to the underlying module. We will now describe these two phases in more detail.
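The overall shape of the two-phase commit can be sketched in Python as follows. This is a simplified, hypothetical model rather than the P/FDM implementation: instances are plain strings, each constraint is a predicate over the database state that would result from the transaction (the underlying data, less the delete-module, plus the add-module), every violated constraint is reported rather than only the first, and changes are copied only when none is violated. The eight-philosophers constraint discussed in Section 4.3.1 illustrates why the check must see the net effect of the transaction.

```python
from typing import Callable, Dict, List, Set

Constraint = Callable[[Set[str]], bool]


def commit(disk: Set[str], add: Set[str], delete: Set[str],
           constraints: Dict[str, Constraint]) -> List[str]:
    """Hypothetical two-phase commit.

    Returns the names of all violated constraints.  The disk set is
    updated in place only when that list is empty.
    """
    final_state = (disk - delete) | add
    # Phase 1: check every constraint against the net effect of the transaction.
    violated = [name for name, holds in constraints.items()
                if not holds(final_state)]
    if violated:
        return violated            # nothing copied; the user may repair and retry
    # Phase 2: copy the changes to the underlying module.
    disk -= delete
    disk |= add
    return []


rules = {"exactly_eight": lambda s: len(s) == 8}
disk = {f"philosopher({i})" for i in range(1, 9)}
# Adding one philosopher and removing another satisfies the constraint overall,
# even though the addition on its own would leave nine philosophers.
print(commit(disk, {"philosopher(9)"}, {"philosopher(1)"}, rules))   # []
print(commit(disk, {"philosopher(10)"}, set(), rules))               # ['exactly_eight']
```

Evaluating each constraint against the final state, rather than after each individual update, is what lets the first transaction succeed.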

4.3.1 Checking Constraints Under Transactions

The simplification techniques described in Chapter 3, which generate efficient checks for detecting the violation of constraints by individual updates, cannot be naively applied to constraint checking under transactions. These techniques depend upon the assumption that the database state is consistent immediately prior to the update that is to be checked, since this allows attention to be focussed on the immediate consequences of this single update. Within a transaction, however, this assumption is no longer valid and updates cannot be checked in isolation [9]. For example, suppose we have a database containing eight philosophers, and an integrity constraint that there must be no more and no fewer than eight. After a transaction which adds a new philosopher and deletes one of the existing ones, the constraint remains satisfied, but a check on either of the individual updates would find that it had been violated and would cause the commitment to fail. In other words, checking individual updates at commit-time can detect the fact that a transaction has, at some point, passed through an inconsistent state, and is therefore almost as unforgiving as the ordinary run-time constraint checker.

To check the constraints on transaction commit, then, we must consider the effects of the transaction as a whole rather than as a sequence of individual state changes. However, we can still make use of the fact that our constraints are range-restricted to reduce the number of constraints that must be checked. At commit-time, the schemas of the add- and delete-modules give us a concise description of the classes and functions that have been affected by the transaction, and we can use this information, in conjunction with the constraint indexes, to locate the constraints that need to be checked for a particular transaction.
For example, if AddClasses is the list of classes in the add-module schema, the following fragment of code retrieves a list of the identifiers of all the constraints that could be violated by newentity or include events on those classes:

    findall(ConstraintId,
            (   member(AddClass, AddClasses),
                class_to_constraint(AddClass, ConstraintId),
                constraint_to_code(ConstraintId, [AddClass], EventType, _, _),
                member(EventType, [newentity, include])
            ),
            AddClassConstraintsList),
    remove_duplicates(AddClassConstraintsList, AddClassConstraints).
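Read procedurally, the findall fragment walks the constraint index for each class in the add-module schema and keeps the constraints that have checking code for a relevant event type. A Python rendering might look like this; the two index predicates are modelled as dictionaries, and their shapes here are assumptions for illustration, not the P/FDM metadata format.

```python
from typing import AbstractSet, Dict, Iterable, List, Set, Tuple


def affected_constraints(
        classes: Iterable[str],
        class_to_constraint: Dict[str, List[str]],
        constraint_to_code: Dict[Tuple[str, str], Set[str]],
        events: AbstractSet[str] = frozenset({"newentity", "include"})) -> List[str]:
    """Return, without duplicates, the ids of constraints on `classes` that
    have checking code for one of `events`."""
    result: List[str] = []
    for cls in classes:
        for cid in class_to_constraint.get(cls, []):
            # constraint_to_code maps (constraint id, class) to its event types.
            if constraint_to_code.get((cid, cls), set()) & events and cid not in result:
                result.append(cid)
    return result


index = {"protein": ["c1", "c2"], "chain": ["c2", "c3"]}
code = {("c1", "protein"): {"newentity"}, ("c2", "protein"): {"deletentity"},
        ("c2", "chain"): {"include", "addfnval"}, ("c3", "chain"): {"addfnval"}}
print(affected_constraints(["protein", "chain"], index, code))   # ['c1', 'c2']
```

Deduplicating while collecting plays the role of the separate remove_duplicates/2 call, and passing a different `events` set gives the corresponding lists for the other event types.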

The same findall-based approach can be used to build a list of all constraints that might be violated by addfnval, deletefnval and deletentity events, and thus gives us a good first approximation of the constraints that have to be checked.

In the current implementation of the P/FDM transaction mechanism, we check each constraint by executing the initialisation code described in Section 3.3.2. Unlike the ordinary, single-update constraint checking process, which halts once it finds one constraint that has been violated, the transaction constraint checker tries to identify all the constraints that have been violated before returning control to the user. The aim is to allow the user to resatisfy all the violated constraints before attempting to commit the transaction once more.

While this reuse of the initialisation code is a simple and thorough way to check the consistency of a transaction's effects, it is very inefficient, since it rechecks the constraints for parts of the database that are not affected by the transaction at all. We feel that our architecture offers scope for a much more efficient form of constraint checking, and in Section 6.2.2 we describe a mechanism by which most of the redundant checking that takes place in the current implementation might be avoided.

4.3.2 Committing the Changes

Once we have checked that the updates described by the contents of the two transaction modules represent a consistent change to the database, they can be copied to the underlying modules and thus made permanent. This process consists of three stages:

1. Copy the contents of the transaction modules into three lists (called commit lists), one duplicating the contents of the add-module, the second duplicating the contents of the delete-module and the third containing the details of the key functions that have been updated.
2. Close down the two transaction modules, using the standard close_module/1 primitive, and remove the transaction_active/3 metadata marker. (This stage is equivalent to the behaviour of the abort_transaction/0 primitive described above.)
3. Use the lists generated in stage (1) to perform the updates in the disk-based modules, thus committing the transaction. In order to avoid checking the integrity constraints again when making these final updates, we call the internal primitives directly, after first retrieving the module type from the metadata, thus bypassing the constraint checking mechanism.

The ordering of these steps is important. If we tried to copy the instances to the disk-based module while the transaction modules were still open, we would encounter two problems. Firstly, we would be attempting to create instances with the same keys as existing ones (i.e. as the ones we are copying from the add-module) and this is not allowed by the DBMS. Secondly, the transaction versions of the update primitives would still be active and the update would be diverted to the special temporary modules once more. Hence the importance of closing the transaction modules before making the permanent copy of the updates.

The commit lists contain the following kinds of term:

    entity(Class, Key, InstId)
    function(FName, Arguments, Result)
    key_update(InstId, [key_value(KeyFn, Value), ...])

For entity instances, we store both the key and the instance identifier, even though only one of these will ever be required.
For instance, if the commit-list generated from theadd-module contains the termentity(alpha, [oval, ' ', h2], add alpha(104))then the corresponding update will make use of only the key information:internal newentity(hash, EDesc, alpha, [oval, ' ', h2], Alpha).The equivalent term in the delete-module's commit list, however, uses only the identi�er(which we know refers to an existing instance in the disk-based module):internal deletentity(hash, EDesc, alpha, alpha(104)).In fact, explicit keys for entity instances turn out to be very useful in the contextof transaction commits since they provide us with an identi�er for instances which isnot speci�c to the module in which the instance was created. The system-generatedidenti�ers do not have this property. For example, consider the function

    residue_structure(residue) -> structure

and suppose that the current add-module contains the mapping:

    residue_structure(add_module1_residue(1078)) -> add_module1_structure(479)

The committed versions of these newly-created instances will have different identifiers, e.g. residue(1034) and structure(439), and therefore the mapping will relate the wrong instances if it is copied naively. The keys of these instances, however, do not change from module to module, and it is equally valid to say that the residue_structure function maps the residue with key [oval, ' ', 44] to the structure with key [oval, ' ', h2] in both the transaction modules and the underlying modules. Therefore, any reference to an instance within the function/3 or key_update/2 terms is actually stored as an entity/3 term complete with key and identifier, e.g.

    function(residue_structure,
             [entity(residue, [oval, ' ', 44], add_module1_residue(1078))],
             entity(structure, [oval, ' ', h2], add_module1_structure(479)))

During the final stages of the commit process, we can avoid a great deal of duplicated effort if we bear in mind the propagations that will be made by each particular update. For example, entity creations are propagated to their superclasses, so a single request to create the lowest instance in each hierarchy will automatically result in the creation of all higher-level instances. The easiest way to avoid making duplicated updates is to prune the commit lists to remove any term representing a propagated update. The rules for inclusion of each type of update within the commit lists, then, are:

Instance Creation: include only the lowest existing instance in each branch of the hierarchy.

Instance Deletion: include only the highest existing instance in each hierarchy, since instance deletion is propagated to sub-instances. Also, instances that are key-dependent on some other instance in the delete-module do not need to be included in the commit list.

Function Value Addition: include only forward, non-key functions. Inverse functions will be populated automatically as a result of the population of the forward function, and key functions are populated as a result of instance creation.

Function Value Deletion: include only forward functions which are not defined on any instance that exists within the delete-module (i.e. any instance that is included in the delete-module commit list). All other functions will be deleted by propagation from deletentity/2.

Key Function Update: include only those keys which belong to instances outside the add-module, since all newly created instances will automatically be given the updated key. In addition, we need only store the key_update/2 terms for instances which are not key-dependent on some other instance whose key has been updated.

These rules should allow the creation of commit lists which represent the minimal set of update operations required to duplicate the effects of the transaction. Notice that this is different from the concept of minimality used earlier when referring to the transaction modules, which contain a minimal set of updates in terms of the basic data model elements.

Having built the commit lists, the transaction modules can be closed down to clear the way for the actual commitment of the updates. Updates are committed by recursing along the commit lists and performing the appropriate update action for each update term. It is vital, then, to ensure that these lists are ordered so that none of the structural constraints are violated by the updates. For example, the old instances must be deleted before the new instances are created, in case any of the new instances reuse keys from the original database. We must also ensure that new instances exist before we try to populate their functions, and that key-dependent instances are created after the instances that they depend on.
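The ordering requirements just described (deletions before creations, key-dependent instances after the instances they depend on, function values only once their instances exist) can be met with an ordinary topological sort. The Python sketch below is an illustration under stated assumptions, not the P/FDM implementation: updates are opaque labels, the hypothetical mapping `must_follow` records which updates must precede which, and new instances and key updates are ordered together in a single dependency graph using the standard library's `graphlib`.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def commit_order(del_instances, del_functions, add_instances, key_updates,
                 add_functions, must_follow):
    """Return a flat list of updates in a structurally safe order.

    must_follow maps an update to the updates that must be applied before
    it (for example, a new instance to the instances its key depends on).
    """
    middle = list(add_instances) + list(key_updates)
    # Keep only dependencies among new instances and key updates; deletions
    # and function additions are placed by position in the result instead.
    graph = {u: {d for d in must_follow.get(u, ()) if d in middle}
             for u in middle}
    # static_order() yields every node after all of its predecessors and
    # raises graphlib.CycleError if the dependencies are circular.
    instances_and_updates = list(TopologicalSorter(graph).static_order())
    return (list(del_instances) + list(del_functions)
            + instances_and_updates + list(add_functions))


# Hypothetical example: a new structure depends on an updated key, and a
# new residue depends on the new structure.
order = commit_order(
    del_instances=["delete alpha(104)"],
    del_functions=["delete old mapping"],
    add_instances=["create residue", "create structure"],
    key_updates=["update key of chain"],
    add_functions=["add residue_structure mapping"],
    must_follow={"create residue": {"create structure"},
                 "create structure": {"update key of chain"}},
)
print(order)
```

Putting every deletion first reflects the reason given in the text: a new instance may reuse a key that is only freed once the old instance has gone.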
Given that, by this stage, the commit lists have been split into the following five lists:

AddInstances, which describes the instances to be added to the database,
AddFunctions, which describes the function mappings to be added to the database,
DelInstances, which describes the instances to be deleted from the database,
DelFunctions, which describes the function mappings to be deleted from the database, and
KeyUpdates, which describes the key function updates that are required,

a suitable ordering of the updates can be found by the following algorithm:

1. Order the AddInstances list so that no instance appears before any instance on which it is key-dependent (giving OrderedAddInstances).
2. Interleave OrderedAddInstances and KeyUpdates so that no new instance is created before the key of any instance on which it is key-dependent has been updated³, and so that no key is updated before any instances which are involved in the new key have been created (giving InstancesAndUpdates).
3. The final ordering is

       [DelInstances, DelFunctions, InstancesAndUpdates, AddFunctions]

Once this final version of the commit list is generated, we can use it to make the appropriate updates, i.e. the term

    entity(Class, Key, Inst)

is interpreted as

    internal_newentity(ModuleType, EDesc, Class, Key, _).

for additions, and as

    internal_deletentity(ModuleType, EDesc, Class, Inst).

for deletions (and similarly for the function-based updates).

³ Necessarily, this implies that the new instance depends on some instance in the underlying module.

4.4 Related Work

Recent work on ensuring the consistency of transactions has focussed, for the most part, on the use of various logical analysis techniques. Three general approaches can be distinguished, each with a slightly different intention:

1. Automatic Transaction Synthesis [72, 93], in which procedural transaction descriptions are generated from high-level specifications and the integrity constraint metadata. The aim of this approach is to relieve the transaction designer of the burden of ensuring that their transactions will not violate any constraints, by allowing the language compiler to insert all the necessary checks itself.
2. Automatic Transaction Verification [97, 108], in which the transaction specification provided by the user is analysed for potential integrity violations. The transaction analyser reports back to the designer of the transaction, describing potential problems and suggesting checks that could be inserted. The aim of this approach is to guide the transaction designer in the production of consistent transactions.
3. Constraint Simplification for Multiple Updates [83, 54], which attempts to produce simplified versions of constraints for checking at the end of a particular transaction, in order to improve the efficiency of this process.

Unfortunately, all these approaches assume that transactions are defined in advance, in some fixed form that can be analysed, which is not the case with transactions in P/FDM. Moreover, the first two approaches, which involve embedding constraint checks into procedural descriptions of transactions, suffer from the same difficulties as the method-altering approach to run-time integrity constraint checking: it becomes very difficult to disable, delete or even modify individual constraints once they have been embedded in transaction code in this way.

One solution to this problem was proposed by Jeffrey, Lay and Curtis [57]. In their system, which is based on a relational database, integrity constraints are represented as state transition networks which are stored in a special metadata table, called pred. At commit time, constraints are retrieved from this table and are evaluated (using Prolog).
If a constraint is found to be violated (i.e. it evaluates to false), then the DBMS uses the information stored in the pred table again to work out why the update is invalid, and to give a helpful error message to the user. This approach is similar to the architecture for single-update integrity constraint checking used by both CoLan and P/FDM, in that a single representation of each integrity constraint is stored in the metadata and retrieved at run-time as necessary. However, in this system only the declarative form of the constraint is stored (rather than both a declarative and a procedural representation), and the validation process therefore incurs the extra overhead of having to re-interpret it for each update that is to be checked.

The constraint simplification technique is the most flexible of the three approaches, in that it can sometimes require only very basic information about the transaction in order to generate the simplified constraint forms. The method of Hsu and Imielinski [54], for example, requires only the knowledge of whether a transaction adds new data, deletes data or involves a mixture of these two update types. Since at most three simplified forms can be generated from any one constraint, it would be possible to generate all three at compile-time and store them in the metadata, ready for checking at the end of our arbitrary transactions. Unfortunately, this technique is based upon the relational data model, in which only two update operations (add-tuple and delete-tuple) are required, and in which almost no structural constraints are enforced. It is not, therefore, suitable for use with more complex data models like the FDM. In addition, the limited amount of information about the transaction that is taken into account during simplification means that only a limited gain in efficiency can be achieved, compared to simplification after a full analysis of the transaction.

One approach which does not require any analysis of a transaction in order to check its consistency is the rule-based approach taken by systems such as Context [106], Starburst [21] and ODE [45]. In Chapter 3, we saw how these systems use simplification techniques to compile declarative constraint specifications into triggers that will check the maintenance of the constraint by individual updates.
These systems reuse the triggers generated in this way for checking the consistency of transactions, by delaying the firing of rules until commit time. In both Context and Starburst, this behaviour is provided by disabling rule firing within transactions, and then examining the transaction log on commit to determine which rules must be evaluated. In fact, the Starburst system maintains a consolidated log and therefore, like our own system, checks only the constraints that could be violated by the net effects of the transaction. The ODE system works in a similar way, except that the user may specify whether a constraint is to be checked immediately on update (called a hard constraint) or deferred until the end of the current transaction (a soft constraint). Hard constraints are useful for implementing constraints that it is never sensible to violate, such as structural constraints and some domain constraints (e.g. no negative age values).

The advantages of this rule-based approach are that it is able to check the consistency of arbitrarily complex transactions without requiring complicated compile-time analysis, and that the constraints to be checked are selected at run-time, allowing them to be added, deleted or modified freely. The disadvantage is that the rules are generated by simplification of the original constraint description for the validation of particular updates and, as we have seen, this technique is not wholly suitable for checking sets of updates. The Starburst system, which generates one rule per constraint, suffers less in this respect, but only at the expense of having to make more redundant checks. The fact that rules generated in this way are so tightly focussed on individual updates also hampers their ability to take effective actions in the event of integrity violation. For example, if a particular violation is caused by two updates, it may not be possible to determine the updates that must be made to solve both problems by considering each of the violations separately in turn. It may be possible, however, to solve these problems by taking a compromise approach between the code generation method and the rule-based method, as we did for constraint checking for individual updates in Chapter 3. One possible architecture, which could easily be incorporated into P/FDM, is described in Section 6.2.

Chapter 5
Non-Deterministic Updates in P/FDM

The previous two extensions to P/FDM provide "operational" support for updates, in that they help to ensure the consistency of the updated data. This final extension aims to support the user in the "expression" of updates. While the Daplex language is well suited to the expression of complex ad hoc queries, it provides only the usual basic operations for creating, deleting and updating data. Users are still required to work out most of the details of the update, such as the ordering of the operations, for themselves. Very often, however, and especially in the less traditional application areas of scientific and design databases, the user has more information about the conditions that a set of updates should preserve or make true than about the exact details of the changes required. For example, when solving resource scheduling problems, one is more concerned with the fact that no resource should be double-booked than with the actual allocation of resources to time-slots. Or, to take an example from our own domain of protein structure, when building models of proteins one is generally more concerned that certain important biochemical rules are not broken than with the exact positioning of each atom. In other words, a higher-level approach to the specification of updates is required, in which the user describes the result that the update must achieve and the DBMS is responsible for working out how this effect may be achieved by a sequence of "low level" updates.

This final extension aims to allow the high-level description of the creation of a set of objects that, together and in conjunction with the existing data, exhibit some required property. It is easy to describe such updates from Prolog, since we can use the built-in backtracking mechanism to search for the set of updates that will produce a solution. In order to make this facility accessible to non-Prolog programmers, we have added a new loop construct to the Daplex language for describing the non-deterministic assignment of attribute values to newly-created instances. Programs involving the new construct are compiled into a set of recursive Prolog programs that use a special kind of undo-able update, known as a backtrackable update, which simulates the effects of Prolog instantiation over database attributes. The effect is to allow the non-expert user to take advantage of Prolog's chronological backtracking facilities for solving a class of constrained search problems in a database environment.

This chapter is organised as follows: Section 5.1 describes the extension to Daplex and illustrates its use in the solution of a simple scheduling problem. Section 5.2 describes the underlying database support for the new language construct, including the implementation of the backtrackable updates, and Section 5.3 gives an overview of the process of compiling the extended Daplex language into Prolog. Section 5.4 discusses the use of semantic information, in the form of integrity constraints, to cut down the search space of the compiled code.
Section 5.5 reviews similar language features in other systems and, finally, Section 5.6 shows how the extended Daplex language can be used in the protein modelling application.

5.1 Syntax and Semantics of the Daplex Extension

An early paper by Floyd [42] described the addition of a simple non-deterministic choice construct to an otherwise deterministic language (ALGOL) as a means of simplifying the expression of backtracking algorithms. Floyd also showed how ALGOL programs involving non-deterministic choice can be mechanically transformed into completely deterministic algorithms, using a backtracking search technique to simulate the effects of the non-determinism. By analogy, our extended Daplex language contains a new loop construct which describes the creation of an object and the non-deterministic selection of values for its attributes. The syntax of the new loop is:

    for a new <var> in <class> such that <predicate>

which can be read as describing the creation of a new instance of the class <class>, with its identifier bound to the variable <var>, according to the constraints given in <predicate>. The predicate consists of a conjunction of Boolean-valued expressions, such as equality comparisons and tests for set membership. Constructs for specifying these kinds of expressions already exist in Daplex, so the only new piece of syntax required is that shown above for the loop itself.

While we may reuse the syntax of the original Daplex language, the interpretation of these expressions within a "for a new" loop is radically different from the standard deterministic interpretation. Constraints which describe an equality relationship between an attribute of the newly created instance and some literal value, for example, cannot be interpreted in the usual way (i.e. as a straight comparison), because the attribute has not yet been given a value. Instead, we borrow the idea of instantiation from logic programming, whereby an undefined variable takes on the value of whatever it is compared with. In other words, we interpret such comparisons as an update to the undefined attribute. Consider, for example, the following fragment of a Daplex program:

    for a new p in person such that age(p) = 30

which requests that a new instance of the class person be created and its age attribute be given the value 30. The effect of this kind of comparison is to initialise part of the new object's state. Attribute values may also be specified in terms of other database objects, so that

    for a new s in soloist such that
        accompanist(s) = the p in pianist such that
            no p1 in pianist has ability(p1) > ability(p)

allocates the best pianist to accompany the new soloist.
It is even possible to define an attribute's value in terms of other attributes of the newly created instance, as with the age attribute in the following example:

    for a new b in british_school_child such that
        penfriend(b) = any(french_school_child) and
        age(b) = age(penfriend(b))

Of course, this sort of thing is only possible where the attributes depended on can be evaluated to some concrete value. It is not possible, for example, to have circular definitions, where two or more attributes are dependent on each other's values.

This example also illustrates the use of the aggregate function any/1, which non-deterministically returns a single element of its argument set that also satisfies the later constraints. The same effect can be achieved by a set membership test, for example:

    for a new t in teenager such that age(t) in {13 to 19}

The effect of this is to specify a reduced domain for the attribute value. As with the any/1 function, the choice of exactly which value the attribute will take depends on which values are able to satisfy the remaining constraints. This ability to specify a finite set of possible values for an attribute is a crucial element of the non-deterministic problem description, as it is the way in which the user specifies the attribute values that may be varied in the attempt to satisfy the constraints.

One set of attributes which requires special treatment is the key attributes. It is the program writer's responsibility to ensure that the key attributes are all fully populated within the predicate specification, as without these values the new instance cannot be created (this is a condition imposed by the underlying data model, as we saw in Chapter 2). However, the assignments need not appear in key order, and the values of individual key functions may be defined in terms of other (key or non-key) attributes. This means that rather curious-looking programs are quite legal. For example, the following Daplex fragment:

    for a new ml in music_lesson such that
        day(ml) in freedays(pupil(ml)) and ...

requests that the day of the new lesson, which is a part of its key, must be one of those on which the pupil taking the lesson is also free.
We are, apparently, assuming that the new lesson instance exists (as we retrieve the value of its pupil attribute) while we are specifying the key values which will be used to create it.

In general, then, the <predicate> will consist of:

1. a number of equality comparisons with undefined attributes, which perform the initialisation of attributes;

[Diagram for Figure 5.1: classes person, student, music teacher, music lesson and hobby; functions name, freedays, instrument, instruments, pupil, teacher, day, done by and type.]

Figure 5.1: Diagrammatic schema for the Music Lesson Database

2. a number of set membership tests involving undefined attributes, which describe the problem space; and
3. a number of straight tests, which represent the constraints that the newly-created object must satisfy.

The ordering of the tests at this stage is unimportant, as the compiler does a certain amount of rearrangement to make sure, for example, that variables are not used before they have been assigned to.

The following simple scheduling problem illustrates how the new loop construct can be used to describe a class of constrained search problems, and how the underlying search process behaves. The schema in Figure 5.1 describes a database storing details of the music lessons offered within a particular town. Each student has one music lesson per week, with a teacher who is qualified to teach his or her particular instrument. To simplify matters, we assume that each student is learning only one instrument, and that each teacher gives only one lesson per weekday and none at weekends. This means that lessons can be uniquely identified by the combination of their teacher and the day on which they occur, and we can use these attributes as the key of the music_lesson class.

In addition to taking music lessons, people may have other hobbies which keep them busy on particular evenings of the week. The class hobby records this information.

    Name          Instrument(s)                Free Days (Mon to Fri)
    Mrs. Mozart   Piano, Harpsichord, Voice    Busy, Busy
    Mr. Vivaldi   Harpsichord, Clavichord      Busy, Busy
    J.S.B.        Harpsichord                  Busy, Busy
    C.P.E.B.      Harpsichord                  Busy, Busy, Busy

Figure 5.2: Initial state of the Music Lesson Database

A method function, freedays/1, is defined on the class person; it computes the set of days on which a person is free to take up another commitment from the set of days on which they are busy with hobbies and music lessons.

Figure 5.2 shows the current state of the database. We have two teachers (Mr. Vivaldi and Mrs. Mozart) and two students (J.S.B. and C.P.E.B.). The table gives the instruments that each person plays or teaches, and the days on which he or she is available for a music lesson. Our task is to write a program that allocates a music lesson (i.e. a teacher and a day) to each student. The student should be taught by someone who is qualified to teach their instrument, and the day should be chosen so that neither the pupil nor the teacher is busy with some other commitment.

We can use the new loop construct to solve this problem as follows:

    program allocate is
    for each s in student such that no l in music_lesson has pupil(l) = s
        for a new ml in music_lesson such that
            teacher(ml) = any(t in music_teacher
                              such that instrument(s) in instruments(t)) and
            pupil(ml) = s and
            day(ml) in freedays(teacher(ml)) and
            day(ml) in freedays(pupil(ml))
    print("Student", name(s), "allocated to", name(teacher(ml)), day(ml));

The first loop generates the set of students requiring music lessons, while the second describes the creation of a new music_lesson instance for each of these students and the constraints on its attribute values. The print action displays the details of the solution that has been found. Daplex programs are evaluated in two phases, with the sets of values described by the "for" loops being generated completely before the actions are executed.
This is particularly important for programs involving non-deterministic loops, as it means that the actions are executed only for those objects which have been created as part of the solution to the problem, and not for any objects belonging to intermediate, possibly incorrect solutions. For example, if program allocate were executed against the database shown in Figure 5.2, the DBMS would first try a solution in which J.S.B. was allocated a lesson with Mrs. Mozart on Tuesdays, before discovering that it is then not possible to allocate a lesson to C.P.E.B., for whom Tuesdays with Mrs. Mozart is the only possible allocation. On its next attempt, the DBMS allocates another slot to J.S.B. and thus frees up the important Tuesday/Mrs. Mozart slot for allocation to C.P.E.B. All this is done during the evaluation of the loops part of the program, which ends with only the two music_lesson instances belonging to the full solution having been created. The print action is then executed to display the details of these instances:

    Student J.S.B. allocated to Mrs. Mozart, thursday
    Student C.P.E.B. allocated to Mrs. Mozart, tuesday

Although each "for a new" loop describes the creation of only a single new instance, we can create sets of new instances by nesting the "for a new" loop inside another loop, as we have seen in allocate. Similarly, "for a new" loops themselves may contain nested loops, so that complex constraints involving several other database classes may be expressed.
It is even possible to nest a "for a new" loop inside another "for a new" loop, allowing us to make the creation of one instance conditional upon the successful creation of another set of instances.

5.2 Database Support for the Daplex Extension

From Floyd's work with ALGOL [42] we can extract the following general requirements for simulating non-deterministic choice using backtracking:

(i) some notion of the success or failure of statements,
(ii) the ability to move backwards through the control flow on failure, as well as forwards on success,
(iii) the ability to generate alternative values for variables on backtracking, and
(iv) the ability to undo state changes (e.g. variable assignments, reads and writes) on backtracking.
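Requirements (i) to (iv) can be seen together in a minimal Python sketch of the allocate search from Section 5.1. This is an illustration, not the compiled Prolog: Python recursion and loops supply (i) to (iii), and (iv) is met by explicitly removing each tentatively created lesson when the search backtracks past it. The busy days are hypothetical, chosen so that the search behaves as the text describes for allocate (first trying J.S.B. with Mrs. Mozart on Tuesday), and all function and variable names are illustrative. The two phases of Daplex evaluation appear as a complete search followed by a separate printing loop.

```python
DAYS = ["monday", "tuesday", "wednesday", "thursday", "friday"]

# Hypothetical data for illustration only.
TEACHES = {"Mrs. Mozart": {"piano", "harpsichord", "voice"},
           "Mr. Vivaldi": {"harpsichord", "clavichord"}}
LEARNS = {"J.S.B.": "harpsichord", "C.P.E.B.": "harpsichord"}
HOBBY_DAYS = {"Mrs. Mozart": {"monday", "wednesday"},
              "Mr. Vivaldi": {"monday", "tuesday"},
              "J.S.B.": {"monday", "friday"},
              "C.P.E.B.": {"wednesday", "thursday", "friday"}}


def freedays(person, lessons):
    """Weekdays on which person has neither a hobby nor a lesson."""
    busy = set(HOBBY_DAYS.get(person, ()))
    busy |= {day for teacher, pupil, day in lessons if person in (teacher, pupil)}
    return [d for d in DAYS if d not in busy]


def allocate(students, lessons, trail):
    """Depth-first search; True once every student in students has a lesson.

    Returning False signals failure (i), returning into the caller's loop
    moves backwards (ii), and the loops supply alternative values (iii).
    """
    if not students:
        return True
    s, rest = students[0], students[1:]
    for t in TEACHES:                            # any(t in music_teacher ...)
        if LEARNS[s] not in TEACHES[t]:
            continue
        for day in freedays(t, lessons):         # day(ml) in freedays(teacher(ml))
            if day not in freedays(s, lessons):  # day(ml) in freedays(pupil(ml))
                continue
            lesson = (t, s, day)
            lessons.append(lesson)               # tentative creation ...
            trail.append(("create", lesson))
            if allocate(rest, lessons, trail):
                return True
            lessons.pop()                        # ... undone on backtracking (iv)
            trail.append(("undo", lesson))
    return False


# Phase 1: evaluate the "loops", i.e. search for a complete solution.
lessons, trail = [], []
found = allocate(list(LEARNS), lessons, trail)
# Phase 2: run the actions only for instances in the final solution.
if found:
    for teacher, pupil, day in lessons:
        print("Student", pupil, "allocated to", teacher + ",", day)
```

With this data the first two entries of `trail` record the abandoned Tuesday lesson for J.S.B., and only the final two lessons reach the printing phase, which prints the same two lines as allocate does in the text.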

If the target language does not have these features, then they must be implemented by the translation process. Floyd, for example, implemented property (iii) by storing variable values on a stack. Our target language, Prolog, already supports the first three properties, so in our case code generation is comparatively straightforward. Prolog also has the ability to undo choices of values for variables, by uninstantiating them, but we cannot make use of this feature when generating Prolog from Daplex, since Daplex programs operate not on in-memory variables but on database objects. When Daplex is the source language, then, property (iv) becomes:

(iv) the ability to undo database state changes on backtracking.

In other words, we need some means of updating the database that is analogous to instantiation, i.e. updates that automatically undo themselves on backtracking. We call such an update a backtrackable update. In the previous chapter, we saw how we were able to redefine the behaviour of the update primitives within an active transaction by providing new internal definitions for the special module type transaction. We can use the same mechanism to implement backtrackable versions of the update primitives, by defining a new module type called backtrackable. Within this module type, the internal definitions of the update primitives implement the required semantics. For example, here is the (simplified) internal definition of the newentity/3 primitive for the backtrackable module:

    % internal_newentity(+ModuleType, +Metadata, +Class, +Key, -InstId)
    internal_newentity(backtrackable, EDesc, Class, Key, InstId) :-
        find_true_module_type(Class, ModType),
        internal_newentity(ModType, EDesc, Class, Key, InstId),      % (1)
        (   true
        ;   internal_deletentity(ModType, EDesc, Class, InstId),     % (2)
            fail                                                     % (3)
        ).

On first entering the backtrackable internal primitive, the ordinary non-backtrackable version of newentity/3 is invoked to perform the actual update (line 1).
If this succeeds, we allow the entire primitive to succeed, but we create a choice point so that, should we ever backtrack into this code, the update will be undone (by the call to internal_deletentity/4 on line 2). Having removed the invalid instance, we fail, forcing the backtracking to continue (line 3).

In fact, we only need to make two of the update primitives backtrackable in this way: newentity/3, for creating new instances, and addfnval/3, for adding new function values. By the very nature of this class of queries, we are interested only in adding to the partially completed structure, and not in deleting from it. This becomes more obvious when we consider the equivalent Prolog solution: one progresses through the search by instantiating variables, not by uninstantiating them. As it happens, this is quite convenient since, in general, it is much easier to undo the creation of an object than to undo its deletion. Instance creation is propagated only to super-instances and can be undone by a single call to the deletentity/2 primitive, but instance deletion is propagated to all key-dependent instances and relationship functions, and could require many calls to newentity/3 and addfnval/3 to undo completely.

Having provided the relevant internal definitions, we can make any class or function susceptible to backtrackable updates by redefining its module of definition to be the special backtrackable module. We must be careful, however, to keep a note of the original module of definition so that it may be restored when the backtracking search is completed. In fact, we can achieve this effect simply by "covering" the existing descriptor for the class or function in question with a descriptor giving the backtrackable module as the module of definition. In P/FDM, where metadata is stored as Prolog clauses, we can take advantage of the fixed ordering of these clauses and cover the original descriptor by placing the modified descriptor before it in the clause base.
For example, the following arrangement of entity descriptors:

    edesc(music_lesson, entity, string, [foreign(teacher), day], 0, 4, backtrackable).
    edesc(person, entity, string, [name], 12, 14, musicdb).
    edesc(student, person, string, [name], 6, 14, musicdb).
    edesc(music_teacher, person, string, [name], 4, 14, musicdb).
    edesc(music_lesson, entity, string, [foreign(teacher), day], 4, 4, musicdb).

will cause updates to the music_lesson class to be backtrackable, whereas updates to the remaining classes, such as person, will be made using the ordinary update primitives. Once we have finished our non-deterministic updates on music_lesson, we can simply remove the backtrackable descriptor and it will resume its status as an ordinary class once more.

[Diagram for Figure 5.3: a "View of Class" combining instances held in the clause base (memory) with instances held in the database (disk).]
Figure 5.3: Using a temporary module to store backtrackable updates

Unfortunately, none of the update primitives is a particularly fast operation (the emphasis being on security rather than speed, as is proper for a database update). For deterministic Daplex programs this is not a significant problem, since retrievals form the bulk of the interaction with the database, but for the backtracking algorithm that implements a non-deterministic Daplex program, in which updates can occur as frequently as retrievals, the speed of the update primitives becomes a major factor in the efficiency with which this kind of problem can be solved. We can make our backtrackable updates considerably faster, however, if we make use of the "temporary" module storage type, as we did for the transaction primitives.

We can use a temporary module to store instances created by the backtrackable version of newentity/3, in an extension of the disk-based class (see Figure 5.3). During the execution of a non-deterministic Daplex program, all updates (i.e. instance creations and function value additions) occur within the temporary extension to the class:

    internal_newentity(backtrackable, EDesc, Class, Key, InstId) :-
        internal_newentity(temporary, EDesc, Class, Key, InstId),
        (   true
        ;   internal_deletentity(temporary, EDesc, Class, InstId),
            fail
        ).

(Similarly for the addfnval/3 primitive.) When data is retrieved (by getentity/2,3 and getfnval/3), both the temporary extension and the disk-based module are searched, so that the class as seen by the user is the union of the newly created, temporary instances and those existing on disk. To implement this behaviour, we must provide new internal definitions for the retrieval primitives, as shown here for getentity/2:

    % internal_getentity(+ModuleType, +Metadata, +Class, -InstId)
    internal_getentity(backtrackable, EDesc, Class, InstId) :-
        (   % Retrieve instances from the temporary extension
            internal_getentity(temporary, EDesc, Class, InstId)
        ;   % Uncover the original class descriptor
            hide_backtrackable_descriptors(Class, BDescs),
            (   % Now search the real module
                getentity(Class, InstId),
                % And put the descriptors back whether we have
                % succeeded ...
                reveal_backtrackable_descriptors(BDescs)
            ;   % ... or not
                reveal_backtrackable_descriptors(BDescs),
                fail
            )
        ).

Instances are first retrieved from the backtrackable module, treating it as an ordinary temporary module, and are then fetched from the disk-based module by temporarily removing the backtrackable descriptors that are concealing the true descriptors, which allows getentity/2 to bind to the appropriate internal definition. Notice that the backtrackable descriptors must always be replaced (at the top of the clause base) before we exit the primitive, regardless of the results of the retrieval.
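The layering of a temporary extension over the disk-based class can be sketched in a few lines of Python. The class and method names are hypothetical and this is not the P/FDM code: two Python lists stand in for the temporary and disk-based modules, the undo function returned by `new_instance` plays the role of the choice point left by the backtrackable newentity/3, retrieval returns the union with the temporary extension searched first, and a commit step copies the surviving instances to the disk-based part.

```python
class BacktrackableExtent:
    """A class extent whose new instances live in a temporary, in-memory
    extension until the search succeeds and they are committed."""

    def __init__(self, disk_instances):
        self.disk = list(disk_instances)  # stands in for the disk-based module
        self.temp = []                    # stands in for the temporary module

    def new_instance(self, inst):
        """Cheap creation in the temporary extension; returns a function
        that undoes the creation when the search backtracks."""
        self.temp.append(inst)
        return lambda: self.temp.remove(inst)

    def instances(self):
        """Retrieval sees the union, temporary extension first."""
        return self.temp + self.disk

    def commit(self):
        """Copy the surviving instances to the disk-based part."""
        self.disk.extend(self.temp)
        self.temp.clear()


# Hypothetical music_lesson instances, identified by (teacher, day) keys.
lessons = BacktrackableExtent([("Mr. Vivaldi", "monday")])
undo = lessons.new_instance(("Mrs. Mozart", "tuesday"))
undo()                                        # backtrack: creation undone
lessons.new_instance(("Mrs. Mozart", "thursday"))
seen_before_commit = lessons.instances()
lessons.commit()
```

Only the creation path is made undoable here, matching the observation in the text that a search of this kind only ever adds to the partial structure.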

Of course, by using a temporary module in this way we sacrifice some speed of retrieval in return for greater speed of update. This is rather an unusual compromise for a DBMS, but these modified retrieval primitives are used only on those classes and functions that are directly involved in backtrackable updates. Retrieval from other classes, such as student and music_teacher in our example, is entirely unaffected by this change.

Once all the "loops" of a Daplex program have been successfully evaluated, the backtrackable module will contain a set of instances which satisfies the given constraints and so represents a solution. These instances can then be copied to the disk-based module, so that they become ordinary persistent objects, indistinguishable from those created by conventional updates. Since a program involving backtrackable updates can be seen as a transaction which contains only additions, we can reuse the standard transaction commit procedure (minus the constraint checking, of course) to implement this process.

5.3 Compilation of the Daplex Extension

As we saw in Chapter 3, the compilation of a Daplex program proceeds by a series of transformations from the initial list of lexical tokens to a Prolog predicate. To recap, the individual transformation stages are:

(i) The parser translates the list of lexical tokens into a parse tree. Syntactic checking and some semantic checking are done at this stage.
(ii) The parser translates the "loop" parts of the parse tree into a list of intermediate code (ICode) constructs.
(iii) The ICode is passed to the optimiser, which translates it into an equivalent but more efficient ICode description of the program.
(iv) The code generator translates the reformulated ICode into a fragment of Prolog code.
(v) Control now returns to the parser, which translates the remaining "action" parts of the parse tree into Prolog.

(vi) Finally, all the various fragments of generated Prolog are glued together to create a complete Prolog predicate.

This process produces a single Prolog predicate which implements the iteration described by the Daplex program using failure-driven loops. The first part of the iteration generates sets of values for the loop variables specified by the program. The second part iterates over these sets, using each combination of values as the arguments to the action calls. The (simplified) general form is:

    program_name :-
        (   loop_code,
            save_loop_variable_values,
            fail
        ;   get_loop_variable_values,
            action_code,
            fail
        ;   true
        ).

The two failure-driven loops here clearly illustrate the two-phase nature of the evaluation of Daplex programs: the set of loop variable bindings is generated by the first loop, and the sequence of actions is executed by the second. The failure-driven control strategy means that code generation is relatively straightforward. Both nested loops and sequences of actions are handled as simple conjunctions, as the following Daplex query and its Prolog translation indicate:

    % Program to display the days on which each teacher is busy
    % and to populate the busy_days/1 function
    program busy_teachers is
    for each t in music_teacher                     % Loop 1
    for each ml in teacher_inv(t)                   % Loop 2
    print(name(t), "busy on", day(ml)),             % Action 1
    include day(ml) in busy_days(t);                % Action 2

    busy_teachers :-
        (   getentity(music_teacher, Teacher),              % Loop 1
            getfnval(teacher_inv, [Teacher], Lesson),       % Loop 2
            remember_loop_var_bindings([Teacher, Lesson]),
            fail
        ;   retrieve_loop_var_bindings([Teacher, Lesson]),
            getfnval(name, [Teacher], TName),
            getfnval(day, [Lesson], Day),
            writeln([TName, 'busy on', Day]),               % Action 1
            addfnval(busy_days, [Teacher], Day),            % Action 2
            fail
        ;   true
        ).

Unfortunately, although the code generated from a non-deterministic Daplex program retains the basic two-phase structure, we cannot adopt this simple failure-driven approach to the translation of non-deterministic program loops. One reason is the overloading of Prolog failure. Our backtrackable updates use failure as the signal to undo themselves, since failure generally means that some constraint could not be satisfied or that some generator has run out of values. Within a failure-driven loop, however, failure also means "find the next solution". How can we decide whether or not to undo the choices we have made, if both "the constraints cannot be satisfied" and "the constraints have been satisfied, now solve the next part of the problem" are signalled in the same way?

The second reason is that a failure-driven control strategy limits, to an unworkable degree, the extent to which we are able to backtrack within the search space. When we evaluate a normal, deterministic Daplex program, we are simply enumerating results, and are therefore making an exhaustive, forward traversal of the search space. When "for a new" loops are involved, however, we do not in general require an exhaustive exploration of the search space, and we have introduced the possibility of backtracking. In this case, we need to remember the previous bindings of the loop variables so that we can backtrack to whatever level is necessary in pursuit of a satisfactory solution.
Our approach, therefore, is to replace the failure-driven control strategy with a recursive strategy. This strategy keeps track of all the choice points for all the loop variables, and so allows the deep backtracking required to find solutions. Under this translation scheme, each loop in the Daplex program is translated into a routine that iterates over the bindings for the previous loop variable, generates a list of possible bindings for its own loop variable, and passes this list on to the next routine. In fact, a Daplex query involving n loops is translated into a top-level predicate plus n-1 recursive Prolog predicates, each of which invokes the predicate implementing the following loop in the query. The (simplified) general form of a query involving n "for a new" loops, then, is:

    program_name :-
        initialise_backtrackable_module,
        generate_values_for_first_loop_variable(L),
        program_name_2(L),
        commit_backtrackable_updates,
        (   get_loop_variable_values,
            action_part,
            fail
        ;   true
        ).

    program_name_m([], OtherVars).
    program_name_m([H | T], OtherVars) :-
        generate_values_for_mth_loop_variable(L, H, OtherVars),
        program_name_m+1(L, H, OtherVars),
        program_name_m(T, OtherVars).

    program_name_n([], OtherVars).
    program_name_n([H | T], OtherVars) :-
        generate_values_for_nth_loop_variable(L, H, OtherVars),
        (   save_loop_variable_values
        ;   forget_loop_variable_values,
            fail
        ),
        program_name_n(T, OtherVars).

The structure of the first clause (program_name) resembles the deterministic translation, in that it is basically a failure-driven loop. The crucial difference, however, is that the failure-driven control drives only the "actions" part of the program, and no longer the "loops" part. Another significant difference is that we collect all the values for the loop variable (i.e. all the members of the set described by the first loop) into a list (L), rather than enumerating them by backtracking. It is this list, which effectively acts as a record of all the bindings of the loop variable, that allows us to backtrack to an arbitrary degree when searching for a solution.

Once this initial list has been generated, the recursive search process can begin. Each inner loop is implemented by a predicate (program_name_m) which recurses down the list of loop variable values generated by the previous loop. For each variable binding (H), we generate the set of bindings for the current (i.e. m-th) loop variable (L) and pass it on to the routine implementing the next loop in the program (program_name_m+1). At each stage, the state of the search is represented by the currently active loop variable bindings, which are passed as parameters to each routine (OtherVars).

The search process proceeds in this way, through successively deeper levels of recursion, until a complete set of loop variable bindings has been generated. This occurs within the predicate implementing the innermost loop (program_name_n), so at this stage we record the current set of bindings as representing a partial solution (save_loop_variable_values). The disjunction ensures that these bindings are "forgotten" if they are later found not to be part of a complete solution.

If the recursive search succeeds, then satisfactory allocations have been found for all the loop variables. The objects which remain in the special backtrackable module at this point represent the solution, and may be committed to the database. It is important that this is done before we execute the "actions" part of the program, since the actions may involve ordinary updates to the objects that were created by the "loops".
By committing the data as soon as we have a solution, we remove the backtrackable module and, with it, the special metadata markers that cause the backtrackable primitives to be invoked. Any subsequent updates are therefore made in the ordinary, non-undoable way.

How, then, are the individual predicates that implement the loops generated? As the general form given above shows, the main task of each such predicate is to generate the list of bindings for its respective loop variable. The details of this process vary with each loop type. For "for each" loops, we generate all the possible loop variable values and collect them into a list using findall/3. For example, within a deterministic Daplex program, the following loop:

    for each p in protein such that molecular_weight(p) > 3000

would be translated to:

    getentity(protein, P),
    getfnval(molecular_weight, [P], MW),
    MW > 3000

whereas within a non-deterministic program its translation would be:

    findall(P, ( getentity(protein, P),
                 getfnval(molecular_weight, [P], MW),
                 MW > 3000 ), L)

Notice that the code which generates the individual values for p is the same in both cases. In the non-deterministic translation, however, we must use findall/3 to insulate this section of failure-driven code within the surrounding, recursively driven routine.

The "for the" and "for any" loops do not need such insulation, since they both generate only a single binding (V) for their loop variables. We can therefore use the same translation for both deterministic and non-deterministic programs. In the latter case, the list of loop variable bindings is formed by the simple instantiation L = [V]. For example, the loop:

    for any p in protein such that molecular_weight(p) > 3000

is translated to¹:

    findfirst(( getentity(protein, V),
                getfnval(molecular_weight, [V], MW),
                MW > 3000 ), 1),
    L = [V]   % This line added for non-deterministic translation only

¹ Here, findfirst(Goal, Num) is a built-in predicate for P/FDM which allows the given Goal to succeed at most Num times.
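The difference between the two translations comes down to how many bindings are collected into the list L. The Python sketch below is a hypothetical model, not the compiler's output:

- a Python generator stands in for the failure-driven Prolog goal;
- `list()` plays the role of findall/3 for a "for each" loop;
- `itertools.islice(..., 1)` plays the role of findfirst(Goal, 1) followed by L = [V] for a "for the" or "for any" loop.

The protein names and weights are invented for illustration. One difference from the Prolog is made explicit. When findfirst/2 finds no solution, the Prolog translation fails, and the sketch represents this by returning None rather than an empty list.

```python
from itertools import islice

# Invented example data: protein identifier -> molecular weight.
PROTEINS = {"p1": 2500, "p2": 4100, "p3": 3600}


def heavy_proteins():
    """Enumerate p in protein such that molecular_weight(p) > 3000."""
    for p, mw in PROTEINS.items():
        if mw > 3000:
            yield p


def for_each_bindings(goal):
    """'for each' loop: collect every binding, as findall/3 does (may be [])."""
    return list(goal())


def for_any_bindings(goal):
    """'for the'/'for any' loop: at most one binding V, giving L = [V].

    Returns None when there is no binding, standing in for Prolog failure.
    """
    first = list(islice(goal(), 1))
    return first if first else None


print(for_each_bindings(heavy_proteins))   # ['p2', 'p3']
print(for_any_bindings(heavy_proteins))    # ['p2']
```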

Unsurprisingly, the final loop type (the "for a new" loop) is rather more complicated to compile. It must bind its loop variable to the identifier of a newly created instance, with appropriately populated attribute values. The creation of this instance occurs in three stages:

(i) candidate values are generated for each of the attributes of the new instance from the information given in the predicate part of the loop. Note that this also involves the generation of key values for the instances (during syntactic analysis, the parser checks that each "for a new" loop specifies a full key);
(ii) using the backtrackable versions of the update primitives, a new instance is created and its attributes are populated with the candidate values identified in stage (i);
(iii) finally, the constraints on the attribute values (given in the loop predicate) are evaluated for satisfaction against the current state of the database.

The compilation process, then, converts the intermediate code representation of a "for a new" loop into a recursive predicate of the form described above which implements this behaviour. We represent "for a new" loops internally as generation subqueries:

    generate_subquery(Class, InstIdVar, creation(Predicate))

The second loop of the allocate program, for example, has the following internal form:

    generate_subquery(music_lesson, var(evar2),
        creation([expression(=, var(evar3), var(uevar1)),
                  generate(music_teacher, var(evar4)),
                  ...]))

In order to convert this type of ICode construct to Prolog, we must first extract a generator for each attribute from the set of constraints described by the loop predicate. The second loop of the allocate program involves all three attributes of the music_lesson class: pupil/1, teacher/1 and day/1. The generators for the first two are easy to select, since there is only one constraint on each of these attributes. However, there are two constraints involving the day/1 attribute. Because the constraints are specified declaratively, we are free to choose either of these as the generator. In the interests of efficiency, it is obviously preferable to select the most "restrictive" of the constraints. For example, if the constraints on the age/1 attribute of a new person instance are:

    age(p) in {17 to 65} and
    age(p) = age(partner(p))

then the choice of generator can mean the difference between no backtracks and up to forty-nine backtracks! In our implementation, therefore, we adopt the simple strategy of choosing an equality comparison rather than a set-membership test as the generator wherever possible. When no constraint is obviously to be preferred, as in the case of the day/1 attribute above, the first one encountered is selected. In future versions of the language compiler, we hope to make use of the conventional optimisation techniques already available in the Daplex query optimiser to make a more informed choice in such cases.

Having selected our generators, we combine them to form a conjunction of ICode statements, which is then translated to Prolog using the existing code generator. This fragment of code implements the first stage of the creation of the new instance. The fragment implementing the second stage consists of a call to the newentity/3 primitive, followed by one call to addfnval/3 for each of the generators identified in the previous stage. Finally, we combine the remaining constraints into a separate conjunction and translate it to Prolog code to form the fragment implementing the third stage of the processing.
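The generator-selection rule described above can be sketched in a few lines. The Python sketch below illustrates the rule and is not the Daplex compiler's code. Each constraint on an attribute is a hypothetical (operator, right-hand side) pair, and only "=" and "in" constraints are treated as possible generators. An equality is preferred; otherwise the first candidate is taken, and every other constraint becomes a test.

The helper `rejected_before_success` counts how many candidates a generator offers before one passes the remaining test, which reproduces the age/1 example:

- with a partner aged 65, generating from 17 to 65 rejects 48 values (all 49 fail if the partner's age lies outside the range);
- generating from the equality offers only the partner's own age.

```python
def choose_generator(constraints):
    """Split an attribute's constraints into (generator, tests).

    Prefers an equality ('=') over a set-membership test ('in'); when no
    equality is present, the first 'in' constraint encountered is used.
    """
    candidates = [c for c in constraints if c[0] in ("=", "in")]
    if not candidates:
        raise ValueError("no constraint can act as a generator")
    equalities = [c for c in candidates if c[0] == "="]
    generator = equalities[0] if equalities else candidates[0]
    index = constraints.index(generator)
    tests = constraints[:index] + constraints[index + 1:]
    return generator, tests


def rejected_before_success(candidates, test):
    """Number of candidates rejected before the first that passes `test`."""
    for rejected, value in enumerate(candidates):
        if test(value):
            return rejected
    return len(candidates)   # every candidate failed


# age(p) in {17 to 65} and age(p) = age(partner(p))
age_constraints = [("in", range(17, 66)), ("=", "age(partner(p))")]
generator, tests = choose_generator(age_constraints)
print(generator[0], len(tests))    # = 1

partner_age = 65
print(rejected_before_success(range(17, 66), lambda a: a == partner_age))  # 48
print(rejected_before_success([partner_age], lambda a: 17 <= a <= 65))     # 0
```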
The code generated from the "for a new" loop of the allocate program shows how these fragments are combined in the final routine:

    % Innermost routine implementing the nested "for a new" loop
    allocate1([]).
    allocate1([Student | RemainingStudents]) :-
        % Select a value for the "teacher" attribute
        getentity(music_teacher, Teacher),
        getfnval(instrument, [Student], Instrument),
        getfnval(instruments, [Teacher], Instrument),
        % Select a value for the "day" attribute
        getfnval(freedays, [Teacher], TDay),
        % We do not need to select a value for the "pupil" attribute, as
        % this will be given by the input parameter list (i.e. Student).
        % Build the key of the new music_lesson instance.
        derive_key(music_teacher, Teacher, TeachersKey),
        append(TeachersKey, [TDay], Key),
        % And use a backtrackable update to create it.
        newentity(music_lesson, Key, Lesson),
        % Define the pupil attribute (also using a backtrackable update)
        addfnval(pupil, [Lesson], Student),
        % Now check the constraint that the pupil of the lesson must be
        % free on the chosen day.
        getfnval(day, [Lesson], Day),
        getfnval(pupil, [Lesson], Pupil),
        getfnval(freedays, [Pupil], Day),
        % Remember the bindings of the loop variable values (the
        % disjunction ensures that invalid bindings are
        % "forgotten" when we backtrack into this routine).
        (   assert_loop_variable_values([Student, Lesson])
        ;   retract(loop_variable_values([Student, Lesson])),
            fail
        ),
        % Finally, continue allocating lessons for the remaining students.
        allocate1(RemainingStudents).

This, of course, is only part of the code generated from the allocate program. The full version, which also illustrates the compilation of a "for each" loop under the non-deterministic semantics, is given in Appendix D.

Since the failure-driven strategy is somewhat more efficient than the approach described above, we continue to generate failure-driven programs where the searching required is deterministic. We resort to the more complicated recursive approach only when "for a new" loops are involved.
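The recursive control strategy of allocate1/1 can be mimicked in Python with a recursive generator, which makes the deep backtracking explicit. This is a simplified model under stated assumptions, not the generated code:

- students and teachers are plain tuples and dictionaries invented for illustration;
- a lesson is keyed on its (teacher, day) pair, as with derive_key/3 above;
- a Python list stands in for the backtrackable module, with `append` standing in for the backtrackable newentity/3 and `pop` for the undo performed on backtracking.

Asking the generator for another solution corresponds to backtracking into the routine.

```python
def allocate(students, teachers, lessons=None):
    """Yield complete allocations of one lesson per student.

    students: list of (name, instrument, free_days)
    teachers: dict name -> (instruments, free_days)
    lessons:  trial (teacher, day, pupil) instances created so far
    """
    if lessons is None:
        lessons = []
    if not students:                        # allocate1([]): all allocated
        yield list(lessons)
        return
    (pupil, instrument, pupil_days), rest = students[0], students[1:]
    for teacher, (instruments, teacher_days) in teachers.items():
        if instrument not in instruments:   # generator for the teacher
            continue
        for day in sorted(teacher_days):    # generator for the day
            if any((t, d) == (teacher, day) for t, d, _ in lessons):
                continue                    # key (teacher, day) already used
            lessons.append((teacher, day, pupil))   # backtrackable creation
            if day in pupil_days:           # test: pupil free on that day
                yield from allocate(rest, teachers, lessons)
            lessons.pop()                   # undo the creation on backtracking


teachers = {"smith": ({"piano"}, {"mon", "tue"})}
students = [("ann", "piano", {"mon", "tue"}), ("bob", "piano", {"mon"})]
print(next(allocate(students, teachers)))
# [('smith', 'tue', 'ann'), ('smith', 'mon', 'bob')]
```

In this run, Ann is first given Monday. Bob then cannot be placed, because Smith's Monday slot is taken and Bob is not free on Tuesday. The search therefore backtracks past Bob's loop into Ann's, undoes the Monday lesson, and retries with Tuesday. This is the kind of deep backtracking that the failure-driven translation cannot express.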

5.4 Use of Integrity Constraints to Prune the Search Space

The backtracking strategy used to solve the constraint problems described by the extended Daplex language is an inherently inefficient form of search for large and complicated search spaces. In our case, this problem is compounded by the fact that we cannot apply the conventional optimisation techniques implemented by the current Daplex optimiser to non-deterministic Daplex programs. The current optimiser's principal technique is to reorder the loops in a Daplex program so that data retrieval becomes more efficient. As such, it relies upon the referential transparency of data retrieval within Daplex programs. The introduction of the "for a new" loop breaks this assumption, and thus severely limits the possibilities for loop reordering during optimisation.

Another possibility that we are exploring for improving efficiency is the use of semantic domain information, in the form of integrity constraints, to add extra constraints to the problem description and thus reduce the search space. Without this kind of optimisation, we would not discover that a partial solution violates one or more of the currently active integrity constraints until we had tried to create the offending instance or populate one of its attributes. This is wasteful of effort. It is, in fact, an example of one of the particular problems of backtracking search, namely the late discovery of failure [53]. Clearly, problem solving could be made more efficient by promoting these integrity constraints into the body of the program, so that they can be used to avoid generating semantically invalid solutions altogether. We can illustrate this using the music lesson allocation program given earlier. Suppose that, by law, no music lessons may be given on Wednesdays. This information would be modelled by adding the following integrity constraint to the database:

    extend module musicdb
    constrain no ml in music_lesson
        to have day(ml) = "wednesday";

which can then be incorporated into the allocate program to prevent any attempted allocations of music lessons on Wednesdays:

    program allocate is
    for each s in student such that no l in music_lesson has pupil(l) = s
    for a new ml in music_lesson such that
        teacher(ml) = any(t in music_teacher
                          such that instrument(s) in instruments(t)) and
        pupil(ml) = s and
        day(ml) in freedays(teacher(ml)) and
        day(ml) in freedays(pupil(ml)) and
        day(ml) <> "wednesday"
    print("Student", name(s), "allocated to", name(teacher(ml)), day(ml));

Once promoted in this way, the constraint can be considered as a potential generator, along with the other constraints placed on the attribute's value. We can make a further saving in execution time by disabling the promoted constraint for the duration of the search process, since we know that the transformed program cannot generate trial solutions that violate it. In fact, the constraint can remain disabled during the commit phase too. We must, however, be sure that it is re-enabled before the actions of the program are executed (or before the program terminates, if no solution can be found).

We have implemented an extension to the Daplex compiler which performs this kind of optimisation. Like the Daplex query optimiser, this semantic optimiser works by examining and transforming the intermediate code representation of the input program. As we saw in Chapter 3, integrity constraints are represented internally in this same ICode format. Incorporating constraints into programs is therefore basically a matter of deciding which constraints to incorporate and where within the original program they should be placed. In this implementation, we promote only simple domain constraints on the attributes of the loop class that are directly involved in the loop predicate. We define a simple domain constraint to be a universally quantified constraint involving a single-valued comparison of some attribute value with either a literal or a value defined in terms of that attribute.

Here, then, is the algorithm for extracting the set of simple domain constraints that are relevant to an attribute function f involved in a non-deterministic update to class c:

(i) Find the set of all constraints that might be affected by an addfnval/newentity-type update to function f. The constraint indexes that are used to extract the constraints for run-time checking can generate this set quickly.
(ii) Extract all the constraints from this set that take the form of a simple domain constraint on f. To do this, we first discard all numerically quantified constraints and then convert those that remain into constraint graphs (as described in Section 3.3). Within this set, the simple domain constraints are those which have one of their subgraphs matching the graph representing the constrained attribute:

        [class(c, _), function(f), class(_, _)]

    This test rules out non-simple domain constraints, such as this alternative (although not completely equivalent) statement of the music lesson constraint given above:

        constrain no t in teacher
            to have day(teacher_inv(t)) = "wednesday"

    Although constraints of this form are potentially useful for optimisation purposes, the optimiser is not yet able to make use of them, and they must therefore be filtered out.

At the end of this process, we have a set of constraints, represented as constraint graphs, which are suitable for promotion into the body of the program. Before this can happen, however, we must make sure that they are all represented as constraints quantified by for all. It is possible to promote not exists constraints in their original form, e.g.

    ...
    for a new ml in music_lesson such that
        teacher(ml) = any(t in music_teacher
                          such that instrument(s) in instruments(t)) and
        pupil(ml) = s and
        day(ml) in freedays(teacher(ml)) and
        day(ml) in freedays(pupil(ml)) and
        no ml2 in music_lesson has day(ml2) = "wednesday"
    ...

but this results in an even less efficient translation than the unoptimised version, since the constraint must be checked for all music_lesson instances at each stage in the search process. With for all constraints, on the other hand, the properties of that quantifier allow us to restrict our attention safely to the single music_lesson instance in hand, just as we did when simplifying constraints for run-time checking. To convert a constraint from its not exists form to its equivalent for all form (and vice versa), it is only necessary to negate the comparison operator which links the left- and right-hand side constraint graphs.

The exact positioning of a promoted constraint within the loop predicate depends on whether it is suitable for use as a generator and, if so, whether it would make a more restrictive generator than the one selected from the constraints given in the program. A constraint is a potential generator if its comparison operator is "=". (Recall that we do not yet support set-valued comparisons in constraint predicates; if we did, then constraints involving the set-membership operator in would also be suitable candidates for generators.) Constraints involving any of the other single-valued comparison operators are suitable only for use as tests.

The set of constraint graphs, then, is divided into those which are suitable for use as generators and those which are not. If the current generator takes the form of a set-valued comparison, then the promoted constraint generator will be more restrictive and should replace it.
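The rules for handling promoted constraints can be summarised in a short Python sketch. It is a hypothetical model, not the semantic optimiser itself:

- a constraint is represented as a (quantifier, operator, lhs, rhs) tuple rather than as a constraint graph, with None as the quantifier of an unquantified program constraint;
- `NEGATE` lists the negation of each single-valued Daplex comparison operator;
- `to_for_all` converts a not exists constraint into its for all form by negating the operator;
- `place_promoted` chooses between the current generator and the promoted constraints. A promoted equality replaces a set-membership generator, which is relegated to a test; a single-valued generator is kept.

```python
# Negation of each single-valued comparison operator, in Daplex spelling.
NEGATE = {"=": "<>", "<>": "=", "<": ">=", ">=": "<", ">": "<=", "<=": ">"}


def to_for_all(constraint):
    """Rewrite ('not exists', op, lhs, rhs) as ('for all', negated op, lhs, rhs)."""
    quantifier, op, lhs, rhs = constraint
    if quantifier == "for all":
        return constraint
    if quantifier != "not exists":
        raise ValueError(f"unsupported quantifier: {quantifier}")
    return ("for all", NEGATE[op], lhs, rhs)


def place_promoted(current_generator, promoted):
    """Return (generator, tests) after adding promoted for-all constraints.

    Only '=' constraints can serve as generators. A set-valued ('in')
    current generator is replaced by a promoted equality and becomes a test;
    a single-valued current generator is kept and all promoted constraints
    become tests.
    """
    generators = [c for c in promoted if c[1] == "="]
    tests = [c for c in promoted if c[1] != "="]
    if generators and current_generator[1] == "in":
        return generators[0], [current_generator] + generators[1:] + tests
    return current_generator, generators + tests


wednesday_rule = to_for_all(("not exists", "=", "day(ml)", "wednesday"))
print(wednesday_rule)   # ('for all', '<>', 'day(ml)', 'wednesday')
day_generator = (None, "in", "day(ml)", "freedays(teacher(ml))")
# The Wednesday rule is not an equality, so the generator is kept
# and the rule becomes a test.
print(place_promoted(day_generator, [wednesday_rule]))
```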
The original generator is then relegated to the status of a constraint. If the current generator is a single-valued comparison, on the other hand, we retain it as the generator, and the potential generators are treated as tests.

Unlike the non-generating constraints given in the problem specification, which are all tested after the creation of the trial instance, the constraints taken from the integrity constraints are tested immediately after the relevant attribute's value has been generated. This is important if this type of program transformation is to yield any significant increase in efficiency, since we wish to use the constraint information to avoid creating obviously erroneous trial instances, rather than to test their validity afterwards. The existing constraints are checked after the partial solution has been generated because it is difficult to decide whether they refer to the complete database or just to the trial instance. We know, however, that our promoted integrity constraints refer only to an attribute of the trial instance and any data that is immediately accessible from it, and they can therefore safely be checked before the update.

It remains only to describe how the integrity constraint graphs are converted into a form suitable for inclusion within a Daplex program. We will illustrate the process using the allocate program and the integrity constraint on music lesson days given above. Before optimisation, the ICode representation of the generator for the day attribute is:

    generate(freedays, [person], [var(evar3)], string, var(evar6))

where var(evar3) represents the variable storing the value generated for the teacher attribute, and var(evar6) is the variable storing the value generated for the day attribute by this code. The constraint graph representation of the integrity constraint is:

    cgraph([class(music_lesson, var(evar1)), function(day),
            class(string, var(evar2))],
           expression(<>, var(evar2), wednesday),
           [class(literal, wednesday)])

However, this expresses the constraint check for some music_lesson instance (represented by the variable var(evar1)), whereas we require a graph that expresses the check against the value of the day attribute in this particular case (represented by the variable var(evar6)). We can build such a graph by identifying the variable which represents the day attribute in the constraint graph (var(evar2)) and replacing it with var(evar6):

    [class(string, var(evar6))],
    [class(literal, wednesday)],
    expression(<>, var(evar6), wednesday)

Translated back into ICode, this becomes simply:

    [expression(<>, var(evar6), wednesday)]

We must then combine this ICode with the current generator. Since the integrity constraint is not suitable for use as a generator, its ICode is placed at the end of the current generator's code. The result is the following Prolog fragment, which is incorporated into the final program:

    ...
    getfnval(freedays, [T], D),
    D \== wednesday,
    ...

This clearly has the desired effect. Had the integrity constraint been suitable for use as a generator, and more restrictive than the current generator, it would have been placed at the front of the current generator's code block.

Exactly how effective this optimisation technique is depends on how tightly constrained the problem description is to begin with, and on how well the integrity constraints are able to reduce the search space. The results can nevertheless be significant, even on quite small examples. For example, the allocate program was run against a database containing three music teachers and twelve students requiring lessons (details are given in Appendix E). It created a total of 1336 trial objects, 1324 of which were subsequently found to be part of an invalid solution. The optimised version, however, generated only 742 trial objects (of which 730 were removed on backtracking).

5.5 Related Work

Floyd's early work on adding non-deterministic constructs to ALGOL triggered the development of similar extensions to other languages, e.g. Fortran (ND-Fortran [25]). The idea, however, was most attractive to the developers of functional languages, and in particular Lisp, since a clearly defined choice operator can greatly simplify the description of search problems within a functional context. Functional programming languages are not, in general, suited to the expression of backtracking search problems, and usually have to fall back on the less efficient generate-and-test strategy. Lazy evaluation can, however, reduce the overhead of this approach when only a subset of the possible solutions is required.

There have been several proposals for non-deterministic dialects of Lisp [20, 77, 110, 101], most of which implement non-determinism with a backtracking search strategy, as we have done.
Backtracking [48] is a very general search technique that can be applied to a wide range of search problems. It is also potentially very inefficient, however, and exhibits the behaviour known as "thrashing". Van Hentenryck lists the failings of backtracking as:

(i) continual rediscovery of the same facts: the fact that some values satisfy or do not satisfy a particular constraint is rediscovered many times during the search;
(ii) late detection of failures and useless generation: the failures are detected late in the search and after some useless generations, thus increasing the amount of backtracking;
(iii) bad backtracking point: the procedure backtracks to the first choice, which possibly has nothing to do with the current failure. ([53], p. 21)

It was not long, therefore, before the developers of non-deterministic languages realised that the usefulness of those languages for solving real problems was limited by the inherent inefficiencies of the underlying search technique. The obvious answer was to replace the backtracking search with something more efficient. The POPS system [47], for example, used a heuristic search technique similar to that of the GPS system [82] to identify a path of execution through a non-deterministic program that represents a successful execution of that program. More recently, Schemer, a non-deterministic dialect of Lisp [110], was used as a front-end to a dependency-directed search engine.

Perhaps the most comprehensive attempt at providing intelligent support for solving a range of search problems, and certainly one of the earliest, is the REF-ARF system [41]. The REF language is a typical late-1960s imperative programming language, reminiscent of ALGOL, except for two special language constructs called select and condition which support the description of non-deterministic algorithms. The select function takes two integer arguments and non-deterministically returns an integer from within the range that they describe. The condition construct introduces a boolean-valued expression which describes a constraint on an arbitrary set of program variables².
Problems expressed in the REF language are interpreted by the ARF problem solver, which attempts to solve them using a combination of heuristic search and constraint manipulation techniques that are relatively sophisticated even by today's standards. These include domain filtering (i.e. Waltz filtering [52]), constraint propagation and early detection of failure due to conflicting constraints.

² Notice that these two constructs perform the same respective roles as the generators and test constraints in non-deterministic Daplex. The procedurality of the REF language, however, demands that the programmer make the distinction between the two kinds of constraint explicitly, in advance.

The results are impressive. For example, under REF-ARF, the well-known cryptarithmetic problem SEND+MORE=MONEY reduces to a search space of four solutions after constraint manipulation, and is solved in only two backtracking steps [41]. This compares very favourably with the recent CHIP system, which requires one backtracking step after constraint manipulation [53].

All these languages extend some deterministic language with new constructs for describing the non-deterministic assignment of values to variables, and for describing the constraints on these variables that define a solution. By contrast, our non-deterministic version of Daplex reuses the existing features of the language, but exploits the declarative specification of the problem to decide on their exact interpretation (i.e. as a non-deterministic or deterministic generator, or as a constraint) according to the context in which they appear. Compare, for example, this solution to the N-Queens problem expressed in Screamer (taken from [101]):

    (defun an-integer-between (low high)
      (if (> low high) (fail))
      (either low (an-integer-between (1+ low) high)))

    (defun attacks? (qi qj distance)
      (or (= qi qj) (= (abs (- qi qj)) distance)))

    (defun check-queens (queen queens &optional (distance 1))
      (unless (null queens)
        (if (attacks? queen (first queens) distance) (fail))
        (check-queens queen (rest queens) (1+ distance))))

    (defun n-queens (n &optional queens)
      (if (= (length queens) n)
          queens
          (let ((queen (an-integer-between 1 n)))
            (check-queens queen queens)
            (n-queens n (cons queen queens)))))

with the Daplex solution:

    declare queen ->> entity
    declare row(queen) -> integer
    declare column(queen) -> integer
    key of queen is row, column;

    define safe(q in queen) -> string
      if some q1 in queen has
           (q1 <> q and
             (column(q1) = column(q) or
              row(q1) = row(q) or
              abs(row(q1) - row(q)) = abs(column(q1) - column(q))))
      then "no"
      else "yes";

    define nqueens(n in integer)
      for each i in 1 to n
        for a new q in queen such that
            column(q) = i and
            row(q) in 1 to n and
            safe(q) = "yes"
          print('Queen placed at', column(q), row(q));

The fact that Daplex is built on top of a language which already allows non-deterministic problem expression means that problems can be expressed at an even higher level of abstraction, and the user can be even further removed from the need to specify control information. This also gives the DBMS much greater freedom in selecting the search strategy to use in simulating the non-determinism, a feature we intend to exploit in future implementations.

The non-deterministic extension to Daplex presented here, however, is concerned with the specification of database updates rather than with providing a general-purpose constraint programming language. The potential use of non-determinism in aiding the specification of database updates was first recognised in the area of logic programming, in which, interestingly enough, updates rather than non-determinism presented the anomaly. One of the first proposals for providing a declarative semantics for update in a logic programming framework (DLP [75]) included the possibility of a non-deterministic approach to updates, by specifying post-conditions on the creation or deletion of facts. For example, the authors give the following program, which non-deterministically enrols an employee on one of two courses so that neither of the courses is over-subscribed:

    <enroll(EName)> ← <+sec1(EName)>(size(sec1, N) & N < 30)
    <enroll(EName)> ← <+sec2(EName)>(size(sec2, N) & N < 20)

Here, atoms in angled brackets (e.g. <enroll(EName)>) indicate dynamic predicates (i.e. predicates which are defined in terms of updates) and the operator ← defines an update rule. The body of an update rule consists of a dynamic predicate of the form <E>(Q), where E specifies the update (or invokes another update predicate) and Q is a query that must evaluate to true in the database after the execution of E. The authors suggest a Prolog-type backtracking strategy to implement this kind of update.

A similar proposal by Chen [23] allows non-determinism to be specified using disjunctions and existential quantification. For example, in Chen's system the enrolment program stated above would be specified as:

    ((+sec1(EName) ∨ +sec2(EName))
      ∧ +size(sec1, N1) ∧ N1 < 30
      ∧ +size(sec2, N2) ∧ N2 < 20)

In this language, a plus sign preceding an atomic formula indicates that it must evaluate to true in the database state resulting from the update, and a preceding minus sign indicates that it must evaluate to false in the new database. An atomic formula with no preceding symbol must evaluate to true in the current (i.e. non-updated) database state if the update is to be carried out.

Another approach to combining non-determinism and updates in a rule-based language is illustrated by the Logres DBPL [19]. Logres is unusual in several respects. Firstly, it is based on an object-oriented data model rather than the simple relational model more common in rule-based databases. In fact, Logres uses the notion of object identifiers to express both the creation and deletion of objects and non-deterministic choice from a set of instances. Secondly, Logres rules are executed in parallel using an inflationary fixpoint semantics [18]. It is this semantics, together with the non-deterministic choice, that allows the specification of complex updates in Logres.
For example, the following program (taken from [19], p. 337) solves the well-known four-colour map colouring problem:

    assign(zone: Z, color: C) ← zone(Z).
    ¬assign(zone: Z, color: C) ← adjacent(from: Z, to: Z2),
                                 assign(zone: Z2, color: C).

Here, zone and color are object classes, assign is a relationship between a zone and a particular color, and adjacent is a relationship between two zones. The variables Z, C and Z2 represent object identifiers. In Logres, a positive rule whose head is the name of a class (or relationship) represents the creation of one instance of that class (or relationship) for each of the variable substitutions that satisfy the rule body. Similarly, a negative rule represents the deletion of all the instances identified by the rule body. Non-deterministic choice is specified by leaving variables representing attribute values in the head of the rule unbound, as in the case of the variable C in the first rule given above. This rule, then, describes the creation of one assign relationship for each of the zones on the map to some non-deterministically selected color. The second rule describes the deletion of all such relationships which violate the constraint that no two adjacent zones have the same colour assignment. In this rule, C is bound within the body (by the assign relationship) and therefore is not involved in any non-deterministic choice. When an instance of assign is deleted in this way, the relationship described by the first rule (i.e. that each zone should have a color) is violated, and it fires again to create another assignment for the deleted zone with a different color.

A more common form of non-determinism, particularly in deductive database systems, is that based on the choice construct of LDL [81].
This construct, which was originally provided as a declarative version of the Prolog cut, allows the user to specify that multiple solutions to a particular goal are redundant and need not be considered. For example, the following LDL program decides whether it is possible to partition the given set S so that the sum of the elements in each of the partitions is the same (based on the example given in [81], p. 183):

    equiPartition(S) ← partition(S1, S2, S),
                       sum(S1, N),
                       sum(S2, N).

To determine the truth of equiPartition(S) for any set S, it is only necessary to locate one partition that meets these criteria. All other partitions are redundant, and the time spent generating them is time wasted. This problem (i.e. the generation of redundant solutions) is particularly acute for systems like LDL which employ a bottom-up semantics for the evaluation of programs. These systems typically generate all the solutions as the default behaviour, whereas systems which use a top-down evaluation strategy, such as Prolog, will typically generate only a single solution unless the user specifically requests further solutions.

The non-deterministic choice construct of LDL has the form:

    choice((X̄), (Ȳ))

where X̄ and Ȳ are vectors of variables. Within a program, the choice construct non-deterministically selects a set of substitutions for the variables in X̄ and Ȳ so that the functional dependency denoted by X̄ → Ȳ is satisfied. For example, the following program (taken from [81], p. 184) builds a set containing exactly one employee from each department:

    selectEmp(Name) ← emp(Name, Dept),
                      choice((Dept), (Name)).

It is possible for X̄ to be empty, in which case the choice construct is equivalent to the existential quantification of each of the variables in Ȳ. Our set-partitioning example illustrates this:

    equiPartition(S) ← partition(S1, S2, S),
                       sum(S1, N),
                       sum(S2, N),
                       choice((), (S1)).

The current implementation of LDL has the ability to backtrack to choices and to make another selection if it is found that the original selection of variable substitutions does not meet the constraints on them (i.e. if some other predicate evaluates to an empty set of solutions). However, other developers [46, 95] have chosen to restrict the semantics of the choice operator so that systems are committed to the first selection that is made, on the grounds that this allows a much more efficient implementation³.
While this more limited form of choice is still able to fill the role of the Prolog cut in these systems, it does dramatically reduce its usefulness for the description of the kinds of constrained search problems we are concerned with in P/FDM.

³ Even the implementors of LDL are apparently planning to remove this feature from future implementations, for the same reasons [95].
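A small Python sketch of the selectEmp example makes the difference between the two semantics concrete. It assumes that a relation is simply a list of tuples. It also reads choice((Dept), (Name)) declaratively: each admissible model keeps one Name per Dept, so that the dependency Dept → Name holds. The function names are hypothetical, and the code enumerates models naively rather than imitating LDL's bottom-up evaluation. Backtracking over choices corresponds to trying successive models. Committed choice takes the first model and never reconsiders it.

```python
from itertools import product
from typing import Callable, Dict, Iterator, Optional, Sequence

Model = Dict[object, object]


def choice_models(tuples: Sequence[tuple], key: int, value: int) -> Iterator[Model]:
    """Yield every way of keeping exactly one `value` per distinct `key`.

    Each model maps a key to its chosen value, so the functional dependency
    key -> value holds in every model, as with choice((key), (value)).
    """
    groups: Dict[object, list] = {}
    for t in tuples:
        groups.setdefault(t[key], [])
        if t[value] not in groups[t[key]]:
            groups[t[key]].append(t[value])
    keys = list(groups)
    for picks in product(*(groups[k] for k in keys)):
        yield dict(zip(keys, picks))


def backtracking_choice(tuples, key, value, ok: Callable[[Model], bool]) -> Optional[Model]:
    """Revisit choices until some model passes the remaining condition `ok`."""
    return next((m for m in choice_models(tuples, key, value) if ok(m)), None)


def committed_choice(tuples, key, value, ok: Callable[[Model], bool]) -> Optional[Model]:
    """Commit to the first model; fail rather than reconsider it if `ok` rejects it."""
    first = next(choice_models(tuples, key, value), None)
    return first if first is not None and ok(first) else None
```

Suppose emp = [("ann", "sales"), ("bob", "sales"), ("cy", "it")] and a further condition requires bob to be chosen. Then backtracking_choice finds the second model, whereas committed_choice fails because its first selection was ann.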

5.6 Non-Deterministic Updates in the Protein Database

The main application of the protein structure database is a system which constructs models of proteins of unknown three-dimensional structure, using a technique called homology modelling [64]. This technique is based on the observation that proteins with similar sequences (i.e. with a high sequence homology) tend to adopt similar conformations, with the central core in particular being strongly conserved among such proteins. In fact, it is only necessary to have 30% sequence identity between the protein to be modelled and some protein of known three-dimensional structure (the template protein) in order to be able to build a plausible model of the former based on the structure of the latter.

The process of homology modelling, then, consists of identifying the regions of the template protein which are likely to be conserved in both proteins, and using these as a template for building the core of the protein model. The remaining regions (i.e. those which are not conserved between the proteins) are modelled by searching the protein structure database for loops in other proteins that will fit correctly into the model, and which can be used as structural templates for these regions. The major steps in this process are:

(i) Align the sequences of the two proteins, introducing gaps into either sequence where this improves the alignment. In the current version of the homology modelling application, this process is carried out manually by the user.
(ii) Identify the regions which are likely to be structurally conserved in both proteins. These are regions of high sequence identity which do not contain any gaps, and any region which adopts the conformation of a helix or a strand in the protein of known structure.
It is also expected that disulphide bridges will be conserved where both cys residues are present in the protein to be modelled.
(iii) For each of the unconserved regions, search the database for fragments (called candidate fragments) of the required length and whose conformation is such that the fragment will fit into the model at the appropriate point.

(iv) For each of the unconserved regions, select the "best" of the candidate fragments located in Step (iii), and use its structure as a template to generate the structure of the corresponding region of the model. The fragments should be selected so that there is no steric overlap between any parts of the modelled chain.
(v) At the end of the previous step, a complete model of the protein backbone should have been created. The final step, therefore, is to add the appropriate side chain to each residue in this model. Again, the conformation of each side chain should be chosen so that there is no steric overlap in any part of the model.

Of these five steps, all but Step (iii) (which is a completely deterministic search process) are candidates for expression as non-deterministic Daplex updates. Steps (iv) and (v) can both be expressed using the version of Daplex described in this chapter, as we shall demonstrate. Steps (i) and (ii) are not currently expressible in Daplex, since they both require the ability to state that a particular function of the solution should be maximised or minimised, which the language does not yet allow.

The main constraints on the update implementing Step (iv) of the homology modelling process are that one candidate fragment should be selected for each of the unconserved regions in the protein being modelled, and that the resulting chain should involve no steric overlaps.
We can express this update in Daplex as:

    define select_fragments(c in model_chain) in pdb
      for each br in backbone_ranges(c) such that type(br) = "unconserved"
        for a new sf in selected_fragment such that
            for range(sf) = br and
            fragment(sf) = any(candidate_fragment(br)) and
            conformation(sf) = conformation(fragment(sf)) and
            steric_overlap(c, conformation(sf)) = "no"
          let model_status(c) = "fragments selected";

The constraints on the update required for the final step of the modelling task are very similar: for each residue in the chain being modelled, we must select one side chain conformation, from a library of rotamers (i.e. known favourable conformations), so that there is no steric overlap anywhere in the model. The Daplex expression of this update is:

    define place_side_chains(c in model_chain) in pdb
      for each r in has_residues(c)
        for a new sc in side_chain such that
            for residue(sc) = r and
            name(sc) = name(r) and
            conformation(sc) in possible_conformations(name(r)) and
            steric_overlap(c, conformation(sc)) = "no"
          let model_status(c) = "side chains placed";

In the current implementation of the homology modelling application, both of these complex updates had to be expressed as Prolog programs by a Prolog programmer. The non-deterministic extension to Daplex, however, allows more of the application code to be expressed in the higher-level, more concise language, thus making the specification of complex updates more accessible to the casual user.
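A short Python sketch of chronological backtracking, the strategy used by the current implementation, pictures the search that place_side_chains describes. It is not the Prolog that P/FDM actually generates. The residue identifiers, the rotamer library and the pairwise clash predicate are hypothetical stand-ins for has_residues, possible_conformations and steric_overlap. The sketch also checks clashes only between side chains, whereas steric_overlap in the Daplex version tests a conformation against the whole model chain.

```python
from typing import Callable, Dict, Hashable, List, Optional, Sequence

Clash = Callable[[Hashable, str, Hashable, str], bool]


def place_side_chains(residues: Sequence[Hashable],
                      rotamers: Dict[Hashable, List[str]],
                      clashes: Clash) -> Optional[Dict[Hashable, str]]:
    """Give each residue one rotamer so that no two placed side chains clash.

    Chronological backtracking: residues are placed in order, and a residue
    with no compatible rotamer sends the search back to the previous one.
    Returns residue -> rotamer, or None if no complete placement exists.
    """
    placed: Dict[Hashable, str] = {}

    def extend(i: int) -> bool:
        if i == len(residues):
            return True
        r = residues[i]
        for conf in rotamers[r]:
            # Test constraint: the new side chain must not overlap any placed one.
            if any(clashes(r, conf, r2, c2) for r2, c2 in placed.items()):
                continue
            placed[r] = conf          # tentative creation
            if extend(i + 1):
                return True
            del placed[r]             # undo it and try the next rotamer
        return False

    return dict(placed) if extend(0) else None
```

Residues are placed in a fixed order. A clash that is discovered late may therefore force the search to undo many earlier placements, which is the weakness of backtracking search discussed earlier in this chapter.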

Chapter 6
Conclusions and Future Directions

In order to provide more intelligent support for updates than is traditionally provided by database systems, it is necessary for the DBMS to be able to make more intelligent use of the semantic domain information that is available. Users need to be sure that their updates will not violate the structural or semantic integrity of their data, even when making complex composite updates that require the creation of invalid intermediate states. Users also need to be able to "try out" particular updates, in the knowledge that the DBMS will handle the book-keeping required to return to the initial state if this becomes necessary at any stage in the update. And, finally, users also need help in expressing their updates at a higher level of abstraction and with less of the ordering and control information typically required by traditional data manipulation languages.

In this thesis, we have described three extensions to the P/FDM database system that provide support for the user in these areas. The focus in each case is on the use of structural and semantic domain information to allow the DBMS to perform the necessary searching and book-keeping tasks that would be difficult or tedious for the user to undertake.

Integrity Constraints: The Daplex data definition language has been extended to allow the declarative description of semantic integrity constraints (Section 3.3). The constraint language reuses many of the existing Daplex constructs so that it should be easy to use for anyone already familiar with Daplex. The definition of integrity constraints by the user also provides the DBMS with a useful source of declarative knowledge about the semantics of individual application domains.

In order that the run-time integrity checking should be as efficient as possible, the DBMS compiles each declarative constraint into a set of Prolog fragments, each of which checks the constraint for a particular update (Section 3.3.3). These fragments are stored in the metadata, along with explicit, two-way links to the declarative constraint description from which they were generated (Section 3.2.1). These links allow both the run-time retrieval of appropriate code fragments, for constraint checking, and the run-time creation and deletion of constraints, without requiring any recompilation of code. The latter ability is particularly important in database applications involving complex domains, for which it is difficult to express constraints correctly at the outset, or in which exceptions to constraints are regularly entering the database (Section 3.4).

Constraints and Transactions: A simple transaction mechanism has been implemented, which allows the temporary suspension of integrity constraint checking during composite updates (Section 4.1).
Within the scope of a transaction, the user is free to experiment with updates, while the DBMS uses the structural constraints of the data model to handle the necessary book-keeping for returning the database to its pre-transaction state, should the user require it (Section 4.2.1). At transaction commit-time, the DBMS takes responsibility for ensuring that the new database state satisfies all the integrity constraints, before removing the book-keeping information and causing the updates made during the transaction to become "permanent" (Section 4.3).

Thanks to the simplicity of the transaction abort process within our architecture, the transaction mechanism can also be used to support hypothetical transactions, in which the user experiments with a series of updates in order to answer what-if-type questions.

Non-Deterministic Updates: The third and final extension to P/FDM adds a new loop construct to the Daplex data manipulation language that allows users to describe the creation of sets of objects in terms of the constraints that the resulting database state must satisfy, rather than as an explicit sequence of update operations (Section 5.1). The DBMS uses these constraints, and any relevant integrity constraints, to generate a Prolog program that searches for a sequence of updates that will create a set of objects with the required properties (Section 5.3). Thus, the user is able to describe their updates declaratively, at a high level of abstraction, and can leave the DBMS to "infer" the necessary control information, and the low-level details of the update, using the constraints that have been specified in the problem description.

These three extensions combine to help bridge the gap between the declarative aspects of P/FDM's functional data model and the procedural view of updates that the practicalities of managing large amounts of persistent data demand. Rather than attempt to abolish this gap between retrieval and update entirely, our basic philosophy has been to provide the user with the means to describe to the DBMS the situations in which a declarative view of updates is appropriate: either for restricting the set of allowed updates to those that will result in the creation of a legal database state, or for generating a set of individual updates that will meet the user's more abstractly specified requirements.

6.1 Useful Architectural Features for Constraint-Based Updates

The following architectural features of P/FDM have proved particularly useful in implementing the three extensions described in this thesis.

Separation of Conceptual and Storage Schemas: The clear division between the conceptual level and the storage level which the internal primitives provide not only enables several storage schemas to be in use simultaneously but also allows us to define new module types in terms of existing internal types.
The result can be a considerable saving in the code required to implement a new database feature, and also increased maintainability of the resulting system, as we saw in the implementations of both the transaction module type (Section 4.2) and the backtrackable module type (Section 5.2).

Well-Defined Interface to the DBMS: The advantage of a "narrow" and well-defined interface to the DBMS (as provided by the primitives, above which higher-level interfaces are constructed) is that new features implemented at the Prolog level automatically become available for use by any program using these higher-level interfaces. Thus, for example, any Daplex program executed within a transaction will automatically invoke the correct transaction behaviour, even though the Daplex parser is not itself aware of the existence of the new module type (Section 4.1).

Flexible Access to Metadata: The flexibility of access to metadata, both in terms of data storage and retrieval, has also been an important factor in the implementation of all three of the extensions described in this thesis. In particular, the ability to store code fragments in the metadata is very useful when compiling declarative constraint specifications (Section 3.3), and flexible access to constraint metadata is also important for those parts of the DBMS which want to make use of the semantic constraint information, such as the semantic optimiser for non-deterministic Daplex programs (Section 5.4).

Flexible Metaprogramming Facilities: One important advantage of using Prolog as the implementation language for P/FDM is its ability to manipulate and execute program code at run-time, without requiring extensive re-compilation of the system code. This feature means that new module types, like the transaction and backtrackable module types described in this thesis, can be added dynamically to the DBMS simply by consulting in the definitions of the internal primitives which implement them.
It also allows us to defer the decision of which constraint-checking fragments must be executed for a particular update until run-time, thus giving us the flexibility to add or delete constraints at any stage in the lifetime of a particular database (Section 3.2).

Finally, the ease with which Prolog code (and other complex structures such as ICode lists) can be generated and manipulated has greatly eased the generation of the simplified procedural representations of the semantic constraints (Section 3.3.3), the generation of backtracking programs from Daplex (Section 5.3) and the incorporation of integrity constraint information into Daplex code (Section 5.4).
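The combination of metadata and run-time code selection described in the last two items can be sketched in Python, with closures standing in for the Prolog fragments that P/FDM stores. The class ConstraintMetadata, the (primitive, function) event keys and the dictionary database are all hypothetical. The example constraint, country(penfriend(p)) = country(language_teacher(p)), is compiled by hand into two fragments. An update to country would need further fragments, which are omitted here.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Tuple

Event = Tuple[str, str]                       # (update primitive, function name)
Check = Callable[[dict, list, object], bool]  # check(db, args, new_value) -> ok?


class ConstraintMetadata:
    """Hypothetical store linking each constraint to its event-specific checks."""

    def __init__(self) -> None:
        self.by_event: Dict[Event, Dict[str, Check]] = defaultdict(dict)
        self.events_of: Dict[str, List[Event]] = {}   # back-link: constraint -> events

    def add(self, name: str, fragments: Dict[Event, Check]) -> None:
        for event, check in fragments.items():
            self.by_event[event][name] = check
        self.events_of[name] = list(fragments)

    def drop(self, name: str) -> None:
        # The back-link tells us exactly which fragments to remove.
        for event in self.events_of.pop(name, []):
            self.by_event[event].pop(name, None)

    def violations(self, db: dict, primitive: str, function: str,
                   args: list, value: object) -> List[str]:
        """Run only the fragments registered for this update event."""
        fragments = self.by_event.get((primitive, function), {})
        return [name for name, check in fragments.items()
                if not check(db, args, value)]


# Two fragments for: country(penfriend(p)) = country(language_teacher(p)).
# An undefined partner is treated as not yet constrained (an assumption).
def check_penfriend(db: dict, args: list, friend: object) -> bool:
    teacher = db["language_teacher"].get(args[0])
    return teacher is None or db["country"].get(friend) == db["country"].get(teacher)


def check_teacher(db: dict, args: list, teacher: object) -> bool:
    friend = db["penfriend"].get(args[0])
    return friend is None or db["country"].get(friend) == db["country"].get(teacher)
```

Dropping a constraint follows its back-links to remove exactly its own fragments, so no other code needs to be recompiled. Updates for which no fragment is registered cost nothing to check.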

These architectural features have also influenced the efficiency of the resulting system. In the implementation of the constraint compiler, the narrow and well-defined interface to the DBMS allows us to identify the specific set of update events that could violate a particular constraint and thus to generate highly specific, simplified procedural versions of the constraint. The more simplified a constraint is with respect to a particular update, the fewer redundant database accesses it requires and the more efficient it is to check. The efficiency of the metadata design also helps to make the run-time retrieval of these fragments less time-consuming.

The implementation of the constraint checker for transactions does not take advantage of any of these features, and is at present rather inefficient. However, this is not due to any architectural feature of P/FDM and we believe that a more efficient implementation is possible in our system (see Section 6.2.2).

The efficiency of programs involving non-deterministic updates has a direct bearing on the usefulness of the extended Daplex language, since it determines which of the expressible problems are solvable in a reasonable time. Clearly, the more efficient the generated code, the greater the number of problems that can be solved. Our current implementation, which is based on a backtracking search strategy, is not efficient enough to allow the solution of a broad range of real problems.
However, as in the case of transaction constraint checking, this inefficiency stems from the current translation from declarative to procedural constraint manipulation rather than from the architectural framework in which it is implemented, and we believe that translation to more efficient search strategies is possible within the current system (see Section 6.2.3).

6.2 Future Directions

6.2.1 Integrity Constraints

As we saw in Chapter 3, although we can currently generate code for only a restricted class of constraints, our constraint language extension to Daplex is actually capable of expressing a much wider range of constraints. One obvious area of future work, then, would be the extension of the simplification techniques to, for example, constraints with multiple quantifiers and constraints involving disjunctions, arithmetic expressions and aggregate operators.

Such an extension would require a significantly more sophisticated approach to the generation of the set of events which may violate a given constraint than the simple, table-driven approach used in the current implementation. However, techniques for performing such an analysis have been developed for first-order logic constraint languages [105], which could be adapted to our functional-style language.

To deal with more complex constraint predicates, however, it should be possible to reuse the constraint graph approach described earlier, with some minor modifications. Constraint graph manipulation is really only a technique for rewriting chains of composed function applications so that each chain begins with a variable of known value, and is therefore equally useful for representing more complex predicates. For example, the following constraint involving an arithmetic expression:

    constrain each t in traveller
      so that amount(wine(t)) + amount(beer(t)) + amount(spirits(t))
              ≤ alcohol_limit(destination(t));

could be represented by the graphs:

      T --wine--> W --amount--> Aw
    + T --beer--> B --amount--> Ab
    + T --spirits--> S --amount--> As
    ≤ T --dest--> C --alc_limit--> L

In order to simplify this constraint for the update

    addfnval(beer, [Trav], Crate)

we can use exactly the same process described in Chapter 3 to reformulate the graphs as:

      Crate --amount--> Ab
    + Trav --wine--> W --amount--> Aw
    + Trav --spirits--> S --amount--> As
    ≤ Trav --dest--> C --alc_limit--> L

in which case each graph begins with one of the known variables, Trav and Crate, and can therefore be translated into fragments of ICode that will produce the result variables Ab, Aw, As and L.

Of course, the translation of constraint predicates is complicated by the presence of nested quantifiers, which may introduce a variable (or variables) which cannot be derived from the known variables. In these cases, we will have no option but to search the database for values for these variables, a potentially expensive process.

Another possible approach to generating more efficient constraint code fragments, which would complement the simplification techniques already in use, is the use of the existing Daplex optimiser to eliminate duplicate expressions and to find an efficient ordering for the evaluation of the constraints. Since the optimiser operates on the same format as the internal representation of constraints (i.e. ICode), this should be possible with only minor architectural changes to the system.

6.2.2 Transactions

The current transaction mechanism allows only one transaction to be active at any one time. The implementation, however, could easily be adapted to allow nested transactions, thanks to the fact that we do not rely on the names of the current transaction modules being fixed, but retrieve them from the metadata at run-time. It would be necessary to implement some kind of "stacking" mechanism for the transaction_active/3 terms, so that each primitive retrieves the metadata marker (and therefore the names of the transaction modules) of the most recently created transaction. On transaction commit, the "top" transaction_active/3 metadata term would be removed, thus allowing the next term, representing the next-level transaction, to become visible. The update primitives used to commit the updates for the nested transaction would therefore see this transaction marker, and copy the updates, not to the disk-based module, but to the transaction modules of the next active transaction.

The area of the current implementation of transactions which offers the best scope for improvement is the constraint checking process. As we saw in Chapter 4, the initialisation code, which searches the entire database for constraint violations, is reused at commit-time to search for violations caused by the committing transaction. This is inefficient because it means that even data which has not been affected by the transaction, and which could not therefore violate the constraint, is rechecked for validity at commit-time. Ideally, we require some version of the simplification methods which can focus attention on those parts of the database which have been directly affected by the transaction updates.

One possible implementation of such a technique is to reuse the constraint graphs generated during constraint compilation to build a set of templates, which are partially instantiated using the data in the transaction modules. For example, if we have the constraint:

    constrain each p in pupil
      so that country(penfriend(p)) = country(language_teacher(p));

which is represented by the constraint graph:

      P1 --penfriend--> P2 --country--> Cp
    = P1 --lang_teacher--> LT --country--> Ct

then the templates we must construct will have the form:

    [P1, P2, Cp, P1, LT, Ct]

where each variable corresponds to a node in the constraint graph. Suppose we wish to check the validity of this constraint at the end of a transaction involving the following set of updates:

    newentity(pupil, [tippett, michael], pupil(60))
    addfnval(penfriend, [pupil(60)], pupil(29))
    addfnval(language_teacher, [pupil(60)], teacher(12))
    addfnval(country, [teacher(24)], spain)

then at commit-time we must construct the following two instances of the template in order to check the constraint:

    [pupil(60), pupil(29), Country1, pupil(60), teacher(12), Country2]
    [Pupil1, Pupil2, Country3, Pupil1, teacher(24), spain]

Notice that the first template contains the details of the first three updates, which are all based on the single pupil instance, pupil(60). By creating only the minimal set of templates in this way, we can ensure that the constraint is checked only once per instance (and not three times, as would have been the case had we checked each update individually). The second template corresponds to the last update, which, as far as we know at this stage, is not directly related to pupil(60). The next stage in the process examines each template in turn, and fills in the unknown values by retrieving data from the database. When the templates are all fully instantiated, the constraint conditions can be checked. In this example, the constraint is satisfied by the transaction if

    Country1 = Country2 ∧ Country3 = spain

In the case of numerically-quantified constraints, we can simply count the number of templates which satisfy the constraint predicate, and calculate the total at the end of the transaction as:

    original count + templates from add-module − templates from delete-module

The result of the constraint checking process is a list of the templates which do not satisfy the constraint from which they were generated. Since the templates are all fully instantiated by this stage, they can be used to provide a comprehensive error report to the user on why the constraint checking process has failed, possibly in some graphical form.
In the protein database, for example, it would be possible to use a molecular graphics package to highlight the areas of a particular protein which violate some constraint.
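The template idea can be sketched in Python. This is a minimal illustration under stated assumptions, not the P/FDM implementation: the database is a hypothetical in-memory dictionary of single-valued functions that already reflects the transaction's updates, updates are plain tuples mirroring newentity/addfnval, node types are read off object identifiers such as pupil(60), and a template whose values cannot all be retrieved is simply skipped. The names DB, build_templates, instantiate and violations are our own.

```python
# Hypothetical post-transaction state: {function name: {argument: result}}.
# Not the P/FDM storage format; the data values are invented.
DB = {
    "penfriend": {"pupil(60)": "pupil(29)", "pupil(3)": "pupil(7)"},
    "language_teacher": {"pupil(60)": "teacher(12)", "pupil(3)": "teacher(24)"},
    "country": {"pupil(29)": "germany", "pupil(7)": "spain",
                "teacher(12)": "france", "teacher(24)": "spain"},
}

# Constraint graph for country(penfriend(p)) = country(language_teacher(p)).
# Each edge is (function, source node, target node); P1 is the root pupil.
EDGES = [("penfriend", "P1", "P2"), ("country", "P2", "Cp"),
         ("language_teacher", "P1", "LT"), ("country", "LT", "Ct")]
SLOT_TYPE = {"P1": "pupil", "P2": "pupil", "LT": "teacher"}
ALL_SLOTS = {"P1", "P2", "Cp", "LT", "Ct"}


def type_of(oid):
    return oid.split("(")[0]               # "pupil(60)" -> "pupil"


def build_templates(updates):
    """Map each update onto the edges it touches, then merge the partial
    bindings that share the same root pupil P1 (the minimal template set)."""
    by_root, others = {}, []
    for upd in updates:
        binding = {}
        if upd[0] == "newentity" and upd[1] == SLOT_TYPE["P1"]:
            binding["P1"] = upd[2]
        elif upd[0] == "addfnval":
            _, fn, arg, val = upd
            for f, src, dst in EDGES:
                if f == fn and SLOT_TYPE[src] == type_of(arg):
                    binding.update({src: arg, dst: val})
        if "P1" in binding:
            by_root.setdefault(binding["P1"], {}).update(binding)
        elif binding:
            others.append(binding)
    return list(by_root.values()) + others


def instantiate(binding):
    """Fill unknown slots from DB; one template may yield several instances."""
    changed = True
    while changed:
        changed = False
        for fn, src, dst in EDGES:
            table = DB[fn]
            if src in binding:
                val = table.get(binding[src])
                if val is None:
                    return []               # undefined value: skipped (assumption)
                if dst not in binding:
                    binding = {**binding, dst: val}
                    changed = True
                elif binding[dst] != val:
                    return []               # template does not match stored data
            elif dst in binding:
                # Only the target is known: enumerate sources via the inverse.
                return [inst for s, v in table.items()
                        if v == binding[dst] and type_of(s) == SLOT_TYPE[src]
                        for inst in instantiate({**binding, src: s})]
    return [binding] if set(binding) == ALL_SLOTS else []


def violations(updates):
    """Fully instantiated templates for which Cp != Ct."""
    return [t for tmpl in build_templates(updates)
            for t in instantiate(tmpl) if t["Cp"] != t["Ct"]]


UPDATES = [
    ("newentity", "pupil", "pupil(60)"),
    ("addfnval", "penfriend", "pupil(60)", "pupil(29)"),
    ("addfnval", "language_teacher", "pupil(60)", "teacher(12)"),
    ("addfnval", "country", "teacher(24)", "spain"),
]
print(violations(UPDATES))
```

With the four updates from the text, build_templates produces exactly the two templates described above. In this invented data only the first template fails the check, because pupil(29) is in Germany while teacher(12) is in France. The second template is expanded through the inverse of language_teacher into one instance per pupil taught by teacher(24).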

6.2.3 Non-Deterministic Updates

There is considerable scope for extending the syntax of "for a new" loops to enable the specification of a wider range of constraints. For example, we could allow disjunctions within the loop predicate, or we could base the update on function value additions, thus allowing the non-deterministic creation of relationships between existing instances as well as the creation of new instances.

It would also be useful to be able to specify that a particular expression value be optimised (i.e. maximised or minimised) in the chosen solution. This would allow us, for example, to describe the sequence alignment update from the protein modelling application using non-deterministic updates. However, it is not clear how this kind of optimisation problem could be implemented using the current backtracking search strategy. Another very useful extension would be the ability to specify non-deterministic deletions as well as creations, which would allow problems to be solved by manipulating the existing data as well as by creating new objects. This last feature would be useful when a more restricted version of the problem has already been solved and the complete problem is to be solved incrementally based upon the existing data, or when some problem cannot be solved without some alteration to the existing data.

Apart from extensions to the syntax of the update language, there are also several opportunities for improving the generated Prolog code. We could, for example, make use of the existing Daplex optimiser, both for making a more informed selection of generators and for making a more careful ordering of the constraint checks.
The use of integrity constraints for semantic optimisation could also be improved, with better selection of constraints for inclusion into the program and use of a wider range of constraints, possibly as heuristics to guide the ordering of the search process.

If we are to make any significant improvements in the efficiency of the Prolog generated from non-deterministic updates, however, it will be necessary to replace the backtracking search strategy with something more intelligent. Since our system operates in a logic programming environment, the obvious choice for an alternative search engine is a constraint logic programming (CLP) language such as CHIP [53]. CLP languages use special techniques to reason about constraints and to reduce the domains of constraint variables, thus cutting down on the size of the search space that has to be explored.
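To give a flavour of the domain reduction a CLP system performs, the following Python sketch uses backtracking with forward checking, a standard constraint-satisfaction technique; it is not CHIP and not generated P/FDM code. We assume, hypothetically, that each student's domain is a set of (teacher, day) lesson slots and that no slot may be used twice, mirroring the (teacher, day) key of music_lesson. After every assignment the chosen slot is pruned from the remaining domains, so a dead end is detected as soon as any domain becomes empty rather than when the search eventually reaches it.

```python
def solve(domains):
    """Give each student a distinct (teacher, day) slot, or return None.

    domains: {student: set of (teacher, day) slots} -- hypothetical input.
    """
    def search(assigned, remaining):
        if not remaining:
            return assigned
        # Fail-first heuristic: pick the student with the fewest slots left.
        student = min(remaining, key=lambda s: len(remaining[s]))
        for slot in sorted(remaining[student]):
            # Forward checking: remove the chosen slot from every other
            # unassigned student's domain.
            pruned = {s: d - {slot} for s, d in remaining.items() if s != student}
            if all(pruned.values()):          # no domain was wiped out
                result = search({**assigned, student: slot}, pruned)
                if result is not None:
                    return result
        return None                           # every slot failed: backtrack

    return search({}, {s: set(d) for s, d in domains.items()})


DOMAINS = {  # hypothetical students and lesson slots
    "amy": {("vivaldi", "mon"), ("vivaldi", "tue")},
    "ben": {("vivaldi", "mon")},
    "cat": {("mozart", "thu"), ("vivaldi", "tue")},
}
print(solve(DOMAINS))
```

Choosing the student with the smallest remaining domain first is a further design assumption; plain chronological backtracking, as in the generated Prolog, tries the students in a fixed order and only discovers a conflict when it reaches the student concerned.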

The potential usefulness of CLP techniques in the analysis of protein structure has already been demonstrated by Clark et al. [24] in the prediction of protein topologies. In another project at Aberdeen, P/FDM is being used in conjunction with CHIP in the interpretation of 2-D NMR spectra [71], and so the architectural links for using CHIP within P/FDM applications already exist.

What we would like, then, is a version of the compiler for non-deterministic updates that uses the constraints supplied by the user, and any relevant integrity constraints, to generate a CHIP program that searches for a solution to the problem. This code fragment could be passed to the CHIP system, along with any relevant sets of data, for manipulation and solution. The CHIP system would then return details of the objects that must be constructed in order to satisfy the required constraints in the database.

Such a translation would also make it much easier to handle more sophisticated constraint specifications involving optimisation, since CHIP provides direct support for solving such problems. It would also be easier to deal with deletions of data, since no backtracking over updates is required. All the searching is done within CHIP, in terms of its domain variables, and deletion problems could therefore be solved by taking a differential approach to the representation of the updates, similar to that used in our transaction mechanism.

6.2.4 A Combined System for Repairing Transactions

One possible improvement to P/FDM, which would combine all three of the elements discussed in this thesis (i.e. constraints, transactions and non-deterministic updates), is to use the constraint information to suggest possible updates for restoring the validity of any integrity constraints that are violated at the end of a transaction.
This idea was proposed by Morgenstern [78], who suggested that constraint specifications could contain "hints" as to how the data might be altered to restore integrity (for example, which relationships might be broken in response to constraint violation). Morgenstern also proposed an active-rule approach to implementing such a system, in which the action part of each constraint checking rule (which is executed when the constraint condition is false) contains update commands to restore the validity of the constraint. This proposal was implemented by Díaz [32] in the context of the ADAM OODB [86], and

more recently a more complete implementation, in which repair actions are generated solely from the constraint specifications (i.e. without the help of any "hints" from the constraint designer), was developed for the Context system [106].

We would like to implement a similar system for P/FDM, but using constraint programming techniques rather than pre-compiled active rules. From the point of view of code generation, active rules are very convenient for stating repair actions, since the compiler can concentrate on deciding what is required to recover from the violation of one particular integrity constraint by one particular update event. However, this property also makes them unsuitable for use with P/FDM, for two reasons. The first reason is that the system is restricted to one fixed repair action, selected at compile-time, per constraint per update. We feel that this is unreasonable in the context of design applications such as those for which P/FDM is intended, and we would prefer a repair mechanism which decides on the actions to take at commit-time, based upon the current context and the user's preferences. Secondly, in the ECA-rule approach, repair actions are chosen on the basis of individual update events (i.e. the update event that triggers the constraint rule), which may cause problems when a single violation is caused by more than one event. We feel that more acceptable repair actions can be discovered by considering the set of violated constraints as a whole, rather than individually, so that a minimal (or near-minimal) set of repair updates can be chosen.

In order to find sets of possible repair actions in such a system, it would be necessary to generate a CHIP program on the fly at commit-time, which would search for a sequence of database updates that will resatisfy the violated constraints on the data.
However, this is almost exactly the approach suggested above for handling non-deterministic updates using CHIP, which in turn suggests that we could make use of our extended Daplex language as an intermediate language, in order to simplify the code generation process at commit-time.

In the context of the less traditional application areas, such as the protein structure database, which involve complex updates that may violate many of the semantic integrity constraints, a repair mechanism of this kind becomes an important design tool, which suggests possible solutions to problems but does not dictate any individual repair strategy to the user. The complex search process required to resatisfy a set of several violated constraints is, in theory, exactly the kind of task that should be undertaken by the DBMS, rather than being left to the user. With recent advances in constraint programming techniques, and building on the database features described in this thesis, we believe that a DBMS which is capable of performing complex constraint manipulation of this kind on behalf of the user is also possible in practice.
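The idea of considering all violated constraints together can be sketched as a brute-force search in Python: candidate repair updates are tried in subsets of increasing size, and the first subset after which every constraint holds is returned, so the result is minimal in the number of updates. The state representation, the constraints-as-predicates convention and the candidate list are hypothetical, and the search is exponential; a constraint solver such as CHIP would be expected to prune it far more effectively.

```python
from itertools import combinations


def minimal_repair(state, constraints, candidates):
    """Smallest list of candidate updates after which every constraint holds.

    state: {name: value}; constraints: predicates over a state;
    candidates: (name, value) updates. All are hypothetical stand-ins.
    """
    for size in range(len(candidates) + 1):
        for subset in combinations(candidates, size):
            trial = dict(state)
            for name, value in subset:
                trial[name] = value
            if all(holds(trial) for holds in constraints):
                return list(subset)
    return None                      # no combination of candidates repairs it


STATE = {"x": "bad", "y": "ok"}
CONSTRAINTS = [
    lambda s: s["x"] == s["y"],      # first violated constraint
    lambda s: s["x"] != "bad",       # second violated constraint
]
CANDIDATES = [("y", "bad"), ("x", "ok")]
print(minimal_repair(STATE, CONSTRAINTS, CANDIDATES))
```

Repairing the first constraint in isolation by setting y to "bad" would leave the second constraint violated, whereas the joint search finds the single update that satisfies both.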

Appendix A
Data Definition and Module Management Primitives

A.1 Primitives that operate on metadata

new_entity_class(+EDesc). Given the descriptor of an entity class, this primitive defines that class within the module specified in the descriptor.

new_function(+FDesc). Given the descriptor of a function, this primitive defines that function in the module specified in the descriptor. If the function is a method, then the code which implements it must be defined separately by the user.

new_action(+ADesc). Given the descriptor of an action, this primitive defines the action in the module specified in the descriptor. The code defining the behaviour of the action must be defined separately by the user.

new_module(+MDesc). Given the descriptor of a module, this primitive defines that module and creates any necessary files. The primitive exits with the module open in write mode.

delete_entity_class(+ClassName). This primitive removes the entity class with the name ClassName, and its instances, from its module of definition. It also deletes any functions defined on or returning the given class, and any mappings defining these functions within the database.

function_delete(+FName, +ArgType(s), +Res). Given the name of a function, its argument type(s) and its result type, this primitive deletes that function from its module of definition. If the function is stored, then it also deletes any mappings for the function that exist within that module. If the function is a method function, then all associated code is also retracted from the metadata. Key functions may not be deleted by this primitive.

action_delete(+AName, +ArgType(s)). Given the name of an action and its argument type(s), this primitive deletes that action from its module of definition, and retracts its procedural definition from the metadata.

A.2 Primitives that operate on modules

set_directory(+ModuleName, +Directory). This primitive allows the user to specify the directory in which the given module's files are located. The default is the current directory.

open_module(+ModuleName, +Mode). This primitive opens an existing module in the given mode (one of read or write).

close_module(ModuleName). This primitive closes the module and removes its descriptors and associated method code from the metadata. If there have been any changes to the module's metadata, then these will be copied to the module's stored metadata.

Appendix B
Daplex Syntax

<command> ::= <declarations> | <definition> | <imperative>
<declarations> ::= create <module status> module <mname> [<decs>]
                       [consult methods file <file list>]
                 | extend module <mname> [<decs>]
                       [consult methods file <file list>]
                 | check sybase module <mname>
<module status> ::= shared | private | temporary
<decs> ::= <entity dec> [<decs>]
         | <function dec> [<decs>]
         | <action dec> [<decs>]
         | <key dec> [<decs>]
<entity dec> ::= declare <typeid> ->> <typeid>
               | declare <typeid> ->> entity
<function dec> ::= declare <fname> ( [<type list>] ) <arrow> <typeid>
                 | define <fname> ( [<type list>] ) <arrow> <typeid>
<arrow> ::= -> | ->>
<action dec> ::= define <aname> ( <type list> )
<type list> ::= <typeid> [, <type list>]
              | set of <typeid>
<file list> ::= <filename> [, <file list>]
<key dec> ::= key of <typeid> is <key components>
<key components> ::= <key component> [, <key components>]
<key component> ::= key of ( <fname> ) | <fname>
<definition> ::= <program def> | <function def> | <action def>
<program def> ::= program <pname> is <imperative>
<function def> ::= define <fname> ( [<define args>] ) -> <typeid> in <mname> <singleton>
                 | define <fname> ( [<define args>] ) ->> <typeid> in <mname> <set expr>
<action def> ::= define <aname> ( [<define args>] ) in <mname>
                     [ensuring <predicate>] <imperative>
<define args> ::= [set of] <varid> in <typeid> [, <define args>]
<imperative> ::= [<loops>] <action part>
<loops> ::= <for loop> [<loops>]
<for loop> ::= for each <named set expr> | for <singlevar>
<action part> ::= <action statement> [, <action part>]
<action statement> ::= <print statement> | <update statement> | <action>
<print statement> ::= print ( <expr list> )
<update statement> ::= let <svfncall> = <singleton>
                     | let <mvfncall> = <expr>
                     | include <set expr> into <fcall>
                     | exclude <set expr> from <fcall>
                     | delete <expr>
                     | create a new <typeid> with key = ( <key values> )
<fcall> ::= <svfncall> | <mvfncall> | <typeid>
<key values> ::= <key value> [, <key values>]
<key value> ::= key of <singleton> | <singleton>
<action> ::= <aname> ( <expr list> )
           | <aname> ( <set expr> )
<expr list> ::= <expr> [, <expr list>]
<expr> ::= <set expr> | <singleton>
<named set expr> ::= <varid> in <set> [such that <predicate>] [as <typeid>]
<set expr> ::= <set> [as <typeid>]
<set> ::= <typeid>
        | <mvfncall>
        | if <predicate> then <set> else <set>
        | ( <set expr> <set operation> <set expr> )
        | { <explicit component> }
        | <varid> in <set expr> such that <predicate>
<set operation> ::= union | intersection | difference
<explicit component> ::= <singleton> [to <singleton>] [, <explicit component>]
<singleton list> ::= <singleton> [, <singleton list>]
<singleton> ::= <const>
              | <varid> [as <typeid>]
              | <svfncall>
              | <singlevar>
              | <arith expr>
              | <aggcall>
              | ( <singleton> )
              | <conditional expression>
<singlevar> ::= the <named set expr>
<arith expr> ::= [+ | -] <arith term>
<arith term> ::= <arith fac>
               | <arith term> + <arith fac>
               | <arith term> - <arith fac>
<arith fac> ::= <singleton>
              | <arith fac> * <singleton>
              | <arith fac> / <singleton>
<aggcall> ::= <summing op> ( over <named set expr> of <singleton> )
            | <aggregate op> ( <set expr> )
<summing op> ::= total | average
<aggregate op> ::= maximum | minimum | count
<conditional expression> ::= if <predicate> then <singleton> else <singleton>
<predicate> ::= <bool term> | <predicate> or <bool term>
<bool term> ::= <bool fac> | <bool term> and <bool fac>
<bool fac> ::= [not] <bool prim>
<bool prim> ::= <comparison>
              | <quantified expr>
              | <set membership>
              | <subclass membership>
              | ( <predicate> )
<comparison> ::= <arith expr> <comparison operator> <arith expr>
<comparison operator> ::= = | < | > | <> | =< | >=
<quantified expr> ::= <quantifier> <named set expr> <to have> <predicate>
<quantifier> ::= all | some | any | no
               | exactly <const> | at <range quantifier> <const>
<range quantifier> ::= least | most
<to have> ::= has | have
<set membership> ::= <singleton> in <set expr>
<subclass membership> ::= <singleton> is a <typeid>
<mvfncall> ::= <fname> ( [<expr list>] )
<svfncall> ::= <fname> ( [<singleton list>] )

where
<aname> is the name of an action,
<const> is an integer, float or string constant,
<filename> is the name of a file,
<fname> is the name of a function,
<mname> is the name of a module,
<pname> is the name of a program,
<scalar> is a scalar literal,
<typeid> is the name of a type, and
<varid> is the name of a variable.
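One way to read the alternatives of <quantifier> is as conditions on how many members of the named set satisfy the predicate. The Python sketch below encodes that reading; it is our illustrative interpretation, not the Daplex compiler's code, and it treats "any" as a synonym for "some", which is an assumption.

```python
def quantified(quantifier, items, predicate, n=None):
    """Evaluate '<quantifier> <named set expr> has <predicate>' by counting
    the members of items that satisfy predicate (illustrative reading)."""
    hits = sum(1 for x in items if predicate(x))
    if quantifier == "all":
        return hits == len(items)          # vacuously true on an empty set
    if quantifier in ("some", "any"):      # "any" == "some" is an assumption
        return hits >= 1
    if quantifier == "no":
        return hits == 0
    if quantifier == "exactly":
        return hits == n
    if quantifier == "at least":
        return hits >= n
    if quantifier == "at most":
        return hits <= n
    raise ValueError(f"unknown quantifier: {quantifier}")


# e.g. "at most 2 v in {1, 2, 3} has v > 1"
print(quantified("at most", [1, 2, 3], lambda v: v > 1, 2))
```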

Appendix C
Daplex Definition of the Metadata Schema

create temporary module metadata
declare modmeta ->> entity
declare mname(modmeta) -> string
declare mstatus(modmeta) -> string
declare mtype(modmeta) -> string
declare mmode(modmeta) -> string
key of modmeta is mname
declare objmeta ->> entity
declare oname(objmeta) -> string
key of objmeta is oname
declare simplemeta ->> objmeta
declare compoundmeta ->> objmeta
declare cmodule(compoundmeta) -> modmeta
declare entmeta ->> compoundmeta
declare supertype(entmeta) -> entmeta
declare num_inst(entmeta) -> integer
declare valentmeta ->> compoundmeta
declare funmeta ->> entity
declare fname(funmeta) -> string
declare first_fun_arg(funmeta) -> objmeta
declare fun_args(funmeta) ->> objmeta
declare card(funmeta) -> string
declare fstatus(funmeta) -> string
declare has_inverse(funmeta) -> string
declare result_type(funmeta) -> objmeta
declare fmodule(funmeta) -> modmeta
key of funmeta is fname, key of(first_fun_arg)
declare actmeta ->> entity
declare aname(actmeta) -> string
declare first_act_arg(actmeta) -> objmeta
declare act_args(actmeta) ->> objmeta
declare amodule(actmeta) ->> modmeta
key of actmeta is aname, key of(first_act_arg)
declare constraintmeta ->> entity
declare constraint_name(constraintmeta) -> string
declare constraint_type(constraintmeta) -> string
declare enabled(constraintmeta) -> string
declare constraint_text(constraintmeta) -> string
declare operators(constraintmeta) ->> string
declare constraint_module(constraintmeta) -> modmeta
key of constraintmeta is constraint_name
declare key_component(entmeta) ->> funmeta
declare constraints_on_class(entmeta) ->> constraintmeta
declare constraints_on_function(funmeta) ->> constraintmeta;

define subtype(e in entmeta) ->> entmeta in metadata
    s in entmeta such that e in supertype(s);

define subtypes(e in entmeta) ->> entmeta in metadata
    (subtype(e) union subtypes(subtype(e)));

define supertypes(e in entmeta) ->> entmeta in metadata
    (supertype(e) union supertypes(supertype(e)));

define functions_on(o in objmeta) ->> funmeta in metadata
    f in funmeta such that o in fun_args(f);

define functions_on(e in entmeta) ->> funmeta in metadata
    (functions_on(supertype(e))
     union
     f in funmeta such that e in fun_args(f) as entmeta);

define functions_yielding(o in objmeta) ->> funmeta in metadata
    f in funmeta such that result_type(f) = o;

define num_f_args(f in funmeta) -> integer in metadata
    count(fun_args(f));

define actions_on(o in objmeta) ->> actmeta in metadata
    a in actmeta such that o in act_args(a);

define actions_on(e in entmeta) ->> actmeta in metadata
    (actions_on(supertype(e))
     union
     a in actmeta such that e in act_args(a) as entmeta);

define num_a_args(a in actmeta) -> integer in metadata
    count(act_args(a));
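The recursive subtypes and supertypes definitions compute transitive closures over the single-valued supertype function. A small Python sketch of the same recursion, over a hypothetical mapping from each class name to its direct supertype (the class violin_student is invented for illustration; the other names come from the music lesson schema):

```python
# Hypothetical class hierarchy: {class: direct supertype}.
SUPERTYPE = {
    "student": "person",
    "music_teacher": "person",
    "violin_student": "student",     # invented for illustration
}


def supertypes(cls, supertype):
    """All ancestors: supertype(e) union supertypes(supertype(e))."""
    result = []
    while cls in supertype:
        cls = supertype[cls]
        result.append(cls)
    return result


def subtypes(cls, supertype):
    """All descendants: subtype(e) union subtypes(subtype(e)), where
    subtype(e) is every s such that e is the supertype of s."""
    direct = [s for s, sup in supertype.items() if sup == cls]
    result = set(direct)
    for s in direct:
        result |= subtypes(s, supertype)
    return result


print(supertypes("violin_student", SUPERTYPE))
print(sorted(subtypes("person", SUPERTYPE)))
```

Since supertype is single-valued, the upward closure is a simple chain, while the downward closure has to branch over every class that names the current class as its supertype.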

Appendix D
Prolog Solution to the Music Lesson Allocation Problem

Below we give the Prolog code as generated for the nd_allocate program. In order to improve the clarity of the code, we have added some comments and have given the variables more meaningful names. Apart from these changes, and the removal of some details of the handling of sets of loop variable bindings, this is the code as currently produced by our Daplex compiler.

    % Top level routine implementing first loop and actions
    nd_allocate :-
        (   initialise_backtrackable_module([edesc(music_lesson, _, _, _, _, _, musicdb)]),
            % First loop - build a list of all students requiring
            % music lessons
            findall(Student,
                    ( getentity(student, Student),
                      atmost(0, ( getentity(music_lesson, Lesson),
                                  getfnval(pupil, [Lesson], Student) )) ),
                    Students),
            % Call the routine implementing the second "for loop"
            % to allocate the lessons
            nd_allocate1(Students),
            commit_backtrackable_updates,
            % End of loops, start of actions
            (   % Print results of program
                get_loop_variable_values([Student, Lesson]),
                getfnval(teacher, [Lesson], Teacher),
                getfnval(name, [Student], SName),
                getfnval(name, [Teacher], TName),
                getfnval(day, [Lesson], Day),
                write_list(['Student', SName, 'allocated to', TName, Day]),
                fail
            ;   true
            )
        ;   close_module(backtrackable),
            fail
        ).

    % Innermost routine implementing the nested "for a new" loop
    nd_allocate1([]).
    nd_allocate1([Student | RemainingStudents]) :-
        % Select a value for the "teacher" attribute
        getentity(music_teacher, Teacher),
        getfnval(instrument, [Student], Instrument),
        getfnval(instruments, [Teacher], Instrument),
        % Select a value for the "day" attribute
        getfnval(freedays, [Teacher], TDay),
        % We do not need to select a value for the "pupil" attribute, as
        % this will be given by the input parameter list (i.e. Student).
        % Build the key of the new music lesson instance.
        derive_key(music_teacher, Teacher, TeachersKey),
        append(TeachersKey, [TDay], Key),
        % And use a backtrackable update to create it.
        newentity(music_lesson, Key, Lesson),
        % Define the pupil attribute (also using a backtrackable update)
        addfnval(pupil, [Lesson], Student),
        % Now check the constraint that the pupil of the lesson must be
        % free on the chosen day.
        getfnval(day, [Lesson], Day),
        getfnval(pupil, [Lesson], Pupil),
        getfnval(freedays, [Pupil], Day),
        % Remember the bindings of the loop variable values (the
        % disjunction is to ensure that invalid bindings are
        % "forgotten" if/when we backtrack into this routine).
        (   assert_loop_variable_values([Student, Lesson])
        ;   retract(loop_variable_values([Student, Lesson])),
            fail
        ),
        % Finally, continue allocating lessons for the remaining
        % students.
        nd_allocate1(RemainingStudents).
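The control structure of nd_allocate1 can be mirrored, roughly, as a recursive Python generator. This is a sketch under simplifying assumptions, not a translation produced by the compiler: teachers, instruments and free days are hypothetical dictionaries, the tentative lesson is kept in a plain dictionary and removed again on backtracking (the role played by backtrackable updates in the Prolog), and a teacher's free days are taken to exclude days on which that teacher already has a lesson in the current partial allocation.

```python
def allocate(students, instrument, teaches, freedays, lessons=None):
    """Yield complete allocations {student: (teacher, day)} by chronological
    backtracking, mirroring nd_allocate1. A lesson's key is (teacher, day).
    All inputs are hypothetical stand-ins for the music lesson database."""
    if lessons is None:
        lessons = {}
    if not students:
        yield dict(lessons)
        return
    student, rest = students[0], students[1:]
    for teacher in teaches:                         # getentity(music_teacher, T)
        if instrument[student] not in teaches[teacher]:
            continue                                # teacher lacks the instrument
        for day in sorted(freedays[teacher]):       # getfnval(freedays, [T], D)
            if (teacher, day) in lessons.values():
                continue                            # teacher already busy that day
            if day not in freedays[student]:
                continue                            # pupil must be free that day
            lessons[student] = (teacher, day)       # tentative lesson creation
            yield from allocate(rest, instrument, teaches, freedays, lessons)
            del lessons[student]                    # undo on backtracking


INSTRUMENT = {"amy": "violin", "ben": "violin"}
TEACHES = {"vivaldi": {"violin", "cello"}}
FREEDAYS = {"vivaldi": {"mon", "tue"}, "amy": {"mon", "tue"}, "ben": {"mon"}}
print(next(allocate(["amy", "ben"], INSTRUMENT, TEACHES, FREEDAYS), None))
```

Taking next() of the generator returns the first complete allocation, corresponding to the first solution found by the Prolog. In this example the first choice for amy (Monday) leaves no slot for ben, so the search backtracks and moves amy to Tuesday.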

Appendix E
The Music Lesson Database

E.1 The Music Lesson Database Schema

create private module musicdb
declare person ->> entity
declare name(person) -> string
key of person is name
declare student ->> person
declare instrument(student) -> string
declare music_teacher ->> person
declare instruments(music_teacher) ->> string
declare hobby ->> entity
declare day(hobby) -> string
declare type(hobby) -> string
declare done_by(hobby) -> person
key of hobby is key of(done_by), day
declare music_lesson ->> entity
declare day(music_lesson) -> string
declare teacher(music_lesson) -> music_teacher
declare pupil(music_lesson) -> student
key of music_lesson is key of(teacher), day;

define busy(p in person) ->> string in musicdb
    if p is a music_teacher then
        day(teacher_inv(p as music_teacher))
    else
        day(done_by_inv(p));

define freedays(p in person) ->> string in musicdb
    ({"monday", "tuesday", "wednesday", "thursday", "friday"}
     difference busy(p));

E.2 Contents of the Music Lesson Database

The following table illustrates the contents of the music lesson database, as used for evaluating the effects of promoting the "no lessons on Wednesdays" constraint into the body of the generated code.

    Name         Instrument(s)                         Free days (Mon Tue Wed Thu Fri)
    Mr. Vivaldi  Violin, Viola, Cello, Harpsichord
    Mrs. Mozart  Piano, Harpsichord, Voice
    Miss Holst   Horn, Trombone, Tuba, Piano
    J.S.B.       Harpsichord                           Busy Busy
    C.P.E.B.     Harpsichord                           Busy Busy Busy
    W.A.M.       Piano                                 Busy Busy Busy
    L.V.B.       Violin                                Busy
    G.F.H.       Violin                                Busy
    P.I.T.       Cello                                 Busy
    G.T.H.       Horn                                  Busy
    M.P.M.       Piano                                 Busy
    A.P.P.       Trombone                              Busy
    M.C.         Voice                                 Busy
    K.t.K.       Voice                                 Busy
    L.P.         Voice                                 Busy
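The derived functions busy and freedays reduce to a set difference against the five weekdays. A Python sketch with hypothetical records (lessons as (teacher, day) pairs, hobbies as (person, day) pairs, and a set naming the music teachers); the names and data are invented, and the weekday order is kept only to make the result readable:

```python
WEEKDAYS = ["monday", "tuesday", "wednesday", "thursday", "friday"]


def busy(person, teachers, lessons, hobbies):
    """Mirror of the Daplex busy(p): lesson days for music teachers,
    hobby days for everyone else."""
    if person in teachers:
        return {day for teacher, day in lessons if teacher == person}
    return {day for doer, day in hobbies if doer == person}


def freedays(person, teachers, lessons, hobbies):
    """The five weekdays 'difference' busy(p), in weekday order."""
    busy_days = busy(person, teachers, lessons, hobbies)
    return [d for d in WEEKDAYS if d not in busy_days]


HOBBIES = [("ann", "monday"), ("ann", "friday")]       # hypothetical pupil
LESSONS = [("vivaldi", "wednesday")]                   # hypothetical lesson
print(freedays("ann", {"vivaldi"}, LESSONS, HOBBIES))
print(freedays("vivaldi", {"vivaldi"}, LESSONS, HOBBIES))
```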

Bibliography

[1] J. Annevelink. Database Programming Languages: a Functional Approach. In J. Clifford and R. King, editors, SIGMOD 91 Conference, pages 318-327, Denver, Colorado, May 1991. ACM Press.
[2] G. Argo, J. Hughes, P. Trinder, J. Fairbairn, and J. Launchbury. Implementing Functional Databases. In Bancilhon and Buneman [3], chapter 10, pages 165-176.
[3] F. Bancilhon and P. Buneman, editors. Advances in Database Programming Languages. Frontier Series. ACM Press, 1990.
[4] N.S. Barghouti and G.E. Kaiser. Concurrency Control in Advanced Database Applications. ACM Computing Surveys, 23(3):269-318, September 1991.
[5] N. Bassiliades. Constraint Description in ADAM. MSc thesis, Dept. of Computing Science, University of Aberdeen, Aberdeen, Scotland, AB9 2UE, 1993.
[6] D.S. Batory, T.Y. Leung, and T.E. Wise. Implementation Concepts for an Extensible Data Model and Data Language. ACM Transactions on Database Systems, 13(3):231-262, September 1988.
[7] C. Bauzer Medeiros and P. Pfeffer. A Mechanism for Managing Rules in an Object-Oriented Database. Technical report, Altair, 1990.
[8] F.C. Bernstein, T.F. Koetzle, G.J.B. Williams, E.F. Meyer, M.D. Brice, J.R. Rodgers, O. Kennard, T. Shimanouchi, and M. Tasumi. The Protein Data Bank: a Computer-based Archival File for Macromolecular Structures. Journal of Molecular Biology, 112:535-542, 1977.
[9] E. Bertino and D. Musto. Correctness of Semantic Integrity Checking in Database Management Systems. Acta Informatica, 26:25-57, 1988.

[10] R. Bird and P. Wadler. Introduction to Functional Programming. Prentice Hall Series in Computer Science. Prentice Hall International Ltd., 1988.
[11] O. Boucelma and J. Le Maitre. An Extensible Functional Query Language for an Object-Oriented Database System. In Delobel et al. [31], pages 567-581.
[12] M.L. Brodie. On the Development of Data Models. In Brodie et al. [13], chapter 2, pages 19-48.
[13] M.L. Brodie, J. Mylopoulos, and J.W. Schmidt, editors. On Conceptual Modelling. Topics in Information Systems. Springer-Verlag, 1984.
[14] F. Bry, R. Manthey, and B. Martens. Integrity Verification in Knowledge Bases. In A. Voronkov, editor, Logic Programming: Proc. of the First and Second Russian Conference on Logic Programming, pages 114-139. Springer-Verlag, 1992.
[15] A.P. Buchmann, R.S. Carrera, and M.A. Vazquez-Galindo. A Generalised Constraint and Exception Handler for an Object-Oriented CAD-DBMS. In Dittrich and Dayal [35], pages 38-49.
[16] P. Buneman and R.E. Frankel. An Implementation Technique for Database Query Languages. ACM Transactions on Database Systems, 7(2):164-186, June 1982.
[17] P. Buneman and R. Nikhil. The Functional Data Model and its Uses for Interaction with Databases. In Brodie et al. [13], chapter 13, pages 359-380.
[18] F. Cacace, S. Ceri, S. Crespi-Reghizzi, L. Tanca, and R. Zicari. Integrating Object-Oriented Data Modelling with a Rule-Based Programming Paradigm. In H. Garcia-Molina and H.V. Jagadish, editors, SIGMOD 90 Conference, pages 225-236, Atlantic City, May 1990. ACM Press.
[19] F. Cacace, S. Ceri, and L. Tanca. Consistency and Non-Determinism in a Database Programming Language. In Proceedings of the 3rd Symposium on Mathematical Fundamentals of Database and Knowledge Base Systems (MFDBS-91), pages 325-341, Rostock, Germany, 1991.
[20] D. Canfield Smith and H.J. Enea. Backtracking in MLISP2: an Efficient Backtracking Method for LISP. In N.J. Nilsson, editor, Proc. 3rd IJCAI, pages 677-685, Stanford, California, August 1973.

[21] S. Ceri and J. Widom. Deriving Production Rules for Constraint Maintenance. In D. McLeod, R. Sacks-Davis, and H. Schek, editors, VLDB 90 Conference, pages 566-577, Brisbane, 1990. Morgan Kaufmann Publishers, Inc.
[22] C.L. Chang. DEDUCE 2: Further Investigations of Deduction in Relational Data Bases. In H. Gallaire and J. Minker, editors, Logic and Databases, pages 201-236. Plenum Press, 1978.
[23] W. Chen. Declarative Specification and Evaluation of Database Updates. In Delobel et al. [31], pages 147-166.
[24] D.A. Clark, J. Shirazi, and C.J. Rawlings. Protein Topology Prediction Through Constraint-Based Search and the Evaluation of Topological Folding Rules. Protein Engineering, 4(7):751-760, 1991.
[25] J. Cohen and E. Carton. Non-Deterministic Fortran. The Computer Journal, 17(1):44-51, 1974.
[26] T.E. Creighton. Proteins: Structures and Molecular Properties. W.H. Freeman and Company, second edition, 1993.
[27] C.J. Date. Referential Integrity. In C. Zaniolo and C. Delobel, editors, VLDB 81 Conference, pages 2-12, Cannes, 1981. IEEE Computer Society Press.
[28] U. Dayal and H.-Y. Hwang. View Definition and Generalization for Database Integration in a Multidatabase System. IEEE Transactions on Software Engineering, 10(6):628-645, November 1984.
[29] U. Dayal, G. Schlageter, and L. H. Seng, editors. VLDB 84 Conference, Singapore, 1984.
[30] S.M. Deen. The ANSI/SPARC Architecture and its Implementation in PRECI. In P.M. Stocker, P.M.D. Gray, and M.P. Atkinson, editors, Databases: Role and Structure, pages 93-104. Cambridge University Press, 1984.
[31] C. Delobel, M. Kifer, and Y. Masunaga, editors. Second International Conference on Deductive and Object-Oriented Databases, Munich, December 1991. Springer-Verlag.

[32] O. Díaz. Deriving Rules for Constraint Maintenance in an Object-Oriented Database. In A.M. Tjoa and I. Ramos, editors, DEXA 92 Conference, pages 332-337. Springer-Verlag, 1992.
[33] O. Díaz and S.M. Embury. Generating Active Rules from High-Level Specifications. In P.M.D. Gray and R.J. Lucas, editors, Advanced Database Systems: Proceedings of 10th British National Conference on Databases (LNCS 618), pages 227-243, Aberdeen, Scotland, July 1992. Springer-Verlag.
[34] O. Díaz, N.W. Paton, and P.M.D. Gray. Rule Management in Object-Oriented Databases: a Uniform Approach. In Lohman et al. [73], pages 317-326.
[35] K. Dittrich and U. Dayal, editors. International Workshop on Object-Oriented Database Systems, Pacific Grove, September 1986. IEEE Computer Society Press.
[36] A.K. Elmagarmid, editor. Database Transaction Models for Advanced Applications. Morgan Kaufmann Series in Data Management Systems. Morgan Kaufmann Publishers, Inc., 1992.
[37] S.M. Embury, P.M.D. Gray, and N.D. Bassiliades. Constraint Maintenance Using Generated Methods in the P/FDM Object-Oriented Database. In N.W. Paton and M.H. Williams, editors, Proc. of 1st Int. Workshop on Rules in Database Systems (RIDS '93), Workshops in Computing, pages 364-381, Edinburgh, August 1993. Springer-Verlag.
[38] S.M. Embury, Z. Jiao, and P.M.D. Gray. Using Prolog to Provide Access to Metadata in an Object-Oriented Database. In Moss [79]. 17 pages.
[39] M. Erwig and U.W. Lipeck. A Functional DBPL Revealing High Level Optimizations. In P. Kanellakis and J.W. Schmidt, editors, Proceedings of the 3rd International Workshop on Database Programming Languages: Bulk Types and Persistent Data, pages 306-321, Nafplion, Greece, August 1991. Morgan Kaufmann Publishers, Inc.
[40] F. Bry, H. Decker, and R. Manthey. A Uniform Approach to Constraint Satisfaction and Constraint Satisfiability in Deductive Databases. In J.W. Schmidt, S. Ceri, and M. Missikoff, editors, Advances in Database Technology - EDBT '88, pages 488-505. Springer-Verlag, 1988.

[41] R.E. Fikes. REF-ARF: a System for Solving Problems Stated as Procedures. Artificial Intelligence, 1:27-120, 1970.
[42] R.W. Floyd. Nondeterministic Algorithms. Journal of the ACM, 14(4):636-644, October 1967.
[43] H. Gallaire, J. Minker, and J.-M. Nicolas. Logic and Databases: a Deductive Approach. ACM Computing Surveys, 16(2):153-185, June 1984.
[44] J. Garza and W. Kim. Transaction Management in an Object-Oriented Database System. In H. Boral and P.-A. Larson, editors, SIGMOD 88 Conference, pages 37-45, Chicago, Illinois, June 1988. ACM Press.
[45] N.H. Gehani and H.V. Jagadish. Ode as an Active Database: Constraints and Triggers. In Lohman et al. [73], pages 327-336.
[46] F. Giannotti, D. Pedreschi, D. Saccà, and C. Zaniolo. Non-Determinism in Deductive Databases. In Delobel et al. [31], pages 129-146.
[47] G.D. Gibbons. POPS: an Application of Heuristic Search Methods to the Processing of a Non-Deterministic Programming Language. In Proc. 3rd IJCAI, pages 589-600, 1973.
[48] S.W. Golomb and L.D. Baumert. Backtrack Programming. Journal of the ACM, 12(4):516-524, October 1965.
[49] P.M.D. Gray. Logic, Algebra and Databases. Ellis Horwood Series in Computers and Their Applications. Ellis Horwood Ltd., 1984.
[50] P.M.D. Gray, K.G. Kulkarni, and N.W. Paton. Object-Oriented Databases: a Semantic Data Model Approach. Prentice Hall Series in Computer Science. Prentice Hall International Ltd., 1992.
[51] P.M.D. Gray, N.W. Paton, G.J.L. Kemp, and J.E. Fothergill. An Object-Oriented Database for Protein Structure Analysis. Protein Engineering, 3:235-243, 1990.
[52] H.W. Guesgen and J. Hertzberg. A Perspective of Constraint-Based Reasoning. Lecture Notes in Artificial Intelligence. Springer-Verlag, 1992.

[53] P. Van Hentenryck. Constraint Satisfaction in Logic Programming. MIT Press, 1989.
[54] A. Hsu and T. Imielinski. Integrity Checking for Multiple Updates. In S. Navathe, editor, SIGMOD 85 Conference, pages 152-168, Austin, Texas, May 1985. ACM Press.
[55] R. Hull and R. King. Semantic Data Modelling: Survey, Applications and Research Issues. ACM Computing Surveys, 19(3):201-260, September 1987.
[56] H.V. Jagadish and X. Qian. Integrity Maintenance in an Object-Oriented Database. In Yuan [109], pages 469-480.
[57] K.G. Jeffrey, J. Lay, and T. Curtis. Logic Programming and Database Technology used for Validation within Transactions. In M.H. Williams, editor, Proceedings of 7th British National Conference on Databases, British Computer Society Workshop Series, pages 71-84, Heriot-Watt University, Edinburgh, July 1989. Cambridge University Press.
[58] Z. Jiao. Modules and Temporary Data in P/FDM. Technical Report AUCS/TR9016, Dept. of Computing Science, University of Aberdeen, Aberdeen, Scotland, AB2 9UE, December 1990.
[59] Z. Jiao. Optimisation Studies in a Prolog Object-Oriented Database. PhD thesis, University of Aberdeen, Aberdeen, Scotland, November 1992.
[60] Z. Jiao and P.M.D. Gray. Optimisation of Methods in a Navigational Query Language. In Delobel et al. [31], pages 22-42.
[61] Z. Jiao and P.M.D. Gray. Using Prolog to Transform and Optimise Queries in a Large Protein Database. Technical Report AUCS/TR9115, University of Aberdeen, Aberdeen, Scotland, AB2 9UE, 1991.
[62] G.E. Kaiser and C. Pu. Dynamic Restructuring of Transactions. In Elmagarmid [36], chapter 8, pages 265-295.
[63] R.H. Katz. Computer-Aided Design Databases. In G. Ariav and J. Clifford, editors, New Directions for Database Systems, pages 110-123. Ablex Publishing Corp., 1986.

[64] G.J.L. Kemp. Protein Modelling: a Design Application of an Object-Oriented Database. In J. Gero, editor, Proc. 1st Int. Conf. on Artificial Intelligence in Design, pages 387–406. Butterworth-Heinemann, 1991.

[65] G.J.L. Kemp. Protein Modelling Using an Object-Oriented Database. PhD thesis, University of Aberdeen, Aberdeen, Scotland, June 1991.

[66] S.N. Khoshafian and G.P. Copeland. Object Identity. In OOPSLA '86 Proceedings, pages 406–416, September 1986.

[67] J.J. King. Query Optimisation by Semantic Reasoning. UMI Research Press, 1984.

[68] K.G. Kulkarni and M.P. Atkinson. EFDM: Extended Functional Data Model. The Computer Journal, 29(1):38–46, 1986.

[69] P.J. Landin. A Lambda Calculus Approach. In L. Fox, editor, Advances in Programming and Non-Numerical Computation, Symposium Publications Division, chapter 5, pages 97–141. Pergamon Press, 1966.

[70] R. Laskowski, M.W. MacArthur, D.S. Moss, and J.M. Thornton. PROCHECK: a Program to Check the Stereochemical Quality of Protein Structures. J. Appl. Cryst., 26:283–291, 1993.

[71] S. Leishman. Automatic Structural Assignment of Protein 2D Nuclear Magnetic Resonance Spectra Using Constraint Logic Programming. PhD thesis proposal, University of Aberdeen, Aberdeen, Scotland, 1994.

[72] U.W. Lipeck. Transformation of Dynamic Integrity Constraints into Transaction Specifications. In M. Gyssens, J. Paredaens, and D. Van Gucht, editors, 2nd Int. Conference on Database Theory, pages 322–337, Bruges, August 1988. Springer-Verlag.

[73] G. Lohman, A. Sernadas, and R. Camps, editors. VLDB 91 Conference, Barcelona, 1991. Morgan Kaufmann Publishers, Inc.

[74] A. Mackworth. Constraint Satisfaction. In S.C. Shapiro, editor, The Encyclopaedia of Artificial Intelligence, volume 1, pages 285–293. John Wiley and Sons, Inc., second edition, 1992.

[75] S. Manchanda and D.S. Warren. A Logic-Based Language for Database Updates. In J. Minker, editor, Foundations of Deductive Databases and Logic Programming, pages 363–394. Morgan Kaufmann Publishers, Inc., 1987.

[76] F. Manola and U. Dayal. PDM: an Object-Oriented Data Model. In Dittrich and Dayal [35], pages 18–25.

[77] C. Montangero, G. Pacini, and F. Turini. Two-Level Control Structure for Nondeterministic Programming. Communications of the ACM, 20(10):725–730, October 1977.

[78] M. Morgenstern. Constraint Equations: Declarative Expression of Constraints with Automatic Enforcement. In Dayal et al. [29], pages 291–300.

[79] C. Moss, editor. International Conference on the Practical Application of Prolog, London, April 1992. Applied Workstations, Ltd.

[80] S. Naqvi and R. Krishnamurthy. Semantics of Update in Logic Programming. In Bancilhon and Buneman [3], chapter 19, pages 313–328.

[81] S. Naqvi and S. Tsur. A Logical Language for Data and Knowledge Bases. Computer Science Press, 1989.

[82] A. Newell and H.A. Simon. GPS: a Program that Simulates Human Thought. In E.A. Feigenbaum and J. Feldman, editors, Computers and Thought, pages 279–298. McGraw-Hill Book Company, 1963.

[83] J.-M. Nicolas. Logic for Improving Integrity Checking in Relational Databases. Acta Informatica, 18:227–253, 1982.

[84] R.S. Nikhil. The Semantics of Update in a Functional Database Programming Language. In Bancilhon and Buneman [3], chapter 24, pages 403–421.

[85] M.H. Nodine, S. Ramaswamy, and S.B. Zdonik. A Cooperative Transaction Model for Design Databases. In Elmagarmid [36], chapter 3, pages 53–86.

[86] N.W. Paton and O. Díaz. Metaclasses in Object-Oriented Databases. In R.A. Meersman and W. Kent, editors, Proc. IFIP TC2 Conf. on Database Semantics: Object-Oriented Databases (DS-4). North-Holland, 1990.

[87] N.W. Paton and P.M.D. Gray. Identification of Database Objects by Key. In K. Dittrich, editor, Advances in Object-Oriented Databases (Proc. Object-Oriented Database System-II). Springer-Verlag, 1988.

[88] N.W. Paton, S. Leishman, S.M. Embury, and P.M.D. Gray. On Using Prolog to Implement Object-Oriented Databases. Information and Software Technology, 35(5):301–311, May 1993.

[89] L. Pauling. The Nature of the Chemical Bond. Cornell University Press, second edition, 1949.

[90] S.L. Peyton Jones. The Implementation of Functional Programming Languages. Prentice Hall Series in Computer Science. Prentice Hall International Ltd., 1987.

[91] A. Poulovassilis and P. King. Extending the Functional Data Model to Computational Completeness. In F. Bancilhon, C. Thanos, and D. Tsichritzis, editors, EDBT 90 Conference, pages 75–91, Venice, March 1990. Springer-Verlag.

[92] A. Poulovassilis and C. Small. A Functional Programming Approach to Deductive Databases. In Lohman et al. [73], pages 491–500.

[93] X. Qian. The Deductive Synthesis of Database Transactions. ACM Transactions on Database Systems, 18(4):626–677, December 1993.

[94] Quintus Corporation, Palo Alto, California. Quintus Prolog Language and Library Manual, 1991.

[95] R. Ramakrishnan, D. Srivastava, and S. Sudarshan. CORAL: Control, Relations and Logic. In Yuan [109], pages 238–250.

[96] U.S. Reddy. On the Relationship between Logic and Functional Languages. In D. DeGroot and G. Lindstrom, editors, Logic Programming: Functions, Relations and Equations, pages 3–36. Prentice Hall, 1986.

[97] T. Sheard and D. Stemple. Automatic Verification of Database Transaction Safety. ACM Transactions on Database Systems, 14(3):322–368, September 1989.

[98] D.W. Shipman. The Functional Data Model and the Data Language DAPLEX. ACM Transactions on Database Systems, 6(1):140–173, March 1981.

[99] A. Shoshani, F. Olken, and H.K.T. Wong. Characteristics of Scientific Databases. In Dayal et al. [29], pages 147–160.

[100] E.H. Sibley and L. Kerschberg. Data Architecture and Data Model Considerations. In R.R. Korfhage, editor, Proceedings of AFIPS National Computer Conference, pages 85–96, Dallas, Texas, June 1977. AFIPS Press.

[101] J.M. Siskind and D.A. McAllester. Screamer: a Portable Efficient Implementation of Nondeterministic Common Lisp. Technical Report IRCS-93-03, University of Pennsylvania Institute for Research in Cognitive Science, 1993.

[102] M. Stefik and D.G. Bobrow. Object-Oriented Programming: Themes and Variations. The AI Magazine, 6(4):40–82, 1986.

[103] Swedish Institute of Computer Science. SICStus Prolog Users' Manual, 1993.

[104] D.A. Turner. Miranda: a Non-Strict Functional Language with Polymorphic Types. In J.-P. Jouannaud, editor, Proceedings of the IFIP Int. Conf. on Functional Programming Languages and Computer Architecture, Lecture Notes in Computer Science, Vol. 201, pages 1–16, Nancy, France, September 1985. Springer-Verlag.

[105] S.D. Urban. ALICE: an Assertion Language for Integrity Constraint Expression. In Proceedings of Conference on Computer Software Applications, September 1989.

[106] S.D. Urban and M. Desiderio. CONTEXT: a CONstraint EXplanation Tool. Data and Knowledge Engineering, 8:153–183, 1992.

[107] S.D. Urban, A.P. Karadimce, and R.B. Nannapaneni. The Implementation and Evaluation of Integrity Maintenance Rules in an Object-Oriented Database. In 8th International Conference on Data Engineering, pages 565–572, Phoenix, Arizona, 1992. IEEE.

[108] X.Y. Wang, W.A. Gray, and N.J. Fiddian. KBTDA: a Knowledge-Based Database Transaction Design Tool Implemented in Prolog. In Moss [79]. 23 pages.

[109] L.-Y. Yuan, editor. VLDB 92 Conference, Vancouver, August 1992. Morgan Kaufmann Publishers, Inc.

[110] R. Zabih, D. McAllester, and D. Chapman. Non-Deterministic Lisp with Dependency-Directed Backtracking. In K. Forbus and H. Shrobe, editors, Proc. AAAI-87, pages 50–64, Seattle, Washington, 1987. AAAI Press.

[111] S.B. Zdonik and D. Maier. Fundamentals of Object-Oriented Databases. In S.B. Zdonik and D. Maier, editors, Readings in Object-Oriented Database Systems, pages 1–32. Morgan Kaufmann Publishers, Inc., 1990.
