Auditing Compliance with a Hippocratic Database
Rakesh Agrawal, Roberto Bayardo, Christos Faloutsos, Jerry Kiernan, Ralf Rantzau, Ramakrishnan Srikant
Intelligent Information Systems Research, IBM Almaden Research Center


Dec 29, 2015

Page 1

Auditing Compliance with a Hippocratic Database

Rakesh Agrawal, Roberto Bayardo, Christos Faloutsos, Jerry Kiernan, Ralf Rantzau, Ramakrishnan Srikant

Intelligent Information Systems Research, IBM Almaden Research Center

Page 2

Outline

Introduction and motivation
Problem statement
Foundations
System organization and algorithms
Performance
Summary

Page 3

Motivation

Hippocratic databases advocate policy-directed data management for privacy-sensitive data
– Need reinforced by legislation and regulations: the Health Insurance Portability and Accountability Act (HIPAA) and the Gramm-Leach-Bliley Act (Consumer Privacy Rule)

Goal
– Build a system to assist with auditing compliance with the stated policy:
  Event-driven: in response to a privacy complaint
  Periodic: to monitor exposure to privacy violations

Page 4

Audit Scenario

Jane has not been feeling well and decides to consult her doctor.

The doctor finds that Jane’s blood sugar level is high and suspects diabetes.

Some time later, Jane receives promotional literature from a pharmaceutical company offering over-the-counter diabetes tests.

Jane complains to the Department of Health and Human Services that she had opted out of her doctor sharing her medical information with pharmaceutical companies for marketing purposes.

The doctor must now review disclosures of Jane’s information to understand the circumstances of the disclosure and take appropriate action.

Page 5

Audit Expression

audit T.disease
from Customer C, Treatment T
where C.cid = T.pcid and C.name = 'Jane'

Who has accessed Jane’s disease information?

Page 6

Outline

Introduction and motivation
Problem statement
Foundations
System organization and algorithms
Performance
Summary

Page 7

Problem Statement

Given
– A log of queries executed over a database
– An audit expression specifying sensitive data
Precisely identify
– Those queries that accessed the data specified by the audit expression

Page 8

“Suspicious” Queries

Customer table
cid | name | address | zip | …
1 | Jane | 1234 … | 95120 | …
…

A query Qi has accessed information contained in the Customer table.

The audit expression A specifies the data to be audited.

If query Qi accesses all the cells specified by the audit expression A for any row, Qi is suspicious.

Page 9

Issues

Convenient language
– Audit expression (essentially an SPJ, i.e., select-project-join, query)
Fast and precise on audits
Non-disruptive
– Minimal performance impact on normal database operation
Fine-grained

Page 10

Assumptions

Disclosures stemming from multiple query executions are not considered
No use of outside knowledge to deduce information without detection
Queries considered include
– Joins and aggregation, but not nested subqueries
– Note that existential subqueries can be converted into joins [SIGMOD92]

Page 11

Outline

Introduction and motivation
Problem statement
Foundations
System organization and algorithms
Performance
Summary

Page 12

Informal Definitions

“Candidate” query
– Logged query that accesses all columns specified by the audit expression
“Indispensable” tuple (for a query)
– A tuple whose omission makes a difference to the result of the query
“Suspicious” query
– A candidate query that shares an indispensable tuple with the audit expression

Page 13

Indispensable Tuple

The SPJ query Q and the audit expression A are of the form:
$Q = \pi_{O_Q}(\sigma_{P_Q}(T \times R))$ and $A = \pi_{O_A}(\sigma_{P_A}(T \times S))$

Definition 1 - A virtual tuple $v \in T$ is indispensable for an SPJ query Q if the result of Q changes when we delete v:
$ind(v, Q) \iff \pi_{C_Q}(\sigma_{P_Q}(T \times R)) \neq \pi_{C_Q}(\sigma_{P_Q}((T - \{v\}) \times R))$

where
$P_Q$: predicates in Q
$C_Q$: columns appearing anywhere in Q
$O_Q$: output columns in Q
$\pi$: duplicate-preserving projection operator
$T$: tables common to Q and A ($R$ and $S$ are the remaining tables of Q and A, respectively)
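Definition 1 can be checked by brute force on toy data. The approach is to evaluate the duplicate-preserving projection with and without v and compare the results. The sketch below is a minimal illustration, not the paper's method, and it rests on a few assumptions:

- Tables are small in-memory lists of dicts.
- A query is given as a Python predicate (standing in for $P_Q$) plus its column list $C_Q$.
- The names `spj_result` and `ind`, and all sample rows, are hypothetical.

```python
from collections import Counter
from itertools import product

def spj_result(t_rows, r_rows, pred, cols):
    """Duplicate-preserving pi_cols(sigma_pred(T x R)), returned as a multiset."""
    result = Counter()
    for t, r in product(t_rows, r_rows):
        row = {**t, **r}  # one row of T x R
        if pred(row):
            result[tuple(row[c] for c in cols)] += 1
    return result

def ind(v, t_rows, r_rows, pred, cols):
    """Definition 1: v in T is indispensable for Q if deleting v changes Q's result."""
    t_minus_v = [t for t in t_rows if t is not v]
    return spj_result(t_rows, r_rows, pred, cols) != spj_result(t_minus_v, r_rows, pred, cols)

# Hypothetical toy data. T = Customer x Treatment; Q uses no other tables, so R = [{}].
customers = [{"cid": 1, "name": "Jane"}, {"cid": 2, "name": "Alice"}]
treatments = [{"pcid": 1, "disease": "diabetes"}, {"pcid": 2, "disease": "flu"}]
T = [{**c, **t} for c, t in product(customers, treatments)]
R = [{}]

# Q: names of people with diabetes; q_cols lists every column Q mentions (C_Q).
q_pred = lambda row: row["disease"] == "diabetes" and row["cid"] == row["pcid"]
q_cols = ["name", "disease", "cid", "pcid"]

print(ind(T[0], T, R, q_pred, q_cols))  # True: Jane joined with her diabetes treatment
print(ind(T[1], T, R, q_pred, q_cols))  # False: Jane paired with Alice's flu treatment
```

Passing `R = [{}]` keeps the cross product well defined when Q has no tables outside T.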

Page 14

“Candidate” Query

Definition 6 - Q is a candidate query with respect to A if:
$C_Q \supseteq O_A$

Only candidate queries can be suspicious queries
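Definition 6 is a pure column-set test, so it can be evaluated without running any query. Below is a minimal Python sketch. It assumes the column sets have already been extracted from each logged query: they are hand-listed here for the two logged queries of the Jane example, whereas a real system would obtain them by parsing the SQL. `is_candidate` is a hypothetical name.

```python
def is_candidate(c_q: set[str], o_a: set[str]) -> bool:
    """Definition 6: Q is a candidate for A if C_Q contains every column in O_A."""
    return c_q >= o_a  # superset test

# O_A for "audit T.disease ...", and C_Q for the logged queries of the Jane example.
o_a = {"Treatment.disease"}
logged = {
    "Q1": {"Customer.name", "Customer.address", "Customer.zip",
           "Customer.cid", "Treatment.pcid", "Treatment.disease"},
    "Q2": {"Customer.name", "Customer.address", "Customer.zip"},
}

candidates = [qid for qid, c_q in logged.items() if is_candidate(c_q, o_a)]
print(candidates)  # ['Q1']
```

Query 2 never mentions the disease column, so it is discarded before any data is read.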

Page 15

“Suspicious” Query

Definition 5 - Maximal virtual tuple (MVT): a tuple v is an MVT for queries Q1 and Q2 if it belongs to the cross product of the common tables in their from clauses

Definition 7 - Q is suspicious with respect to A if they share an indispensable MVT v:
$susp(Q, A) \iff \exists\, v \in T \text{ s.t. } ind(v, Q) \wedge ind(v, A)$

For example, Query Q: addresses of people with diabetes; Audit A: Jane’s diagnosis.

Jane’s tuple is indispensable for both; hence query Q is “suspicious” with respect to A.
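Definition 7 can also be evaluated by brute force on toy data: enumerate the MVTs in T and look for one that is indispensable for both Q and A. This Python sketch is illustrative only and rests on these assumptions:

- Q and A range over the same tables, Customer and Treatment. This is a hypothetical setup that makes R and S empty and T their cross product.
- Queries are (predicate, $C_Q$) pairs.
- All names and rows are made up.

```python
from collections import Counter
from itertools import product

def ind(v, t_rows, pred, cols):
    """Definition 1 with R empty: does deleting v change pi_cols(sigma_pred(T))?"""
    def result(rows):
        return Counter(tuple(r[c] for c in cols) for r in rows if pred(r))
    return result(t_rows) != result([t for t in t_rows if t is not v])

def susp(t_rows, q, a):
    """Definition 7: Q and A share an indispensable MVT v in T."""
    return any(ind(v, t_rows, *q) and ind(v, t_rows, *a) for v in t_rows)

customers = [{"cid": 1, "name": "Jane", "address": "addr-1"},
             {"cid": 2, "name": "Alice", "address": "addr-2"}]
treatments = [{"pcid": 1, "disease": "diabetes"}, {"pcid": 2, "disease": "flu"}]
T = [{**c, **t} for c, t in product(customers, treatments)]

# Q: addresses of people with diabetes.  A: Jane's diagnosis.
q = (lambda r: r["disease"] == "diabetes" and r["cid"] == r["pcid"],
     ["address", "disease", "cid", "pcid"])
a = (lambda r: r["cid"] == r["pcid"] and r["name"] == "Jane",
     ["disease", "cid", "pcid", "name"])
# A query whose only indispensable tuple is Alice's flu record.
q_flu = (lambda r: r["disease"] == "flu" and r["cid"] == r["pcid"],
         ["address", "disease", "cid", "pcid"])

print(susp(T, q, a))      # True: Jane's tuple is indispensable for both
print(susp(T, q_flu, a))  # False
```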

Page 16

Outline

Introduction and motivation
Problem statement
Foundations
System organization and algorithms
Performance
Summary

Page 17

System Overview

Query Log
ID | Timestamp | Query | User | Purpose | Recipient
1 | 2004-02… | Select … | James | Current | Ours
2 | 2004-02… | Select … | John | Telemarketing | public

Normal operation: queries, tagged with their purpose and recipient, pass through the database layer and are recorded in the query log. Updates, inserts, and deletes are applied to the data tables, and database triggers track these updates to the base tables in the backlog.

Audit: an audit expression undergoes static analysis, and an audit query is generated from it. The audit query runs through the database layer and returns the IDs of logged queries that accessed the data specified by the audit expression.

Page 18

Static Analysis

Query Log
ID | Timestamp | Query | User | Purpose | Recipient
1 | 2004-02… | Select … | James | Current | Ours
2 | 2004-02… | Select … | John | Telemarketing | public

Inputs: the query log and the audit expression
Filter queries, producing the candidate queries
– Eliminates queries that could not possibly have violated the audit expression
– Ensures that $C_Q \supseteq O_A$
– Accomplished by examining only the queries themselves (i.e., without running the queries)

Page 19

Audit Query Generation

Goal
– Build a query which, when run, returns the IDs of suspicious queries with respect to an audit expression A

Page 20

Generating the Audit Query

Diagram (QGM): Candidate Query 1, Candidate Query 2, and the audit expression, over tables T1 and T2, combined by a Union

Combine individual candidate queries and the audit expression into a single query graph
Combine the audit expression with each candidate query to identify suspicious queries
Replace each table with its backlog to restore the version of the table at the time of each query

QGM (Query Graph Model) is a graphical representation of a query
– Boxes represent operators, such as select
– Lines represent input/output relationships between operators
– Boxes with no inputs are tables

Page 21

Suspicious SPJ Query

The candidate SPJ query Q and the audit expression A are of the form:
$Q = \pi_{O_Q}(\sigma_{P_Q}(T \times R))$ and $A = \pi_{O_A}(\sigma_{P_A}(T \times S))$

Theorem 2 - A candidate SPJ query Q is suspicious with respect to an audit expression A if and only if:
$\sigma_{P_A}(\sigma_{P_Q}(T \times R \times S)) \neq \emptyset$

QGM rewrites, shown on the previous slide, transform Q and A into:
$\pi_{\text{"}Q_i\text{"}}(\sigma_{P_A}(\sigma_{P_Q}(T \times R \times S)))$

The proof of correctness is based upon Definition 7 (suspicious query) and is given in the paper.
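Theorem 2 turns suspicion into a single query: conjoin $P_Q$ and $P_A$ over $T \times R \times S$, and emit the query's ID if any row survives. The Python/SQLite sketch below applies this to logged query 1 and the audit expression of the Jane example. It deliberately runs against the current tables only and leaves out the substitution of temporal views built from the backlog. SQLite stands in for DB2, and `audit_query` and the sample rows are hypothetical.

```python
import sqlite3

def audit_query(qid, from_clause, p_q, p_a):
    """pi_{'qid'}(sigma_{P_A}(sigma_{P_Q}(T x R x S))) written as SQL (hypothetical helper)."""
    return f"SELECT DISTINCT '{qid}' FROM {from_clause} WHERE ({p_q}) AND ({p_a})"

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE Customer (cid INTEGER, name TEXT, address TEXT, zip TEXT);
CREATE TABLE Treatment (pcid INTEGER, disease TEXT);
INSERT INTO Customer VALUES (1, 'Jane', 'addr-1', '95120'), (2, 'Alice', 'addr-2', '95112');
INSERT INTO Treatment VALUES (1, 'diabetes'), (2, 'flu');
""")

from_clause = "Customer C, Treatment T"            # both tables are common to Q and A; R, S empty
p_q = "T.disease = 'diabetes' AND C.cid = T.pcid"  # predicate of logged query 1
p_a = "C.cid = T.pcid AND C.name = 'Jane'"         # predicate of the audit expression

print(con.execute(audit_query("Q1", from_clause, p_q, p_a)).fetchall())  # [('Q1',)]
```

Building SQL with f-strings is acceptable here only because the predicates are fixed, trusted text from the log and the audit expression.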

Page 22

Suspicious Aggregate Query (Including Having)

Solution in the paper

Page 23

Example

Jane’s audit

Page 24

Audit Expression

audit T.disease
from Customer C, Treatment T
where C.cid = T.pcid and C.name = 'Jane'

Who has accessed Jane’s disease information?

Page 25

Query Log

ID | Query | TS | User | Purpose | Recipient
1 | select name, address, zip from Customer, Treatment where disease = 'diabetes' and cid = pcid | T3 | james | marketing | others
2 | select name, address from Customer where zip = '95112' | T3 | john | contact | others

Query 1 was executed at time T3

Page 26

Backlog Table (Time Stamp)

Name | Address | … | OPR | TS
Jane | 1234… | … | I | T2
Jane | 1234… | … | U | T4
Alice | … | … | I | T1

Name, Address, …: attributes also in the source table
OPR, TS: attributes only in the backlog table
– OPR: operation on the tuple (insert, update, or delete)
– TS: timestamp of the operation

Jane’s record was inserted at time T2 and updated at time T4. The backlog table records both versions of her information.

C. S. Jensen, L. Mark, and N. Roussopoulos [TKDE 1991]

Page 27

Merge Logged Queries and Audit Expression

Merge logged queries and audit expression into a single query graph

Diagram (QGM) over the tables Customer C (c, n, …, t) and Treatment T (p, r, …, t):
– Logged query: Select := T.s = 'diabetes' and T.p = C.c; outputs C.n, C.a, C.z
– Audit expression := T.p = C.c and C.n = 'Jane'; outputs T.s

Page 28

Transform Query Graph into an Audit Query

Diagram (QGM) over the tables Customer C (c, n, …, t) and Treatment T (p, r, …, t):
– Logged query (box X): Select := T.s = 'diabetes' and C.c = T.p; outputs C.n
– Audit expression := X.n = 'Jane'; outputs 'Q1'

The view of Customer (Treatment) is a temporal view as of the time the query was executed.

The audit expression now ranges over the logged query. If the logged query is suspicious, the audit query outputs the ID of the logged query.

Page 29

Scenario Outcome

The audit uncovers that Query 1 in the query log accessed Jane’s information

Page 30

Outline

Introduction and motivation
Problem statement
Foundations
System organization and algorithms
Performance
Summary

Page 31

Empirical Evaluation: Goals

Cost of maintaining backlog tables
– Understand the impact of maintaining backlog tables on ongoing database operations
Cost of running audits
– Understand whether audits can run in reasonable time

Page 32

Experimental Setup

IBM M Pro 6868 Intellistation
– 800 MHz Pentium III processor
– 512 MB of memory
– 16.9 GB disk drive
Windows 2000 Version 5, SP 4
DB2 v7 with default settings
TPC-H database
– Supplier table: 100,000 tuples

Page 33

System Structures

Indexing
– Eager indexing: maintain an index over the backlog table during ongoing database operations
– Lazy indexing: no index over the backlog table; create indices at the time of audit
Choice of index
– Simple index: primary key of the source table
– Composite index: primary key of the source table and timestamp
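The two index choices can be tried on a scratch backlog table. In this Python/SQLite sketch, SQLite stands in for DB2, and the table and index names are hypothetical.

- Eager indexing corresponds to creating the index before the updates run. Lazy indexing corresponds to creating it just before the audit.
- The query looks up the version of one supplier tuple in effect at a given time.
- `EXPLAIN QUERY PLAN` shows which index SQLite picks. Its text is SQLite-specific and varies by version.

```python
import sqlite3

LOOKUP = """
SELECT * FROM Supplier_backlog
WHERE skey = ? AND ts <= ?          -- versions of one tuple up to the audit time
ORDER BY ts DESC LIMIT 1            -- the latest such version
"""

def plan(index_ddl):
    """Return SQLite's plan text for LOOKUP when the given index exists."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE Supplier_backlog (skey INTEGER, name TEXT, opr TEXT, ts INTEGER)")
    con.execute(index_ddl)
    rows = con.execute("EXPLAIN QUERY PLAN " + LOOKUP, (42, 100)).fetchall()
    return " | ".join(row[-1] for row in rows)  # the last column holds the plan detail

# Simple index: primary key of the source table only.
simple = plan("CREATE INDEX bk_simple ON Supplier_backlog (skey)")
# Composite index: primary key of the source table plus the timestamp.
composite = plan("CREATE INDEX bk_composite ON Supplier_backlog (skey, ts)")
print(simple)
print(composite)
```

The composite index can serve both the key equality and the timestamp range. The simple index narrows the search by key only, and the versions are then sorted.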

Page 34

Impact on Ongoing Operations

Queries
– Additionally log the query string (already performed in many application environments)
Updates
– For each updated tuple, insert a tuple into the backlog table
– Inserts and deletes are handled similarly
In a majority of environments, queries are much more frequent than updates
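The per-tuple backlog maintenance described above maps naturally onto row-level triggers. This Python/SQLite sketch makes the following assumptions:

- SQLite stands in for DB2.
- A logical clock (the maximum existing ts plus one) replaces a real timestamp so that the output is deterministic.
- The single-letter operation codes I, U, D are illustrative.
- The table and trigger names are hypothetical.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE Supplier (skey INTEGER PRIMARY KEY, name TEXT, address TEXT);
-- Backlog: the source attributes plus the operation (opr) and its timestamp (ts).
CREATE TABLE Supplier_backlog (skey INTEGER, name TEXT, address TEXT, opr TEXT, ts INTEGER);

-- Each trigger appends one backlog row per affected tuple; ts is a logical clock.
CREATE TRIGGER supplier_ins AFTER INSERT ON Supplier FOR EACH ROW BEGIN
  INSERT INTO Supplier_backlog
  SELECT NEW.skey, NEW.name, NEW.address, 'I', COALESCE(MAX(ts), 0) + 1 FROM Supplier_backlog;
END;
CREATE TRIGGER supplier_upd AFTER UPDATE ON Supplier FOR EACH ROW BEGIN
  INSERT INTO Supplier_backlog
  SELECT NEW.skey, NEW.name, NEW.address, 'U', COALESCE(MAX(ts), 0) + 1 FROM Supplier_backlog;
END;
CREATE TRIGGER supplier_del AFTER DELETE ON Supplier FOR EACH ROW BEGIN
  INSERT INTO Supplier_backlog
  SELECT OLD.skey, OLD.name, OLD.address, 'D', COALESCE(MAX(ts), 0) + 1 FROM Supplier_backlog;
END;
""")

con.execute("INSERT INTO Supplier VALUES (1, 'Acme', 'addr-1')")
con.execute("UPDATE Supplier SET address = 'addr-2' WHERE skey = 1")
con.execute("DELETE FROM Supplier WHERE skey = 1")

history = con.execute("SELECT skey, address, opr, ts FROM Supplier_backlog ORDER BY ts").fetchall()
print(history)  # [(1, 'addr-1', 'I', 1), (1, 'addr-2', 'U', 2), (1, 'addr-2', 'D', 3)]
```

The max-plus-one clock is not safe under concurrent writers. A production setup would use the database's own timestamps.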

Page 35

Update Performance

100,000 tuples in the Supplier table
An update statement updates all tuples
Each update statement fires triggers, which insert an additional 100,000 tuples into the backlog
Evaluate the impact of multiple versions on performance

Page 36

Overhead on Updates

[Figure: time (minutes) vs. number of versions of each tuple in the Supplier backlog table; series: Composite, Simple, No Index, No Triggers]

Simple wins over Composite
– 7x if all tuples are updated
– 3x if a single tuple is updated
Eager indexing doesn’t add much cost

Page 37

Audit Query Performance

Audit query:
select 'Q' from Supplier where skey = k

Experiment:
Evaluate the impact of the number of versions of tuples in the backlog table on performance

Page 38

Audit Query Execution Time

[Figure: time (msec, log scale) vs. number of versions per tuple; series: Simple-I, Simple-C, Composite-I, Composite-C]

Composite wins over Simple if the initial version is selected
Simple wins over Composite if the current version is selected

Page 39

Takeaways

The composite index
– Enhances the performance of audits, but
– Additionally burdens updates when using eager indexing
The system supports
– Efficient auditing
– Without substantially burdening normal query processing

Page 40

Related Work

Oracle Privacy Security Auditing
– Facility for logging queries with timestamps
– Flashback queries: restore the version of the data at the time of the query
– No support for automated auditing: the user manually selects queries from the log, runs them, and decides whether each query is suspicious
G. Miklau, D. Suciu [SIGMOD 2004]
– Formal analysis of information disclosure in data exchange: is information about a secret query S revealed by views V1, …, Vn? Considers all possible instances of a database schema; assumes tuple independence
– We are interested in given instances (temporal versions)
– Nonetheless, it will be interesting to explore the connection between the two works
Active enforcement of policies by limiting disclosure [VLDB’04]
Literature on multi-query optimization

Page 41

Summary

In light of new privacy legislation
– The problem of auditing usage of information represents an important opportunity for database research
Formalized the problem through the fundamental concepts of indispensable tuples and suspicious queries
Achieved our design goals:

Page 42

Design Goals

Convenient language
Fast and precise on audits
Non-disruptive
– Minimal performance impact on normal database operation
Fine-grained

Page 43

Backup

Page 44

Multiple Candidate Queries

Diagram (QGM):
– audit expression := C.n = 'Jane'; outputs 'Q1'
– audit expression := C.n = 'Jane'; outputs 'Q2'
– Union of the two outputs
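With several candidates, each one contributes a branch tagged with its own ID, and a UNION collects the IDs of the suspicious ones. This Python/SQLite sketch rests on these assumptions:

- The three candidate queries are hypothetical, and each is assumed to read Treatment.disease so that it passes the candidate test.
- The join predicate is carried inside each branch, leaving C.name = 'Jane' as the audit predicate, as on the slide.
- `union_audit` is a hypothetical helper.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE Customer (cid INTEGER, name TEXT, zip TEXT);
CREATE TABLE Treatment (pcid INTEGER, disease TEXT);
INSERT INTO Customer VALUES (1, 'Jane', '95120'), (2, 'Alice', '95112');
INSERT INTO Treatment VALUES (1, 'diabetes'), (2, 'flu');
""")

def union_audit(candidates, p_a):
    """One branch per candidate query, tagged with its ID, combined with UNION."""
    return " UNION ".join(
        f"SELECT '{qid}' FROM {from_clause} WHERE ({p_q}) AND ({p_a})"
        for qid, from_clause, p_q in candidates
    )

# Hypothetical candidate queries (each assumed to read Treatment.disease).
candidates = [
    ("Q1", "Customer C, Treatment T", "T.disease = 'diabetes' AND C.cid = T.pcid"),
    ("Q2", "Customer C, Treatment T", "T.disease = 'flu' AND C.cid = T.pcid"),
    ("Q3", "Customer C, Treatment T", "C.zip = '95120' AND C.cid = T.pcid"),
]
p_a = "C.name = 'Jane'"  # audit predicate as on the slide

suspicious = sorted(row[0] for row in con.execute(union_audit(candidates, p_a)))
print(suspicious)  # ['Q1', 'Q3']
```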

Page 45

Aggregate Queries with Having

Diagram (QGM):
– Logged query boxes: Qs (select := …), Qg (group := c1, …, ci; outputs c1, …, ci, agg1, …, aggn), and Qh (outputs c1, …, ci)
– Two audit expression boxes (audit expression := …), each outputting c1, …, ck
– A select box over inputs q1 and q2 (select := q1.c1 = q2.c1 and … and q1.ci = q2.ci), outputting 'Q1'

The join on the grouping columns c1, …, ci ensures that the group being tracked by the audit has not been eliminated by the having clause
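One possible reading of this rewrite, sketched in Python/SQLite, is shown below. It is not the paper's exact construction and rests on these assumptions:

- The logged query is the hypothetical `SELECT zip, COUNT(*) FROM Customer GROUP BY zip HAVING ...`. It has no WHERE clause, so only the audit predicate restricts the tracked tuples.
- The join on the grouping column keeps the query's ID only if Jane's group survives HAVING.
- The data and `audit_having` are made up.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE Customer (cid INTEGER, name TEXT, zip TEXT);
INSERT INTO Customer VALUES (1, 'Jane', '95120'), (2, 'Alice', '95112'), (3, 'Bob', '95112');
""")

def audit_having(having):
    """Audit of the hypothetical logged query
    SELECT zip, COUNT(*) FROM Customer GROUP BY zip HAVING <having>."""
    return f"""
    SELECT DISTINCT 'Q1'
    FROM (SELECT zip FROM Customer WHERE name = 'Jane') AS q1           -- group(s) the audit tracks
    JOIN (SELECT zip FROM Customer GROUP BY zip HAVING {having}) AS q2  -- groups surviving HAVING
      ON q1.zip = q2.zip                                                -- join on the grouping column
    """

print(con.execute(audit_having("COUNT(*) > 1")).fetchall())   # []: Jane's group is filtered out
print(con.execute(audit_having("COUNT(*) >= 1")).fetchall())  # [('Q1',)]
```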

Page 46

Dynamic Temporal Views

View of the Customer table at time τ, the timestamp of the logged query

Diagram (QGM) over Customer_backlog (c, n, a, h, z, o, t, ts, op):
– Select := ts <= τ and op <> 'delete' and not(C5); outputs c, n, a, h, z, o, t
– C5: Exists := C4.ts <= τ and C3.c = C4.c and C4.ts > C3.ts
– Box labels in the diagram: C1, C3, C4, C5

Column abbreviations: c = id, n = name, a = address, h = phone, z = zip, o = contact, t = marketing, ts = timestamp (ts), op = operation (opr)
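The temporal view keeps, for each tuple, the latest backlog version with ts ≤ τ, unless that version is a delete. The following Python/SQLite sketch mirrors the Select/NOT EXISTS shape above and makes these assumptions:

- Only a few of the slide's columns are kept.
- Single-letter operation codes are used, as in the backlog table example (I, U, plus D for delete in place of 'delete').
- Timestamps T1, T2, T4 are mapped to the integers 1, 2, 4.
- The rows, including Jane's new zip and Alice's delete at time 5, are hypothetical.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE Customer_backlog (c INTEGER, n TEXT, z TEXT, ts INTEGER, op TEXT);
INSERT INTO Customer_backlog VALUES
  (2, 'Alice', '95112', 1, 'I'),
  (1, 'Jane',  '95120', 2, 'I'),
  (1, 'Jane',  '95123', 4, 'U'),
  (2, 'Alice', '95112', 5, 'D');
""")

# Customer as of :tau: keep version C3 unless it is a delete, or a later version C4
# of the same tuple (C3.c = C4.c) also has a timestamp no greater than :tau.
AS_OF = """
SELECT C3.c, C3.n, C3.z FROM Customer_backlog AS C3
WHERE C3.ts <= :tau AND C3.op <> 'D'
  AND NOT EXISTS (SELECT 1 FROM Customer_backlog AS C4
                  WHERE C4.ts <= :tau AND C3.c = C4.c AND C4.ts > C3.ts)
ORDER BY C3.c
"""

print(con.execute(AS_OF, {"tau": 3}).fetchall())  # [(1, 'Jane', '95120'), (2, 'Alice', '95112')]
print(con.execute(AS_OF, {"tau": 5}).fetchall())  # [(1, 'Jane', '95123')]
```

At τ = 3 (the time of Query 1 in the example), Jane's T2 version is the one the audit sees. By τ = 5, Alice's delete hides her record.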

Page 47

Cost of Building Indices over Backlog Tables

[Figure: time (minutes) vs. number of versions per tuple; series: TS-Composite, TS-Simple]