R x Acquisition of Knowledge from a Database Gio Wiederhold, Ph.D. 1 Robert L. Blum, M.D., Ph.D. 2 Michael Walker 3 Departments of Medicine (1,3) and Computer Science (1,2) Stanford University March 1988


Dec 20, 2015

Rx Acquisition of Knowledge from a Database

Gio Wiederhold, Ph.D. 1

Robert L. Blum, M.D., Ph.D. 2

Michael Walker 3

Departments of Medicine (1,3) and Computer Science (1,2)

Stanford University

March 1988

Presentation

1. Review of general concepts as used by us
2. Overview of RX
3. Data and knowledge processing
4. The architecture to support RX
5. General conclusions
6. Future work

Objectives

1. Gain an understanding of the interactions in a large knowledge-data system
2. Get a feeling for some of the detailed implementation issues
3. Learn from a working system, not fantasy

This is not an introduction to AI . . .

1. Basic Concepts

Computing for DECISION-MAKING: a global objective

Combine
  Data: the state of the world
  Knowledge: our abstractions
  [Computational Engine]
  Predictions of the Future

Paradigm

             Traditional    Artificial Intelligence
Knowledge:   Program        Rules, ...
Data:        Files          Ground rules, instance frames
Engine:      CPU            CPU and interpreter

Data, Knowledge, Information

Data:
1. Factual observations on specific objects or events
2. Measured in the past
3. Objectively verifiable

Knowledge:
1. General descriptions or abstractions on classes of objects or events
2. Predicting the future
3. Obtained from experts
4. Uncertain and not verifiable

Information:
1. Data or knowledge previously unknown to the receiver
2. Used for decision-making

Litmus test:
If an automatic process or a clerk can collect the material, then we are talking about data.
If an expert has to provide the material, then we are talking about knowledge.

AI System Modes

Early Expert Systems
1. Data poor
2. Goal driven (user request)
3. Backward chaining
4. Often focused
5. Minimize data requests from user

Knowledge Based Systems / EDS
1. Data rich
2. Can be data-driven (triggers)
3. Forward and backward chaining
4. Easily explosive
5. Minimize repetitive data requests

Data and Knowledge

Information is created at the confluence of
  data: the state
  knowledge: the ability to select and project the state into the future

[Figure: the Knowledge Loop and the Data Loop. Labels: Education, Recording, Action, Storage, Selection, Integration, Summarization, Decision-making, State changes, Abstraction, Experience, Knowledge increase]

2. Overview of RX

Objective of RX: Knowledge Extraction from Databases

Hypothesis

Databases contain much experience (more than any single physician can accumulate).
This knowledge can be extracted to serve (eventually) knowledge-based advice-giving systems.

Knowledge is used to drive the system.

The knowledge representation for initial and derived knowledge is identical.

Components

1. Medical database: relational model, transposed, extracted from clinical use
2. Medical knowledge base: frames
   2.1 for multiple interpreters, multi-objective
   2.2 interlinked structures
3. Statistical knowledge: rules
4. Statistical validation: programs

Processing Flow

Cycle: Discovery - Study - Modeling - Verification - Augmentation of Scientific Knowledge

[Figure: processing flow diagram. Boxes: Medical Knowledge, Discovery Module, Study Module, Model Building, Clinical Data, Statistics Knowledge. Labels: Med. Experts, new, rejected, Hypotheses, select, Researcher, model and data, experience, Clinicians, Statisticians, Epidemiologists]

Roles

[Medical Knowledgebase]
  initial knowledge directs inference
  accepts new knowledge

[Medical Database]
  contains past experience
  basis for inference

[Statistical Knowledge]
  processing rules

[Interpreters]
  hand-coded engines interpret the knowledge

Database

ARAMIS (American Rheumatism Association) uses TOD (began 1969)

Time-Oriented Database System features:
- Domain-oriented data types
  1. Date
  2. Severity codes (0, +, ++, ...)
  3. User-defined codes (female, male, ...)
  . . .
- Subsetting operations
- Transposition

Size of Stanford subset: about 30 Mbytes

Knowledge Base

More complex than the database

Interlinked structures:
  Categorical knowledge is hierarchically organized
  Definitional knowledge has distinct hierarchies
  Causal knowledge links across the hierarchies

Aggregate knowledge is a network structure, represented by frames with references to each other

                    ALL-UNITS

  STATES            ACTIONS         STAT' M'DS

  DIAGN'C-CAT'S     DRUGS           REGRESSION
  ...               ...             ...

  CARDIAC-DIS'S     ANTIBIOTIC      MULT-REG'N
  etc               etc             etc

Displaying the RX Knowledge Base

Menu of Display Options

MACRO  function(args)          EXAMPLE
-----  ----------------------  ---------------------------
       Display
DS     schema(node)            DS Nephrotic-syndrome
DP     paths(c <-> e)          DP SLE Cholesterol
DC     causes(e-node)          DC WBC
DE     effects(c-node)         DE Prednisone
DD     distribut'n(c e)        DD Prednisone Cholesterol
DM     model(c e)              DM Prednisone Cholesterol
DEV    evidence(c e)           DEV Prednisone Cholesterol
DF     frequencies             DF
D      desc'dnts-tree          D Diagnostic-Categories
CLASS  classificat'n           CLASS Azathioprine
SPEC   children                SPEC Diagnostic-categories
SIBS   siblings                SIBS Azathioprine
TR     traverse right          TR Glomerulonephritis
TL     traverse left           TL Glomerulonephritis
PL     print property list     PL Validity
PPL    print verbose pr.list   PPL Frequency

(These functions provided many of the slides below)

Hierarchical Classification of Diseases

Each frame has a generalization slot and a specialization slot:

respiratory diseases:
  genl: all categories of disease
  spec: pneumonia, asthma, emphysema

pneumonia                 asthma                    emphysema
genl: resp'ry dis.        genl: resp'ry dis.        genl: resp'ry dis.
spec:                     spec:                     spec:
  pneumococcal pn.          allergic asthma           pco2 retention
  klebsiella pn.            intrinsic asthma

Assumptions: Completeness across Inheritance

Display Hierarchical Frames

Display the Descendants in the Hierarchy

D Autoimmune-Disorders

Autoimmune-disorders
SLE
Lupus-nephritis
Cardiac-lupus
CNS-lupus
lupus-serositis
Ra
Arteritis

Hierarchical Classification

CLASS Glomerulonephritis

(Glomerulonephritis Renal-disorders Diagnostic-categories States)
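The genl and spec slots make displays such as D, CLASS, and SIBS simple tree walks. The following is a minimal Python sketch under stated assumptions, not code from RX: the dictionary FRAMES and the functions classify, descendants, and siblings are illustrative names, and the tree holds only a fragment of what the slides show (placing the respiratory frames directly under Diagnostic-categories is an assumption).

```python
# Hypothetical frame store. Each frame has a generalization (genl) slot and a
# specialization (spec) slot, as on the slides. The tree fragment is assumed.
FRAMES = {
    "States": {"genl": None, "spec": ["Diagnostic-categories"]},
    "Diagnostic-categories": {
        "genl": "States",
        "spec": ["Renal-disorders", "Respiratory-diseases"],
    },
    "Renal-disorders": {"genl": "Diagnostic-categories", "spec": ["Glomerulonephritis"]},
    "Glomerulonephritis": {"genl": "Renal-disorders", "spec": []},
    "Respiratory-diseases": {
        "genl": "Diagnostic-categories",
        "spec": ["Pneumonia", "Asthma", "Emphysema"],
    },
    "Pneumonia": {"genl": "Respiratory-diseases", "spec": []},
    "Asthma": {"genl": "Respiratory-diseases", "spec": []},
    "Emphysema": {"genl": "Respiratory-diseases", "spec": []},
}


def classify(name):
    """Like CLASS: follow genl slots from a frame up to the root."""
    chain = []
    while name is not None:
        chain.append(name)
        name = FRAMES[name]["genl"]
    return chain


def descendants(name):
    """Like D: list a frame and everything below it, depth first."""
    result = [name]
    for child in FRAMES[name]["spec"]:
        result.extend(descendants(child))
    return result


def siblings(name):
    """Like SIBS: the other children of this frame's parent."""
    parent = FRAMES[name]["genl"]
    if parent is None:
        return []
    return [child for child in FRAMES[parent]["spec"] if child != name]
```

Because every frame stores both slots, upward and downward traversals need no search; the cost is that the two slots must be kept consistent whenever a frame is added.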

Definitions

Definitions may be in Terms of other Attributes of other Objects

It is important that medical knowledge is available at a high level of abstraction,
but the definition may use other (lower) frames, in another hierarchical subtree.

Pneumonia
  definition: Temperature > 102 degrees F.
              and WBC > 10,000 cells per mm^3
              and Chest X-RAY = Lobar Infiltrate

At the lowest level the frames correspond to attributes found in the DATABASE
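A definition of this kind can be read as a predicate over lower-level attributes. The sketch below is an illustration only: the function name and the dictionary keys (temperature_f, wbc_per_mm3, chest_xray) are hypothetical stand-ins for database attributes, and RX itself stores definitions in frames rather than in Python functions.

```python
def meets_pneumonia_definition(obs):
    """Return True when all three criteria from the slide hold.

    obs is a hypothetical record of one patient's observations; the key
    names are assumptions made for this example.
    """
    return (
        obs["temperature_f"] > 102
        and obs["wbc_per_mm3"] > 10_000
        and obs["chest_xray"] == "lobar infiltrate"
    )
```

For example, a record with a temperature of 103.1, a WBC of 14,000, and a lobar infiltrate satisfies the definition, while the same record with a temperature of 99.0 does not.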

Causal Knowledge Links Nodes

i.e.: Temperature is Affected by Pneumonia

Pneumonia                 Temperature
affected-by:              affected-by:
  Alcoholism                Pneumonia
  Diabetes                  Influenza
effects:                  effects:
  Temperature               Perspiration
  WBC
  Chest-XRAY

Each causal relationship is represented as a set of features:
intensity, frequency, direction, setting, functional form, validity, evidence

The relationship "Pneumonia increases temperature":

intensity:        to 104 degrees F.
frequency:        common
direction:        +
setting:          studied in middle-aged patients with pneumococcal pneumonia
functional form:  .5log (severity pneumonia) + 98
validity:         widely confirmed
evidence:         citations to medical literature

SUMMARY of Round 1

Most `KNOWLEDGE' is in the relationships.
Frames (and people) define their meaning through relationships to others.

In a small knowledgebase linkages can be arbitrary (Semantic Nets).

As the knowledge grew, we imposed structure:
1. Categorical
2. Definitional
3. Causal

To relate knowledge to the data, this structure must be applicable to data instances:
  in class frames
  schema frames at DB level

3. Data and Knowledge Processing

Scientific process is a cycle:

Instances                         ->  Experience
Education + Experience            ->  Knowledge
Unexpected Instances              ->  Questions
Questions + Scientific training   ->  Hypothesis
Hypothesis + Knowledge            ->  Model
Model + Data                      ->  Validation
Validation + Dissemination        ->  New Knowledge

How and Who?

Our Example: MEDICINE

Student          learns                          8 y
    cycle starts
Clinician        treats                          5 y
Clinician        observes exceptions           + 1 y
Clinician        studies cases                 + 2 y
Clinician        formulates Hx                 = 2 y
Archivist        collects data                 = 2 y
Epidemiologist   formulates model              + 3 m
Statistician     applies methods               + 3 m
Data Analyst     selects and processes data    + 6 m
All              write                         + 1 y
Editors          review                        + 1 y
Journal          publishes                     + 1 y
Clinicians       adapt practice                + 3 y

Many participants
Much time in cycle: 16 y net

Operational Cycle of RX

Models this scientific process

1. Collect data: done outside of RX
2. Collect and represent medical knowledge
   2.1 Define categorical frames from `textbook' knowledge
       densely in the area of interest (≡ DB)
       use inheritance in the categorical hierarchy outside
   2.2 Make definitions to link concepts to the database
   2.3 Initialize known cause/effect linkages in the area of interest
3. Collect rules for statistical processing, tied to the data description
4. Program control mechanisms for steps 5 to 10
5. Discover unusual events
   RX: brute-force correlation
   RADIX: scan for time-variations
6. Generate hypotheses
7. Build model for hypothesis testing
8. Extract data for statistical Hx test
9. Run test
10. Append validated Hx to knowledge base
11. Iterate to 5.

4. The Architecture of RX
   to support the cycle

[DATABASE] and [KNOWLEDGE BASE]

[DISCOVERY module] -> generate hypotheses
validate <- [STUDY module]

[STATISTICAL PROGRAMS]

all controlled by several [KNOWLEDGE INTERPRETERS]

AI paradigm

Similar to DENDRAL

GENERATE and TEST

Discovery Module -> Study Module

All kinds of correlations -> independent, significant correlations

Clinical Database

Data is a byproduct of medical practice
Cases are representative
Many uses of data: health care, billing, medical audit, research

ARAMIS

Relational model

2 relations
  Patients: (pat-no, DoB, ... (50 values))
  Visits: (pat-no, date-of-visit, reason ... (500 values))

Internally transposed

The ARAMIS Database

Transposed: by ATTRIBUTE, by PATIENT and VISIT

Data attribute column (values p1.v1 p1.v2 . . . p2.v1 . . . )

Patient-Id   (1 1 1 1 1 1   3 3 3 3 3 3 3 3   . . . 6 6 6 6 6 6 6 . . . 78 78 78 . . .
Visit-date   (10Mar78 11Apr78 23Jun78 1Jul78 10Jul78 4Dec78   15May78 . . .
Cholesterol  (31 29 24 30 31 29   23 - 27 25 = 23 - =   . . .   20 22 = = = 21 . . . 32 34 . . .
Prednisone   . . .

Columns are stored as variable-length compressed records, controlled by a prefix table: 0(!), 1(0), 2(-), 3(=)

Collected using forms

The Database in RX

Transposed by PATIENT, by FRAME (States, Actions), by VISIT -> VALUE

Data attribute strings:

(Patient1
  (Aspirin ((1 30)(2 20)(3 20)(4 20)(5 20) ...))
  (Cholesterol ((1 215)(2 229)(4 230)(...))
  . . .
  (Prednisone ((1 50)(2 27)(4 25))
  . . .
  (Visit-date (1Jun80 15Jun80 12Jul80 ...)
  . . .

(Patient78
  (Aspirin ((6 10)( ... ))
  (Cholesterol ((1 ... ))
  . . .

. . .
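The patient-by-frame-by-visit layout can be pictured as a nested mapping from patient to attribute to a list of (visit, value) pairs, with unrecorded visits simply absent. The sketch below shows one way to build that shape from row-oriented input; it is an assumption-laden illustration, not the RX storage code, and the function name transpose and the input tuple layout are hypothetical.

```python
from collections import defaultdict


def transpose(rows):
    """Regroup (patient, visit_no, attribute, value) rows by patient and attribute.

    A value of None means "not recorded at this visit" and is skipped, so
    each attribute keeps only the visits at which it was observed.
    """
    db = defaultdict(lambda: defaultdict(list))
    for patient, visit_no, attribute, value in rows:
        if value is not None:
            db[patient][attribute].append((visit_no, value))
    # Sort each series by visit number and return plain dictionaries.
    return {
        patient: {attr: sorted(series) for attr, series in attrs.items()}
        for patient, attrs in db.items()
    }


# Values taken from the slide for Patient1; visit 3 has no cholesterol reading.
rows = [
    ("Patient1", 1, "Cholesterol", 215),
    ("Patient1", 2, "Cholesterol", 229),
    ("Patient1", 3, "Cholesterol", None),
    ("Patient1", 4, "Cholesterol", 230),
    ("Patient1", 1, "Prednisone", 50),
    ("Patient1", 2, "Prednisone", 27),
    ("Patient1", 4, "Prednisone", 25),
]
patient_db = transpose(rows)
```

With this layout, all values of one attribute for one patient are read as a single contiguous series, which suits per-patient time-series analysis.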

Schema Frames

Other slots define computational parameters.

Example: Schema for Hemoglobin

Hemoglobin                      explanation
----------
attribute-type: point-event     represented as a time:value pair
value-type: real                i.e. a real-valued number
range: 0 < value < 25           the legal range of values
units: grams per deciliter      units of measurement
significance: .1                used for rounding off values

and Real World Knowledge
----------
function: oxygen transport
molecular-weight: 67,000 daltons
structure: Fe + heme + 4 polypeptide chains
part-of: red blood cell
affected-by: high altitudes, genetic make-up
clinical-effects: deficiency causes fatigue
                  severe deficiency may cause cardiac failure
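The computational slots of a schema frame can drive generic value handling. The sketch below assumes one plausible use: reject values outside the legal range and round accepted values to the stated significance. The dictionary layout and the function name normalize are hypothetical, and how RX actually applies these slots is not stated on the slide.

```python
# Hypothetical encoding of the computational slots of the Hemoglobin schema.
HEMOGLOBIN = {
    "value_type": float,
    "low": 0.0,            # range: 0 < value < 25
    "high": 25.0,
    "units": "grams per deciliter",
    "significance": 0.1,   # used for rounding off values
}


def normalize(schema, raw):
    """Convert raw to the schema's type, check the range, round to significance.

    Returns None for values outside the legal (open) range; treating such
    values as missing is an assumption of this sketch.
    """
    value = schema["value_type"](raw)
    if not (schema["low"] < value < schema["high"]):
        return None
    step = schema["significance"]
    # The outer round() removes floating-point noise from the multiplication.
    return round(round(value / step) * step, 10)
```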

Discovery Module

RX uses the database directly (no knowledge used)

Generate hypotheses of relationships
  Search for binary correlations of events/concepts, time lagged

Imprecise, often false, or useless:
  Hx may be known but overlooked in knowledgebase acquisition (discard Hx, update KB)
  Hx may be trivial (discard)
  Hx is worthy of study (to see if it seems valid)

Costly: use subsets of data, run on weekends

Select:
  strong -> rank by correlation (R-value)
  interesting
  non-obvious

Final selection of hypotheses by manual inspection
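A time-lagged binary correlation can be sketched as pairing the candidate cause at one visit with the candidate effect some visits later and computing a Pearson coefficient over the pairs. The code below is an illustration under assumptions: lags are counted in visits (the slides do not give the unit), series are dictionaries from visit number to value, and the function names are hypothetical rather than taken from RX.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length lists, or None if undefined."""
    n = len(xs)
    if n < 3:
        return None
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return None  # a constant series has no defined correlation
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sqrt(sxx * syy)


def lagged_correlation(cause, effect, lag):
    """Correlate cause at visit v with effect at visit v + lag.

    cause and effect map visit number -> value; visits missing from either
    series are skipped, which tolerates incomplete records.
    """
    pairs = [(cause[v], effect[v + lag]) for v in sorted(cause) if v + lag in effect]
    return pearson([c for c, _ in pairs], [e for _, e in pairs])
```

Running this for every ordered pair of attributes and every lag is the brute-force search the slide describes, which is why the cost grows quickly with the number of attributes.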

Data for Discovery

Initially only use a subset:
  50 patients
  50 attributes -> 50! interactions
  6-50 visits
  12 timelags

Future: use AI, model guidance? But avoid excessive restrictions.

RADIX: trigger from changes in the data at a high level of abstraction (see later)

The validation is done on the full set of data.

Important: first a patient's course is characterized, then correlations are computed over the characterizations.

Combining Correlations Across Patients

The patient is the entity, not the events; the number of events/observations differs greatly.

Patient-based score

Patient 1: $r_1 = \mathrm{cor}(x,y)$, $\log[\mathrm{pval}(r_1)]$
Patient 2: $r_2 = \mathrm{cor}(x,y)$, $\log[\mathrm{pval}(r_2)]$
Etc.

$$ \mathrm{score}(x,y) = -2 \sum_{i \in \text{all patients}} \log[\mathrm{pval}(r_i)] $$

$$ \mathrm{score}(x,y) \approx \chi^{2}_{2p} $$
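The score has the form of Fisher's method for combining p-values: minus twice the sum of log p-values, compared against a chi-square distribution with two degrees of freedom per patient. The sketch below assumes the per-patient p-values are already available, are independent, and lie in (0, 1]; the function names are hypothetical. It uses the closed-form upper tail of the chi-square distribution, which holds only for even degrees of freedom.

```python
from math import exp, log


def fisher_score(pvalues):
    """score = -2 * sum(log p_i) over the per-patient p-values."""
    return -2.0 * sum(log(p) for p in pvalues)


def chi2_upper_tail_even(x, df):
    """P(X > x) for a chi-square variable with an even number of degrees of freedom.

    Uses the identity P(X > x) = exp(-x/2) * sum_{j=0}^{df/2 - 1} (x/2)^j / j!.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("df must be a positive even integer")
    half = x / 2.0
    term = 1.0   # (x/2)^0 / 0!
    total = 1.0
    for j in range(1, df // 2):
        term *= half / j
        total += term
    return exp(-half) * total


def combined_pvalue(pvalues):
    """Combined significance across patients: chi-square with 2 * len(pvalues) d.f."""
    return chi2_upper_tail_even(fisher_score(pvalues), 2 * len(pvalues))
```

A useful sanity check is that a single p-value combines to itself, since the score then has two degrees of freedom and the upper tail reduces to exp(-score/2).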

Output from Discovery Module

Possible Causal Effects of Prednisone

variable             lag strength

Hemoglobin           (B + 518)
Anti-DNA-Hemagglut   (B - 514)
Disease-Activity     (R + 469)
C3                   (B + 389)
Fatigue              (R + 370)
Urine-WBCS           (R + 350)
Albumin              (R - 346)
BP-Diastolic         (C + 322)
WBC                  (C + 306)
Urine-RBCS           (B - 293)
Temperature          (B - 275)
Weight               (C + 269)
LDH                  (C + 268)
Glucose              (C + 256)
Log-Fana             (C - 238)
Lymphs               (C - 194)
BP-Systolic          (C + 167)
...                  ...

Study Module

1. Use knowledge to build model for statistical analysis
   1.1 look at confounders and
   1.2 their temporal relationships
2. Use data estimates to select statistical procedures
   2.1 use rules
   2.2 use meta data: cardinality, type information
   (1. and 2. are interdependent, but iteration is not now automated)
3. Extract required data from database
4. Perform analysis
5. Inspect result; if significant, insert into knowledge base

Another study will now take the new knowledge into account

Next: build statistical models

We have to locate all (known) confounders

GLOMERULONEPHRITIS as a confounding variable for Prednisone and Cholesterol:

GLOMERULONEPHRITIS (30 pct activity) increases
  NEPHROTIC-SYNDROME (3 gms proteinuria/24 hrs) is treated by
    PREDNISONE (604 % of baseline)

GLOMERULONEPHRITIS (30 pct act'y) is treated by
  PREDNISONE (182 % of baseline)

GLOMERULONEPHRITIS (30 pct act'y) increases
  NEPHROTIC-SYNDROME (3 gms ...) increases
    CHOLESTEROL (120 mgms/dl)

GLOMERULONEPHRITIS (30 pct act'y) is treated by
  PREDNISONE (182 % of baseline) attenuates
    NEPHROTIC-SYNDROME (-1 gms ...) decreases
      CHOLESTEROL (-22 mgms/dl)

GLOMERULONEPHRITIS (30 pct act'y) is treated by
  PREDNISONE (182 % of baseline) increases
    CHOLESTEROL (11 mgms/dl)   new

GLOMERULONEPHRITIS (30 pct act'y) is treated by
  PREDNISONE (182 % of baseline) attenuates
    SLE (-6 pct activity) attenuates
      NEPHROTIC-SYNDROME (0 gms ... ) decreases
        CHOLESTEROL (-5 mgms/dl)

The Category Hierarchy is Complete

Necessary for the CLOSED-WORLD assumption made by its interpreter

Example: The Specialization at a top level

SPEC Diagnostic-categories

(Arthritic-disorders Autoimmune-disorders Cardiac-dis'rs Dermatologic-dis'rs
 Electrolytic-dis'rs Endocrine-dis'rs Gi-dis'rs Gynecologic-dis'rs
 Hematologic-dis'rs Hepatic-dis'rs Hypertensive-dis'rs Immunologic-dis'rs
 Infectious-dis'rs Metabolic-dis'rs Neurologic-dis'rs Non-specific-dis'rs
 Nutritional-dis'rs Oncologic-dis'rs Ophthalmologic-dis'rs Psychiatric-dis'rs
 Pulmonary-dis'rs Renal-dis'rs Urologic-dis'rs Vascular-dis'rs)

SIBLINGS

SIBS AZATHIOPRINE

(CHLORAMBUCIL CYCLOPHOSPHAMIDE)

At low levels made feasible through inheritance

Complete Property List

PL NEPHROTIC-SYNDROME

GENL: RENAL-DISORDERS
SPEC: (PROTEINURIA HEAVY-PROTEINURIA)
DEFINITION: (OR (DURING & --) (AND & --))
TYPE: INTERVAL
EFFECTS: (URINE-PROTEIN-RANGE ALBUMIN 24-HR-URINE-PROTEIN --)
MINIMUM-DURATION: 30
MINIMUM-POINTS: 2
INTERVALFN: MEAN-DURING-INTERVAL
VALUE-TYPE: BINARY
INTRA-EPISODE-GAP: 100
INTER-EPISODE-GAP: 180
RECORDS: INVERTED
AFFECTED-BY: ((PREDNISONE &) (GLOMERULONEPHRITIS &) (SLE &))
PARTITION: (0 .5 1 --)
UNITS: "gms proteinuria/24 hrs"
PROXIES: (ALBUMIN 24-HR-URINE-PROTEIN URINE-PROTEIN-RANGE)
ONSET-DELAY: 7
MINIMUM-INTERVAL: 30
CARRY-OVER: 30

Definitions over Time

Our observations are actually over time. This has important effects:

1. The DEFINITIONS combine EVENT observations, as recorded in the database, into INTERVAL information
   1.1 INTERVALS have parameters such as MAX, MIN, AVE, RATE, . . .
   1.2 Patients differ in the number of EVENTS observed for a disease course,
       but a course should be one interval and a treatment should be one interval
       (same for other time-based data; most data in planning extrapolate from past series to future)
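Turning point events into intervals can be sketched as grouping observations whose spacing stays under a gap threshold and summarizing each group with MIN, MAX, and AVE. The code below is an illustration under assumptions: the function names and the max_gap parameter are hypothetical, times are day numbers, and using a single gap threshold is a simplification (the property list for NEPHROTIC-SYNDROME carries slots such as INTRA-EPISODE-GAP, whose exact semantics in RX are not given on the slides).

```python
def summarize(points):
    """Summarize one group of (day, value) points as an interval record."""
    values = [value for _, value in points]
    return {
        "start": points[0][0],
        "end": points[-1][0],
        "min": min(values),
        "max": max(values),
        "ave": sum(values) / len(values),
    }


def to_intervals(events, max_gap):
    """Group (day, value) events into intervals.

    Consecutive events more than max_gap days apart start a new interval
    (assumed rule for this sketch).
    """
    intervals = []
    current = []
    for day, value in sorted(events):
        if current and day - current[-1][0] > max_gap:
            intervals.append(summarize(current))
            current = []
        current.append((day, value))
    if current:
        intervals.append(summarize(current))
    return intervals
```

Statistics are then computed over the interval records, so two patients with very different numbers of observations each contribute one course.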

Definitions and Missing Data

2. We elevate detailed observations to higher-level concepts and do
   [Statistics on Concepts, not on Facts] ?
   2.1 Our experts and the knowledgebase deal better with higher-level concepts
   2.2 We can combine multiple event-types to substantiate an interval concept
       (more credibility in the face of missing data)
       NEPHROTIC SYNDROME during HEAVY-PROTEINURIA or PROTEINURIA and ...
   2.3 We can account for masked symptoms
       ... during SYMPTOM or DRUG given for that symptom.

Why ignore the Facts?

What's wrong with data:
1. variable number of observations
2. taken at unpredictable intervals
3. often incomplete

Use higher-level concepts defined by frames to aggregate incomplete facts into meaningful concepts: Labeling

1. Initial finding + continuing treatment = continuing disease state
   (treatment can mask findings,
    continued tests for findings are costly)
2. Findings of events over time -> interval = worsening/steady/improving disease state
   (matters more than level of state)

Use of This Information

MODEL BUILDING

There can be many paths between two nodes in our network, even at the higher, CONCEPT level.

New knowledge -> new direct causal path with parameters

But any alternative path can also explain a hypothesis.

If the sum of the alternate paths explains all of the relationship: no new knowledge!
[Hypothesis is invalidated]

So:
1. look for all paths; intermediate nodes are covariates
2. prune subsumed paths
3. omit infrequent covariates to simplify the model
   (omitting frequent covariates: too much loss of data)

Cycles

Note there can be cycles, but the time delay imposes an ordering:

Example

Cause              intensity   delay          Effect
Sedentary Life     +2          months     ->  Diet
Diet               +2          months     ->  Cholesterol
Cholesterol        +2          years      ->  Coronary Art.sc.
Coronary Art.sc.   +4          months     ->  Heart Attack
Heart Attack       -2          days       ->  A-type behavior
Heart Attack       -1          hours      ->  Smoking
Heart Attack       +3          minutes    ->  Sedentary Life
Heart Attack       +5          minutes    ->  Death
Heart Attack       +2          days       ->  Death
Coronary Spasms    +3          months     ->  Heart Attack
Smoking            +1          months     ->  Coronary Spasms
Hypertension       +4          years      ->  Coronary Art.sc.
Hypertension       +3          months     ->  Coronary Spasms
A-type behavior    +1          years      ->  Hypertension
A-type behavior    +1          varied     ->  Coronary Spasms
Age                +1          years      ->  Cholesterol
Age                +2          years      ->  Hypertension

There are positive and negative paths and loops
Cannot be captured by a simple logical model

Modal Effects of Prednisone

Frequency and strength of causal relationship

DE PREDNISONE MODE   /* one link away */

PREDNISONE, at a level of 30 mgms/day,

usually increases CHOLESTEROL by 50 to 130 mgms/dl,
regularly attenuates NEPHROTIC-SYNDROME by 1.0 to 2.0 gms prot/24 hrs,
regularly attenuates GLOMERULONEPHRITIS by 10.0 to 30.0 percent,
commonly attenuates SLE by 10.0 to 30.0 percent activity,
regularly decreases ANTI-DNA-HEMAGGLUT by 50 to 90 percent,
regularly increases IMMUNOSUPPRESSION by 16 to 32 percent activity,
regularly decreases EOS by 2 to 3 % of WBC,
occasionally increases KETOACIDOSIS by 20 to 100 mgms/dl of glucose,

* Note that all the terms are represented by numerically encoded values

Display all Paths above threshold

to collect significant covariates:

DP SLE CHOLESTEROL   (default > 0.1)

SLE {30 percent activity} increases
  NEPHROTIC-SYNDROME {1 gms proteinuria/24 hrs} increases
    CHOLESTEROL {24 mgms/dl}

SLE {30 percent activity} is treated by
  PREDNISONE {182 % of baseline} increases
    CHOLESTEROL {14 mgms/dl}

SLE {30 percent activity} increases
  NEPHROTIC-SYNDROME {1 gms proteinuria/24 hrs} is treated by
    PREDNISONE {143 % of baseline} increases
      CHOLESTEROL {8 mgms/dl}

SLE {30 percent activity} increases
  IMMUNOSUPPRESSION {18 percent activity} increases
    HEPATITIS {5 Iu/ml of SGOT} increases
      CHOLESTEROL {6 mgms/dl}
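Collecting covariates amounts to enumerating the paths between two nodes and taking their intermediate nodes. The sketch below is a simplified illustration, not the DP implementation: it treats every link ("increases", "is treated by") as a plain directed edge, ignores the intensities and the threshold, and uses hypothetical names (CAUSAL_LINKS, all_paths, covariates). The edges are the ones visible in the DP SLE CHOLESTEROL display.

```python
# Directed links read off the DP SLE CHOLESTEROL display (intensities omitted).
CAUSAL_LINKS = {
    "SLE": ["NEPHROTIC-SYNDROME", "PREDNISONE", "IMMUNOSUPPRESSION"],
    "NEPHROTIC-SYNDROME": ["CHOLESTEROL", "PREDNISONE"],
    "PREDNISONE": ["CHOLESTEROL"],
    "IMMUNOSUPPRESSION": ["HEPATITIS"],
    "HEPATITIS": ["CHOLESTEROL"],
}


def all_paths(graph, source, target, path=None):
    """Enumerate all simple (cycle-free) paths from source to target, depth first."""
    path = (path or []) + [source]
    if source == target:
        return [path]
    found = []
    for successor in graph.get(source, []):
        if successor not in path:  # never revisit a node, so cycles terminate
            found.extend(all_paths(graph, successor, target, path))
    return found


def covariates(paths):
    """Intermediate nodes on any path: the candidate covariates for the model."""
    return sorted({node for path in paths for node in path[1:-1]})


paths = all_paths(CAUSAL_LINKS, "SLE", "CHOLESTEROL")
```

On this fragment the enumeration yields the same four paths as the display. A fuller version would multiply the link intensities along each path and drop paths whose product falls below the threshold.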


RxDisplaying the Causes of Cholesterol

i.e., the Set of Nodes that Affect it

DC CHOLESTEROL    /* other direction */

CHOLESTEROL

always is increased by PREDNISONE
regularly is increased by HEPATITIS
regularly is increased by KETOACIDOSIS
usually is increased by NEPHROTIC-SYNDROME

Interpretation of the Frequencies

Is not linear over the range of terms:

DF

Cell   Adverb           Probability
1      never*           .001
2      very-rarely      .005
3      rarely           .01
4      infrequently     .04
5      occasionally     .16
6      commonly         .32
7      regularly        .64
8      usually          .95
9      almost-always    .99
10     always           1.00

* well hardly ever
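The adverb scale above is a lookup table between ten cells and probabilities. The sketch below shows one way to use it in Python. The table values are copied from the slide; the function names and the rule of rounding a probability up to the next cell are assumptions, since the slide does not say how RX maps intermediate values.

```python
import bisect

# (probability, adverb) pairs copied from the DF display, cells 1 to 10.
SCALE = [
    (0.001, "never"),
    (0.005, "very-rarely"),
    (0.01, "rarely"),
    (0.04, "infrequently"),
    (0.16, "occasionally"),
    (0.32, "commonly"),
    (0.64, "regularly"),
    (0.95, "usually"),
    (0.99, "almost-always"),
    (1.00, "always"),
]
PROBABILITIES = [p for p, _ in SCALE]

def probability_of(adverb):
    """Return the probability attached to an adverb on the slide."""
    for p, name in SCALE:
        if name == adverb:
            return p
    raise KeyError(adverb)

def adverb_for(p):
    """Return the adverb of the first cell whose probability is >= p (assumed rule)."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("probability must lie in [0, 1]")
    return SCALE[bisect.bisect_left(PROBABILITIES, p)][1]
```

Because the scale is roughly geometric in its middle range, a step from "occasionally" to "commonly" doubles the probability, while a step from "usually" to "almost-always" adds only four percentage points.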


RxCausal Inference

Generating New Knowledge in RX means Establishing and Quantifying New Causal Linkages

Correlations discovered do not establish
 1. causality
 2. directness

ad 1. causality: does A cause B? Heuristic: if B consistently follows A in time, then B does not cause A (there may be an unknown covariate C, causing both with different delays)

ad 2. directness: the correlation may be due to known covariates -- check the model as shown previously
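The time-precedence heuristic can be written down in a few lines. This Python sketch is illustrative only: the function names are hypothetical, and the 0.9 cut-off used to decide what "consistently" means is an assumption, not a value taken from RX.

```python
def consistently_follows(first_times, second_times, min_fraction=0.9):
    """True if the second event is recorded after the first in at least
    `min_fraction` of the paired observations (threshold is an assumption)."""
    pairs = list(zip(first_times, second_times))
    if not pairs:
        return False
    later = sum(1 for first, second in pairs if second > first)
    return later / len(pairs) >= min_fraction

def possible_directions(a_times, b_times):
    """Rule out the causal direction that contradicts time precedence."""
    directions = {"A causes B", "B causes A"}
    if consistently_follows(a_times, b_times):
        directions.discard("B causes A")   # B comes after A, so B cannot cause A
    if consistently_follows(b_times, a_times):
        directions.discard("A causes B")
    return directions
```

The heuristic only removes a direction. What remains is still a candidate, because an unknown covariate C could drive both A and B with different delays.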


RxUse of the Covariate Model

Rule driven, but
  uses medical knowledge in frames
  uses metadata in frames
  is controlled by a frame hierarchy
  deterministic execution

1. Select proper statistical method: use info about the data from Schema Frames

2. Check if enough data is available: ask the DBMS portion for the cardinality of the subsets needed, which distinguish the remaining covariates


RxRunning the Study Module

Statistical knowledge is encoded as RULES.

The statistical knowledge in RX is not deep: (no derivation from Probability theory)

Selecting instance of the class: STUDY-DESIGNS

The candidate selected is: LONGITUDINAL-DESIGN

Would you like to see rules that determined selection of study design?
**YES

LONGITUDINAL-DESIGN

PREREQUISITES:
  Can the EFFECT occur more than once in a patient's record?
  Do we have patient records in which values for the EFFECT have occurred more than once

CROSS-SECTIONAL-DESIGN

PREREQUISITES:
  If the dependent variable is not a function of time, then use the CROSS-SECTIONAL-DESIGN
  CROSS-SECTIONAL-DESIGN will also be used when most patient records have only a few values
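The study-design prerequisites shown in this dialogue can be paraphrased as a small decision function. The Python below is a sketch under stated assumptions, not the RX rule base: the function name, the parameter names, and the cut-off `few=2` for "only a few values" are all hypothetical.

```python
def select_study_design(effect_varies_over_time, values_per_patient, few=2):
    """Choose a study design from the prerequisites listed on the slide.

    effect_varies_over_time: can the EFFECT occur more than once in a record?
    values_per_patient: number of recorded EFFECT values for each patient.
    few: what counts as "only a few values" (assumption, not from RX).
    """
    has_repeated_values = any(n > 1 for n in values_per_patient)
    sparse_records = sum(1 for n in values_per_patient if n <= few)
    mostly_sparse = sparse_records > len(values_per_patient) / 2

    if effect_varies_over_time and has_repeated_values and not mostly_sparse:
        return "LONGITUDINAL-DESIGN"
    return "CROSS-SECTIONAL-DESIGN"
```

For example, `select_study_design(True, [7, 5, 30])` selects the longitudinal design, while the same call with `[1, 2, 1, 9]` falls back to the cross-sectional design because most records hold too few values.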


RxNow select statistical procedure

The rule categorization is also hierarchical

Selecting instance of class: STATISTICAL-METHODS
 Considering instance: CONTINGENCY-TABLES
 Considering instance: T-TEST
 Considering instance: ANOVA
 Considering instance: REGRESSION
 Selecting instance of class: REGRESSION
  Considering instance: MULTIPLE-REGRESSION
  Considering instance: SPEARMAN-RHO
  Considering instance: KENDALL-TAU
  Considering instance: PEARSON-R
 Candidates whose prerequisites are satisfied:
  (MULTIPLE-REGRESSION SPEARMAN-RHO KENDALL-TAU PEARSON-R)
      Conflict resolution rules are used to decide among these
The candidate selected is: MULTIPLE-REGRESSION

 Considering the instance: DISCRIMINANT-ANALYSIS
 Considering the instance: FACTOR-ANALYSIS
 Considering the instance: LIFE-TABLES

Candidates whose prerequisites are satisfied: (MULTIPLE-REGRESSION)

The candidate selected is: MULTIPLE-REGRESSION


RxExplanation

The candidate selected is: MULTIPLE-REGRESSION

Would you like to see decision criteria for selecting statistical methods?
**YES

MULTIPLE-REGRESSION

RULES:
  If the independent variables are causally ordered, then do a hierarchical regression.
  otherwise, do a standard regression.

PREREQUISITES:
  Multiple regression is appropriate when the number of independent variables is greater than 1
  All variables must be at least of measurement level = binary.
  All variables must be normally distributed.

Statistical method: MULTIPLE-REGRESSION


RxMore explanation

The # of values recorded for the dependent var. for each patient must be > 1 + the # of independent variables

Next, there is the same minimum required # of values for the independent variable of primary interest

To estimate the effect of the independent variable for a single patient, the coefficient of variation must be > threshold = 10 percent

Finally, to do individual estimation, the total number of events must be > 1 + # of indep. vars: the costliest criterion computationally


RxThe Rules are in LISP:

Would you like to see the machine readable eligibility criteria?

**YES

Eligibility criteria:

[AND (IGEQ (#VALUES (QUOTE CHOLESTEROL) PAT)
           (ADD1 (FLENGTH VARS)))
     (IGEQ (#VALUES (QUOTE PREDNISONE) PAT)
           (ADD1 (FLENGTH VARS)))
     (GREATERP (COEF-VAR (QUOTE PREDNISONE) PAT) .1)
     (IGEQ (FLENGTH (ENTRIES (QUOTE PRED-CHOL) NIL PAT))
           (ADD1 (FLENGTH VARS]
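For readers who do not read Interlisp, the eligibility criteria above can be paraphrased in Python. This is a sketch, not a translation of RX internals: the patient record is modelled as a hypothetical dictionary of value lists, `PRED-CHOL` is assumed to hold the paired events, and the coefficient of variation is assumed to be the sample standard deviation divided by the mean. The comparisons follow the LISP, which uses IGEQ (greater than or equal).

```python
from statistics import mean, stdev

def coef_var(values):
    """Coefficient of variation (assumed: sample standard deviation / mean)."""
    if len(values) < 2:
        return 0.0
    average = mean(values)
    if average == 0:
        return 0.0
    return stdev(values) / average

def eligible(patient, variables):
    """Mirror of the four clauses of the AND expression.

    patient: hypothetical dict mapping an attribute name to its recorded values.
    variables: the independent variables of the model (VARS in the LISP).
    """
    needed = len(variables) + 1                       # (ADD1 (FLENGTH VARS))
    return (
        len(patient["CHOLESTEROL"]) >= needed         # enough dependent values
        and len(patient["PREDNISONE"]) >= needed      # enough independent values
        and coef_var(patient["PREDNISONE"]) > 0.1     # the dose actually varies
        and len(patient["PRED-CHOL"]) >= needed       # enough paired events
    )
```

The third clause is the one that excludes patients on a constant dose: with no variation in PREDNISONE, no individual effect can be estimated, however many visits were recorded.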


RxRunning the Analysis

1. Obtain the needed data from the database attached already to the schema class frames
2a. Run through IDS (XEROX LISP package)
2b. Set up a run for a batch package (BMDP, SPSS) and extract results from output (a pain)
3. Inspect the resulting t-value for statistical significance

If high, place into Knowledge base a new causal link!

Validity 4 or 5 /10

 10   indisputable mechanism
  8   wide experimental confirmation
  6   confirmed in multiple retrospective studies
  4   confirmed by single retrospective study
  2   case citations
  1   based on astrology


RxDisplay a Causal Relationship

initialized or produced by an RX analysis

DM PREDNISONE CHOLESTEROL

Statistical Model for Prednisone/Cholesterol effect:

cholesterol = - 59.17575 albumin + 23.13479 log(prednisone) + 188.1984 + error term

Setting of Effect:
  and not during KETOACIDOSIS    (omitted covariates)
  not during HEPATITIS           (omitted covariates)

More from the study managed by RX :


RxDisplay Evidence

DEV PREDNISONE CHOLESTEROL

Validity of and Evidence in Support of the Causal Relationship: PREDNISONE influences CHOLESTEROL.

Study Design: Longitudinal Design
Performed on: 5-May-81
Database: ARAMIS/Stanford-Immunology

Validity: 6 on a scale from 1 to 10
Interpretation of Validity:
   strong correlation and time precedence:
   known covariates controlled

Name of Study: PC
Total Number of Patients: 21
Median Number of Visits with Complete Data: 7.0
Range (Visits with Complete Data): 5.0 to 30.0
p-values of variables in the model:

   PREVIOUS-CHOLESTEROL    0.4603615
   ALBUMIN                 7.450581E-9
   PREDNISONE              1.564622E-7

A detailed synopsis of this study is available in the EVIDENCE file.

When knowledge is entered from other sources this command will list citations to the literature


RxDistribution Across Patients

to visually check normality, a prerequisite for many Statistical Methods    (improve!)

DD PREDNISONE CHOLESTEROL

Distribution across patients of CHOLESTEROL in units of mgms/dl, given a baseline value of 230 mgms/dl and, given a change in PREDNISONE from 0 to 30 mgms/day,

using deciles

Range of           Percentage      Magnitude
CHOLESTEROL        of Patients     of Change

100  150            0              extreme -
150  195            0              strong -
195  210            0              moderate -
210  225            0              weak -
225  230            0              equivocal -
230  235            0              equivocal +
235  250            0              weak +
250  280           10              moderate +
280  360           82              strong +
360  700            8              extreme +

This presents a rough representation of the function. It provides hints on normality versus bi-modal versus uniform distributions.
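A display of this kind amounts to binning one predicted value per patient into fixed ranges. The Python sketch below reuses the range boundaries and labels from the slide; the function name and the convention that a boundary value falls into the upper range are assumptions, and the input in the usage note is invented for illustration, not taken from the study.

```python
import bisect

# Range boundaries (mgms/dl) and labels copied from the DD display.
EDGES = [100, 150, 195, 210, 225, 230, 235, 250, 280, 360, 700]
LABELS = [
    "extreme -", "strong -", "moderate -", "weak -", "equivocal -",
    "equivocal +", "weak +", "moderate +", "strong +", "extreme +",
]

def distribution(predicted_values):
    """Return the percentage of patients whose predicted value falls in each range."""
    counts = [0] * len(LABELS)
    for value in predicted_values:
        index = bisect.bisect_right(EDGES, value) - 1
        index = min(max(index, 0), len(LABELS) - 1)   # clamp out-of-range values
        counts[index] += 1
    total = len(predicted_values)
    return {label: round(100 * count / total) for label, count in zip(LABELS, counts)}
```

With ten hypothetical patients predicted at 260, 285, 290, 295, 300, 310, 320, 340, 350, and 400 mgms/dl, the result puts 10 percent in "moderate +", 80 percent in "strong +", and 10 percent in "extreme +". A single dominant range of this kind hints at a unimodal response; two separated peaks would suggest a bi-modal one.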


RxRX Summary

The RX structure presents a significant attempt to encode DEEP knowledge. Such knowledge is distinguished from the SHALLOW operational knowledge in rule-based Expert systems.

If rules are adequate they are easier to collect and use.
Frames allow a degree of structuring which is helpful in organizing larger bodies of knowledge

Categorical Knowledge satisfies Closed World assumption
Definitional Knowledge understands Time
Causal Knowledge represents the critical relationships

Aggregate knowledge is a complex network


RxInformation Flow

Data is used in three phases

1. Initial experience for Medical Experts

browsing

DISCOVERY → tentative knowledge

hypotheses = proposed goals

2. Estimating if the hypotheses can be tested

Model building

3. Statistical processing

validation

new KNOWLEDGE


RxFuture

RX / RADIX

Expand model to cover all knowledge related to facts available from the database

Apply the RX concepts to other medical specialities

Integrate multiple models

Long Range

Use machine-processable knowledge in medicine?

Use models to share scientific knowledge
  to replace papers?
  to aid the review process

Transfer to other areas

Prerequisites
  Need to learn from data
  Reliable and deep data bases

Other Application areas
  Economic models (conditions for up/down swings)
  Personnel Management (conditions for good/poor performance, re-enlistment)
  Monitoring of Production lines (conditions for product failures)


Rx5. General Conclusions

drawn from RX

Hypothesis Generation
  traditionally a scientist's function
  Inspiration derived from experience
  (Database contains more experience than any physician or expert!)

Hypothesis Validation (equivalent to a very smart query)
  AI provides a powerful control tool
  processing driven by knowledge

Inserting Validated Hypothesis back into knowledge base

      Learning from Data!

Knowledge needs Structure
  for interpretation
  for sharing
  for maintenance


RxAI as the Control Mechanism

AI deals with knowledge about general object types

Databases deal with data, facts about specific objects

Use AI as the control mechanism for large programs

A large program has
 1. Computational sections
 2. Data organizing sections
 3. Controls to decide what sections to invoke
      in sequence
      iteratively
    and set parameters


RxControl using AI technology

Control is based on general concepts
 1 Iteration has converged
 2 Goal has been reached
 3 Computation got stuck
 4 . . .

Numeric programs use FORTRAN IF statements for control
Statistical packages use control `cards' for control
Data-processing programs use COBOL conditions for control
Real-time programs use interrupts for control

We believe that CONTROL BY AI IS BETTER

Advantages of AI

Control statements are based on a variety of conditions

if these conditions are only testable sequentially (a fixed decision tree) then all conditions have to be known or the decision tree gets very complex

Rules in AI systems do not depend on evaluation order for correctness.

HERE: AI to serve applications, not for its own sake.
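The claim that rules do not depend on evaluation order can be illustrated with a tiny forward-chaining loop. This Python sketch is not how RX is implemented: the rule names ("stop", "change-method", "store-causal-link", "result-significant") are hypothetical, and the property shown holds for monotonic rules that only add facts.

```python
from itertools import permutations

# Each rule: (set of premises, conclusion). Premises echo the control concepts
# on the slide; the conclusions are invented for illustration.
RULES = [
    ({"iteration-converged"}, "stop"),
    ({"goal-reached"}, "stop"),
    ({"computation-stuck"}, "change-method"),
    ({"stop", "result-significant"}, "store-causal-link"),
]

def forward_chain(rules, facts):
    """Apply rules until no rule adds a new fact; return everything derived."""
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            if premises <= known and conclusion not in known:
                known.add(conclusion)
                changed = True
    return known

def order_independent(rules, facts):
    """Check that every ordering of the rules derives the same set of facts."""
    results = {frozenset(forward_chain(list(order), facts))
               for order in permutations(rules)}
    return len(results) == 1
```

A fixed decision tree would have to test "stop" before "store-causal-link" in exactly that sequence. Here the loop reaches the same conclusions whichever rule is listed first, so a new condition can be added without reordering the rest.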


RxKnowledge Base

[Literature/Education] + [Data] + [Experience]

                  ↓

             [Knowledge]

                  ↓

[Consultation] + [Surveillance] + [Teaching] + [Research]


Rx6. Future Research Problems

Some addressed by Current Research in KBMS

Long Range Research Focus
  Develop and Validate Concepts for Information systems that store
   1. factual data as well as
   2. knowledge about the data
  and support inferencing over the totality

Applications in
  Decision support
  Planning
  Design support

Current State
  Little Internal Structure: meta knowledge, control rules, facts ...
  Lack of Generality / Sharability: tight linkage to interpreter

OK while sizes are modest


RxProblems foreseen

Performance ↓ === Size ↑

Maintenance cost ↑ === Coverage ↑

Consistency ↓ === Breadth ↑

Sharability of Knowledge 0 === Amortization of Cost 0

Specifically
  impossible to maintain by multiple experts
  single objective -- lack of knowledge reuse drives up cost


RxQuestions:

How large are large KB's really? and how will they grow?
  several dozen rules -- interesting
  several hundred rules -- large
  several thousand rules -- rare, major

What Is Proportion of
  • Ground Rules or Facts . . . ↑
  • Application knowledge . . . . ↗
  • Control knowledge . . . . . . . →
                                   what can we expect in the future

Can KB's serve multiple objectives?

System    size
MYCIN     100's of rules, one interpreter
XCON      large, >10,000 small OPS rules
XSEL      also large, and large overlap but distinct
R1ME      towards algorithmic approach
RX        few hundred frames, 30MB data, several interpreters
RADIX     generate and test interpreters, new KB


RxKnowledge Sharing

Difficult
  RADIX did NOT use RX knowledge as represented in RX
  XCON and XSEL (although similar) do not share the knowledge representation

The Interpreter and the Knowledge
  • are interlinked
  • must be interlinked so that sharing is infeasible?

Databases are a means for SHARING data for diverse tasks

Why?  Consistency
      Cost reduction

Adopt Similar Paradigm for Knowledge


RxPartitioning

For maintenance and control, not to limit inference

Horizontal:    [Knowledge]
               [Database]

Vertical:      [Domain 1] [Domain 2] [ ... ] [Domain n]    (overlapping boxes)

Objects (Frames) can be in several domains

[Faculty → Smith] [← Consultant]    (overlapping boxes)


RxDomain Characteristics

Structure                    interpretation
------------                 ------------------------
is-a                         inheritance
part-of                      synchrony, ownership
derivation      boolean      dependency
                rule         pickup locally
                proc.        exec. embedded proc
completeness                 negation
                               univ. quantification
uniqueness                   first result
disjointness                 sum = total
------------                 ------------------------

Restrict an interpreter to operate on knowledge with identical characteristics during one [sub-] task


RxPlanned Organization

KSYS

Defined Frames      →       object types
Instance Frames     →       selected data objects
Slots               →       Attributes
     associated     with    Subset: SoD
SoD                 →       Features
Feature             →       Interpretable Description
Description         →       defines type of SoD
SoD                 →       Subset of discourse
Active SoD          →       Scope of Interpreter
                    best:   Hierarchy
     instantiated   via     DBMS View


RxWork in Progress

Modeling of SoD Interaction by theories

Definition of features of SoD

Definition of operations within SoD

Instantiation of values within SoD from DBMS

Expectations

Conceptual Linkage of KNOWLEDGE and DATA

will be more effective than

physical access linkage of EXPERT SYSTEMS and DBMS


RxFundamental CS Issues

Representation of Knowledge
Management of Knowledge
Exploitation of Knowledge
Representation of Heuristics
Use of AI as a Control Mechanism
Multi-step Planning
Use of Database for Planning

also

Dealing with Uncertain data
Data that are distributed on Autonomous systems
Dealing with Data that do not match syntactically but do match semantically
Non-monotonic updates that imply (eventual) model changes