Normalization and Lossless Join Decomposition of Similarity-Based Fuzzy Relational Databases

Özgün Bahar,† Adnan Yazıcı*
Department of Computer Engineering, Middle East Technical University, 06531, Ankara, Turkey

Fuzzy relational database models generalize the classical relational database model by allowing uncertain and imprecise information to be represented and manipulated. In this article, we introduce fuzzy extensions of the normal forms for the similarity-based fuzzy relational database model. Within this framework of fuzzy data representation, similarity, conformance of tuples, the concept of fuzzy functional dependencies, and partial fuzzy functional dependencies are utilized to define the fuzzy key notion, transitive closures, and the fuzzy normal forms. Algorithms for dependency preserving and lossless join decompositions of fuzzy relations are also given. We include examples to show how normalization, dependency preserving, and lossless join decomposition based on the fuzzy functional dependencies of fuzzy relations are performed and applied to some real-life applications. © 2004 Wiley Periodicals, Inc.

1. INTRODUCTION

The relational data model proposed by Codd1 is based on set-theoretic concepts and supports well-defined, unambiguous, and exact data for an application. However, in many real-world applications, such as biology and genetics, geographical information systems, economic and weather forecasting systems, and so on, data is often partially known or imprecise and queries may include vague terms. To cope with various types of imperfection and to capture more of the meaning of the data in databases, several extensions to the classical relational database model have been proposed in the literature.1–8 Properly formulating a database model in terms of relation schemas is a key requirement in fuzzy database design. The main frameworks for fuzzy data representation based on fuzzy set theory9 allow imprecise data for the attribute values and may be categorized into a partial membership-based approach,5,6 a similarity-based approach,10 a possibility-based approach,11 and the extended possibility-based approach.12 The similarity-based framework is the approach used in this study.

*Author to whom all correspondence should be addressed: e-mail: [email protected].
†e-mail: [email protected].

INTERNATIONAL JOURNAL OF INTELLIGENT SYSTEMS, VOL. 19, 885–917 (2004)
© 2004 Wiley Periodicals, Inc. Published online in Wiley InterScience (www.interscience.wiley.com). • DOI 10.1002/int.20029

One of the primary purposes of any database is to decrease data redundancy and to provide data reliability.13,14 Data redundancies and update anomalies have also been of great concern in fuzzy relational database design,2,6,15,16 and integrity constraints play an important role in fuzzy relational database design theory. Various types of data dependencies, such as functional and multivalued dependencies, are used as guidelines for the design of classical relational schemas that are conceptually meaningful and free of certain anomalies. For example, if one attribute determines another, we say that there exists a functional dependency between these attributes. This determination is unique in a classical (crisp) relational model, whereas it need not be in a fuzzy relational database model. In a crisp database model, functional and multivalued dependencies are precise determinants, which is not the case for some real-world applications.17 As the relational data model is extended to deal with fuzzy data, integrity constraints have also been extended, and the literature offers a number of ways to impose fuzzy data dependencies on fuzzy data in fuzzy database relations.6,11,15,16,18–21 The following is an example of fuzzy functional dependencies (ffds).

One of the areas in which fuzziness may be used is business and finance applications. To evaluate the creditworthiness of a customer, multiple financial and personal factors are used. Economic thinking and social integrity are two of the personal factors in the creditworthiness assessment for consumer credit. “Economic thinking and conformity with social and economic standards more or less determine the business behaviour” is a valid constraint in this application. In this example, “business behaviour,” “economic thinking,” and “social integrity” are all attributes of a person with inexact values. The “more or less” part of the constraint makes the constraint itself fuzzy. The dependency does not fix the precise level of determinacy, only the minimum level. The data dependency in this example application is a fuzzy functional dependency, and such a dependency cannot be enforced by a crisp relational database system.

There have been a number of studies on extending data dependencies for fuzzy relational database models. Among these, Raju and Majumdar6 have proposed ffds in terms of a membership function of the elements of a fuzzy relation. Chen et al.2 have given a definition of ffds in terms of closeness measures for the equality of possibility distributions and fuzzy implication operators. Shenoi et al.15 have extended Buckles and Petry’s approach10 by defining ffds based on equivalence classes from finite domain partitions alone. Liu16 has defined ffds based on the concept of semantic proximity in $[0, 1]$ between two fuzzy attribute values $v_1$ and $v_2$, which are intervals. Yazıcı8 and Yazıcı and Sözat17,20 have defined ffds between two fuzzy attribute values and proved the soundness and completeness of the inference rules of those ffds. The studies related to the normalization process, which analyzes the given relation schemas to achieve the desirable properties of minimizing redundancy and minimizing insertion, deletion, and update anomalies, take the fuzzy data and ffds into account.2,6,15 In these studies, the main goal is to decompose a fuzzy relation that is not in a certain normal form into multiple fuzzy relation schemas of the desired normal forms. Dependency preserving and lossless join decompositions are used to achieve desirable decompositions. Chen et al.22,23 and Raju and Majumdar6 have studied such decompositions for fuzzy relational databases.

Our study differs from previous research efforts in the literature in a number of aspects. First, the similarity-based fuzzy relational database model is used as the reference model in our study. We deal with a number of issues in designing similarity-based fuzzy relational databases in order to reduce data redundancy and eliminate update anomalies. The formal definitions of ffd and partial ffd are given based on the conformance of tuples. In addition, the fuzzy key concept and the transitive closure of ffds are presented for the definitions of the fuzzy normal forms. Second, we introduce a number of fuzzy normal forms based on the ffds. We first define the fuzzy first normal form (1NF). Afterward, the fuzzy second (2NF), fuzzy third (3NF), and fuzzy Boyce-Codd (BCNF) normal forms are introduced. Third, the dependency preserving and lossless join properties of decompositions into the fuzzy normal forms with respect to ffds are defined. Finally, all these concepts are described along with examples to demonstrate how they are used in some real-life applications.

The article is organized as follows. Section 2 discusses some background information, including fuzzy relational databases, similarity relations, and similarity-based fuzzy relational databases. In Section 3, fuzzy functional dependencies (ffds), tuple conformance, inference rules for ffds, four fuzzy normal forms, namely, fuzzy 1NF, fuzzy 2NF, fuzzy 3NF, and fuzzy BCNF, and their decomposition algorithms are provided. Section 3 also gives two testing algorithms for the dependency preserving and lossless join properties of the decompositions. We also include a real-life application, fraud detection, to show how normalization and the dependency preserving and lossless join properties of similarity-based fuzzy relational databases are utilized. Finally, the conclusion is given in Section 4.

2. BACKGROUND

In this section, we first define fuzzy relational databases. Then, similarity relations are described as defined by Zadeh,9 and the similarity-based fuzzy database model, the reference model of this study, is briefly explained.

2.1. Fuzzy Relational Databases

The relational data model uses the single concept of a relation both for data representation and data association, and it is grounded in set theory. In this model, every value in a relation must be atomic. Except for the null value, every attribute must have a precise value and cannot have fuzzy or uncertain values.

Several approaches have been proposed for extending the classical relational database model to a fuzzy relational database model. Fuzzy relational databases are databases that can represent fuzzy and uncertain data. An extensive list of references to the relevant literature can be found in Refs. 5 and 24. The main contributions in this area are as follows. In the fuzzy relational data model proposed by Umano and Freedom,7 fuzzy data are represented by possibility distributions, and a grade of membership is used to represent the association between values. This grade of membership may itself be a possibility distribution. Buckles and Petry10 introduced fuzzy similarity relations. These fuzzy similarity relations facilitate the estimation of the extent to which possible values of an attribute can be regarded as being interchangeable. Prade and Testemale11 generalized the representation of Umano by introducing an extra element, e, for situations where a nonzero possibility can mean the nonapplicability of an attribute. They also proposed the use of possibility distributions to represent fuzzy values as well as uncertainty concerning the value of an attribute. To handle incomplete information and missing and nonapplicable values, Imielinski and Lipski3 proposed a method where incomplete information is represented as a list of possible values. Lipski does not assume that null means a value that is completely unknown.

Two different causes of imprecise attribute values in database systems motivated two approaches for representing fuzzy data. The similarity-based approach10 uses linguistic terms to describe attribute values. The imprecision of these terms is characterized by a similarity matrix, which records the degree of similarity between the pairs of linguistic terms in a domain. The possibility-based model is an alternative approach for representing imprecise data that uses a possibility distribution as the value of an attribute. Possibility measure and necessity measure are the two kinds of matching degrees calculated for this approach. There have also been some mixed models combining these approaches.12,19

2.2. Similarity Relations

The identity relation used in nonfuzzy relational databases induces equivalence classes over a domain base set, $D_j$, which affects the result of certain operations and the removal of redundant tuples. The equivalence classes are most frequently singleton sets. The identity relation is a special case of a similarity relation.

Similarity relations are useful for describing how similar two elements from the same domain are. A similarity relation,9,10 $s(x, y)$, for a given domain $D_j$, is a mapping of every pair of elements in the domain onto the unit interval $[0, 1]$. Like an equivalence relation, a similarity relation is reflexive and symmetric. It must also be transitive, in the max-min sense. These three properties of a similarity relation are stated below.

Definition. A similarity relation is a mapping, $s: D \times D \to [0, 1]$, such that for $x, y, z \in D$,

$s(x, x) = 1$ (reflexivity),

$s(x, y) = s(y, x)$ (symmetry),

$s(x, z) \ge \max_{y \in D}(\min(s(x, y), s(y, z)))$ (max-min transitivity).
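The three properties can be checked mechanically for any finite domain. The following is a minimal sketch, not part of the original article: the function name `is_similarity_relation`, the three-value domain, and its similarity values are all hypothetical and chosen only for illustration.

```python
from itertools import product

def is_similarity_relation(domain, s, tol=1e-9):
    """Check reflexivity, symmetry, and max-min transitivity of s on a finite domain.

    s is a nested dict with s[x][y] in [0, 1].
    """
    # Reflexivity: s(x, x) = 1
    for x in domain:
        if abs(s[x][x] - 1.0) > tol:
            return False
    # Symmetry: s(x, y) = s(y, x)
    for x, y in product(domain, repeat=2):
        if abs(s[x][y] - s[y][x]) > tol:
            return False
    # Max-min transitivity: s(x, z) >= max_y min(s(x, y), s(y, z))
    for x, z in product(domain, repeat=2):
        best_path = max(min(s[x][y], s[y][z]) for y in domain)
        if s[x][z] + tol < best_path:
            return False
    return True

# Hypothetical domain and similarity values (an assumption, not from the article).
domain = ["low", "medium", "high"]
s_ok = {
    "low":    {"low": 1.0, "medium": 0.8, "high": 0.4},
    "medium": {"low": 0.8, "medium": 1.0, "high": 0.4},
    "high":   {"low": 0.4, "medium": 0.4, "high": 1.0},
}

# Lowering s(low, high) breaks transitivity: the path through "medium"
# has strength min(0.8, 0.4) = 0.4, which exceeds 0.2.
s_bad = {x: dict(row) for x, row in s_ok.items()}
s_bad["low"]["high"] = s_bad["high"]["low"] = 0.2

print(is_similarity_relation(domain, s_ok))   # True
print(is_similarity_relation(domain, s_bad))  # False
```

A check of this kind is mainly useful when a similarity matrix is entered by hand, since max-min transitivity is easy to violate by adjusting a single entry.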


2.3. Similarity-Based Fuzzy Relational Databases

The similarity-based fuzzy relational model is not an extension to the original relational model, but actually a generalization of it. It allows a set of values for an attribute rather than only atomic values, and replaces the identity concept with a similarity concept.

The similarity-based relational model allows a set of values for a single attribute provided that all the values are from the same domain. Thus, while allowing multiple values, the similarity-based relational model keeps the strongly typed attribute value property of the original model. This property is useful for query processing and update operations. If the attribute value is precise and crisp, then the value is atomic; if it is imprecise and inexact, then a set of values that are similar to this value is stated in place of it. The level of similarity among the values is given by the explicitly defined similarity relation for the domain of the attribute values.

The original model compares two attribute values by checking whether the two values are equal or not. The identity relation reflects this fact: $i(x, y) = 1$ if and only if $x = y$, and $i(x, y) = 0$ otherwise. The similarity-based relational model10 compares two attribute values by measuring the closeness of the values in terms of the explicitly declared similarity relation of the attribute domain. A tuple in this model is called redundant if it can be merged with another through the set union of corresponding domain values.

3. FUZZY NORMAL FORMS FOR FUZZY RELATIONS

In logical database design, integrity constraints have a critical role. One of the most important integrity constraints is the functional dependency. Functional dependencies reflect a kind of semantic knowledge about the relationships between the attributes. They help the database designer remove some of the redundant information in the relations. To provide guidance for good fuzzy database design, several fuzzy normal forms based on fuzzy functional dependencies have been proposed.

3.1. Fuzzy Functional Dependencies

Fuzzy functional dependencies reflect some kind of semantic knowledge about attribute subsets of the real world. Ffds are used to design fuzzy databases in which data redundancy and update anomalies are reduced.

In the classical relational data model, a functional dependency $X \to Y$ states that any two tuples with equal $X$ values also have equal $Y$ values. However, this definition of functional dependency is not directly applicable to similarity-based fuzzy databases, because the concept of equality does not totally apply to fuzzy relational database models. In a fuzzy relational data model, the degree of “$X$ determines $Y$” may not necessarily be 1 as in the crisp case. Naturally, a value ranging over the interval $[0, 1]$ may be accepted. The definition of ffd then becomes “tuples with similar $X$ values have similar $Y$ values.”


Ffds are functional constraints that are specified among the attributes of a fuzzy relation schema. In the definition of the ffds, we use the conformance concept.8,17,20 According to the definition of conformance, a tuple is similar to itself independent of its attribute values, the uncertainty is kept even in the presence of ffds imposed on the relation, and conformance is transitive, symmetric, and reflexive. For precise ffds, the similarity of $Y$ values has to be greater than or equal to the similarity of $X$ values, where similarity is measured in terms of conformance. For imprecise ffds, the imprecision of the dependency acts as a threshold on the similarity of $Y$ values, weakening the dependency. Using the definition of ffd, we define the partial ffd, which is used in the definition of fuzzy 2NF.

3.1.1. Conformance of Tuples

A ffd can be represented as $X \to_\theta Y$, where $\theta$ is the linguistic strength (like “more or less,” “sometimes,” etc.). A ffd, $X \to_\theta Y$, states that tuples with similar $X$ values have similar $Y$ values. Here similarity (or closeness) refers to conformance of tuples. The similarities of the attribute values define how conformant the two tuples are on that attribute. A formal definition of conformance7 is given below.

Definition. The conformance of attribute $A_k$ defined on domain $D_k$ for any two tuples $t_1$ and $t_2$ present in relation instance $r$, denoted by $C(A_k[t_1, t_2])$, is given as

$$C(A_k[t_1, t_2]) = \min\{\min_{x \in d_1}\{\max_{y \in d_2}\{s(x, y)\}\}, \min_{x \in d_2}\{\max_{y \in d_1}\{s(x, y)\}\}\}$$

where $d_1$ is the value set of attribute $A_k$ for tuple $t_1$, $d_2$ is the value set of attribute $A_k$ for tuple $t_2$, $s(x, y)$ is a similarity relation for values $x$ and $y$, and $s$ is a mapping of every pair of elements in the domain $D_k$ onto the interval $[0, 1]$.

In the case of an ordinary relational data model, both $d_1$ and $d_2$ have to be singleton sets, and the similarity of any two tuples can have the value of either 0 or 1. Here, the identity relation is replaced by the explicitly declared $s(x, y)$, of which the identity relation is a special case. To describe the closeness of two tuples on a set of attributes rather than on a single attribute, the definition of conformance is extended in Ref. 8 as follows.

Definition. The conformance of attribute set $X$ for any two tuples $t_1$ and $t_2$ present in relation instance $r$, denoted by $C(X[t_1, t_2])$, is given as $C(X[t_1, t_2]) = \min_{A_k \in X}\{C(A_k[t_1, t_2])\}$.
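The two conformance definitions translate almost directly into code. The sketch below is illustrative only: the function names, the `size` and `id` attributes, and all similarity values are assumptions made for this example and do not come from the article.

```python
def conformance_attr(d1, d2, s):
    """C(A_k[t1, t2]) for value sets d1 and d2, with similarity lookup s[x][y]."""
    # Every value of d1 must have a close partner in d2 ...
    forward = min(max(s[x][y] for y in d2) for x in d1)
    # ... and every value of d2 must have a close partner in d1.
    backward = min(max(s[x][y] for y in d1) for x in d2)
    return min(forward, backward)

def conformance(t1, t2, attrs, sims):
    """C(X[t1, t2]): the minimum attribute conformance over the attribute set X."""
    return min(conformance_attr(t1[a], t2[a], sims[a]) for a in attrs)

# Hypothetical similarity relations (assumptions, not taken from the article).
SIZE = {
    "small":  {"small": 1.0, "medium": 0.8, "large": 0.4},
    "medium": {"small": 0.8, "medium": 1.0, "large": 0.4},
    "large":  {"small": 0.4, "medium": 0.4, "large": 1.0},
}
ID = {1: {1: 1.0, 2: 0.0}, 2: {1: 0.0, 2: 1.0}}  # identity relation (crisp attribute)
sims = {"id": ID, "size": SIZE}

t1 = {"id": {1}, "size": {"small"}}
t2 = {"id": {1}, "size": {"small", "large"}}

print(conformance_attr(t1["size"], t2["size"], SIZE))  # 0.4
print(conformance(t1, t2, ["id", "size"], sims))       # 0.4
```

In this example the value "small" of the first tuple has a perfect match in the second tuple, but "large" in the second tuple is only 0.4-similar to anything in the first, so the two-sided minimum yields 0.4. This is why both directions appear in the definition.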

3.1.2. Definition of Fuzzy Functional Dependencies

A formal definition for the ffd can be given as follows.

Definition. Let $r$ be any fuzzy relation instance on schema $R(A_1, \ldots, A_n)$, $U$ be the universal set of attributes $A_1, \ldots, A_n$, and both $X$ and $Y$ be subsets of $U$. Fuzzy relation instance $r$ is said to satisfy the ffd, $X \to_\theta Y$, if for every pair of tuples $t_1$ and $t_2$ in $r$, $C(Y[t_1, t_2]) \ge \min(\theta, C(X[t_1, t_2]))$, where $\theta$ is a real number within the range $[0, 1]$, describing the linguistic strength.

As with their crisp counterparts, ffds should be checked whenever tuples are inserted into the fuzzy relational database or modified, so that the integrity constraints imposed by these ffds are not violated.
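As a sanity check on the definition, the following short derivation sketches how it collapses to the classical functional dependency. It assumes the crisp setting described earlier (singleton value sets and the identity relation, so every conformance is 0 or 1) and takes $\theta = 1$.

```latex
\begin{align*}
C(Y[t_1, t_2]) &\ge \min(1,\; C(X[t_1, t_2])) = C(X[t_1, t_2]), \\
C(X[t_1, t_2]) = 1 &\;\Longrightarrow\; C(Y[t_1, t_2]) = 1, \\
\text{that is,}\quad t_1[X] = t_2[X] &\;\Longrightarrow\; t_1[Y] = t_2[Y].
\end{align*}
```

For $\theta < 1$ the strength acts as a cap on the requirement: if, say, $\theta = 0.6$ and $C(X[t_1, t_2]) = 0.9$, the two tuples only need $C(Y[t_1, t_2]) \ge 0.6$.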

Example 1. Consider a fuzzy relation instance Person = (Name, Performance, Earning). The similarity relations of the attribute domains are given in Tables I–III.

Table I. Similarity relation for attribute NAME.

| NAME | Kelly | Jerry | Matthew | Sandra |
|---|---|---|---|---|
| Kelly | 1 | 0 | 0 | 0 |
| Jerry | 0 | 1 | 0 | 0 |
| Matthew | 0 | 0 | 1 | 0 |
| Sandra | 0 | 0 | 0 | 1 |

Table II. Similarity relation for attribute PERFORMANCE.

| PERFORMANCE | Very poor | Poor | Average | Good | Excellent |
|---|---|---|---|---|---|
| Very poor | 1 | 0.75 | 0.3 | 0.3 | 0.3 |
| Poor | 0.75 | 1 | 0.3 | 0.3 | 0.3 |
| Average | 0.3 | 0.3 | 1 | 0.6 | 0.6 |
| Good | 0.3 | 0.3 | 0.6 | 1 | 0.65 |
| Excellent | 0.3 | 0.3 | 0.6 | 0.65 | 1 |

Table III. Similarity relation for attribute EARNING.

| EARNING | Little | Moderate | Average | High | Very high |
|---|---|---|---|---|---|
| Little | 1 | 0.8 | 0.2 | 0.2 | 0.2 |
| Moderate | 0.8 | 1 | 0.2 | 0.2 | 0.2 |
| Average | 0.2 | 0.2 | 1 | 0.6 | 0.6 |
| High | 0.2 | 0.2 | 0.6 | 1 | 0.8 |
| Very high | 0.2 | 0.2 | 0.6 | 0.8 | 1 |

The integrity constraint for the “Person” relation is “Performance of the employee more or less determines his/her earning.” That is, the ffd of this relation is PERFORMANCE $\to_{0.6}$ EARNING, where 0.6 is the linguistic strength, “more or less.” This ffd should be checked whenever a new tuple is to be inserted, to see whether the new tuple violates the ffd. Below, several tuples are inserted to illustrate the tuple conformance concept.

Step 1: Insertion of the first tuple

⟨{Kelly}, {poor, very poor}, {little}⟩

Since this is the first tuple, it does not violate the ffd.

Step 2: Insertion of the second tuple

⟨{Matthew}, {average}, {moderate, average}⟩

The conformance values of the left- and right-hand side attributes of the ffd are

$C(\mathrm{Perf}[t_1, t_2]) = 0.3$, $C(\mathrm{Earn}[t_1, t_2]) = 0.2$

Here, the ffd Performance $\to_{0.6}$ Earning is violated because $C(\mathrm{Earn}[t_1, t_2]) < \min(0.6, C(\mathrm{Perf}[t_1, t_2]))$, that is, $0.2 < 0.3$, so the tuple is not inserted.

Step 3: Insertion of the third tuple

⟨{Jerry}, {average, good}, {moderate}⟩

There is only one tuple to be checked for conformance, because the tuple in step 2 was not inserted.

$C(\mathrm{Perf}[t_1, t_2]) = 0.3$, $C(\mathrm{Earn}[t_1, t_2]) = 0.8$

Then the ffd Performance $\to_{0.6}$ Earning is not violated because $C(\mathrm{Earn}[t_1, t_2]) \ge \min(0.6, C(\mathrm{Perf}[t_1, t_2]))$, so the tuple is inserted. Now, we have two tuples in our relation:

t1: ⟨{Kelly}, {poor, very poor}, {little}⟩

t2: ⟨{Jerry}, {average, good}, {moderate}⟩

Step 4: Insertion of the fourth tuple

⟨{Sandra}, {average}, {little}⟩

There are two tuples to be checked for conformance, because the tuple in step 2 was not inserted.

$C(\mathrm{Perf}[t_1, t_3]) = 0.3$, $C(\mathrm{Earn}[t_1, t_3]) = 1$,

$C(\mathrm{Perf}[t_2, t_3]) = 0.6$, $C(\mathrm{Earn}[t_2, t_3]) = 0.8$

Then the ffd Performance $\to_{0.6}$ Earning is not violated because both

$C(\mathrm{Earn}[t_1, t_3]) \ge \min(0.6, C(\mathrm{Perf}[t_1, t_3]))$

and

$C(\mathrm{Earn}[t_2, t_3]) \ge \min(0.6, C(\mathrm{Perf}[t_2, t_3]))$

so the tuple is inserted. Thus, we have three tuples in the relation:

t1: ⟨{Kelly}, {poor, very poor}, {little}⟩

t2: ⟨{Jerry}, {average, good}, {moderate}⟩

t3: ⟨{Sandra}, {average}, {little}⟩
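The four insertion steps of Example 1 can be replayed in a few lines of code. This is a sketch for illustration rather than the authors' implementation: the similarity values are those of Tables II and III, while the helper names (`sim`, `conf`, `try_insert`) and the dictionary layout of a tuple are assumptions.

```python
PERF = ["very poor", "poor", "average", "good", "excellent"]
PERF_S = [  # Table II
    [1, 0.75, 0.3, 0.3, 0.3],
    [0.75, 1, 0.3, 0.3, 0.3],
    [0.3, 0.3, 1, 0.6, 0.6],
    [0.3, 0.3, 0.6, 1, 0.65],
    [0.3, 0.3, 0.6, 0.65, 1],
]
EARN = ["little", "moderate", "average", "high", "very high"]
EARN_S = [  # Table III
    [1, 0.8, 0.2, 0.2, 0.2],
    [0.8, 1, 0.2, 0.2, 0.2],
    [0.2, 0.2, 1, 0.6, 0.6],
    [0.2, 0.2, 0.6, 1, 0.8],
    [0.2, 0.2, 0.6, 0.8, 1],
]

def sim(labels, matrix):
    """Build a similarity function s(x, y) from a labelled matrix."""
    index = {v: i for i, v in enumerate(labels)}
    return lambda x, y: matrix[index[x]][index[y]]

S = {"perf": sim(PERF, PERF_S), "earn": sim(EARN, EARN_S)}

def conf(attr, t1, t2):
    """Conformance of two tuples on a single attribute."""
    d1, d2, s = t1[attr], t2[attr], S[attr]
    return min(min(max(s(x, y) for y in d2) for x in d1),
               min(max(s(x, y) for y in d1) for x in d2))

def try_insert(relation, new, theta=0.6):
    """Insert `new` only if perf ->theta earn holds against every stored tuple."""
    for old in relation:
        if conf("earn", old, new) < min(theta, conf("perf", old, new)):
            return False  # the ffd would be violated; reject the tuple
    relation.append(new)
    return True

relation = []
candidates = [
    {"name": "Kelly", "perf": {"poor", "very poor"}, "earn": {"little"}},
    {"name": "Matthew", "perf": {"average"}, "earn": {"moderate", "average"}},
    {"name": "Jerry", "perf": {"average", "good"}, "earn": {"moderate"}},
    {"name": "Sandra", "perf": {"average"}, "earn": {"little"}},
]
results = [try_insert(relation, t) for t in candidates]
print(results)                        # [True, False, True, True]
print([t["name"] for t in relation])  # ['Kelly', 'Jerry', 'Sandra']
```

Only the Matthew tuple is rejected, because its earning conformance with the Kelly tuple (0.2) falls below min(0.6, 0.3) = 0.3.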

3.1.2.1. Partial Fuzzy Functional Dependencies. Using the definition of the ffd, we can define a partial ffd, which is used in the definition of the fuzzy 2NF.

Definition. $Y$ is called partially fuzzy functionally dependent on $X$ to the degree $\theta$, written $X \to_\theta Y$ partially, if and only if $X \to_\theta Y$ and there exists an $X' \subset X$, $X' \ne \emptyset$, such that $X' \to_\alpha Y$ where $\alpha \ge \theta$.

In more relaxed terms, a ffd $X \to_\theta Y$ is a partial ffd if the dependency still holds after an attribute $A$ is removed from $X$. That is, for an attribute $A \in X$, $X - \{A\}$ still fuzzy functionally determines $Y$ to the degree $\alpha \ge \theta$.

Example 2. Let the relational schema be $R = (A, B, C)$ and the ffds be $AB \to_{0.8} C$ and $A \to_{0.9} C$. After removing attribute $B$ from the first ffd, the dependency still holds; hence $AB \to_{0.8} C$ is a partial ffd.
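A simple syntactic test for partiality can be sketched as follows. It is a deliberate simplification and an assumption of this example: the hypothetical function `is_partial` only inspects the ffds that are explicitly listed, and does not consider ffds that could be derived through inference rules.

```python
def is_partial(ffd, ffds):
    """Return True if a listed ffd with a nonempty proper subset of the
    left-hand side determines the same right-hand side at least as strongly.

    Each ffd is a triple (lhs, rhs, strength) with lhs and rhs as frozensets.
    """
    lhs, rhs, theta = ffd
    for other_lhs, other_rhs, alpha in ffds:
        if other_lhs and other_lhs < lhs and rhs <= other_rhs and alpha >= theta:
            return True
    return False

# The ffds of Example 2: AB ->0.8 C and A ->0.9 C
F = [
    (frozenset("AB"), frozenset("C"), 0.8),
    (frozenset("A"), frozenset("C"), 0.9),
]

print(is_partial(F[0], F))  # True: A alone determines C to degree 0.9 >= 0.8
print(is_partial(F[1], F))  # False: {A} has no nonempty proper subset
```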

3.1.3. Inference Rules for Fuzzy Functional Dependencies

An important concept related to data dependencies is the concept of inference rules. Given a set of dependencies, inference rules introduce other dependencies that are logical consequences of the given dependencies. These rules are dependency generators, and so they are closely related to the definition and semantics of the dependencies.

The fuzzy inference rules for ffds are listed below. They reduce to those of the classical fds. The inference rules presented below have already been shown to be sound and complete in Ref. 17.

(1) Inclusive rule for ffds:

If $X \to_{\theta_1} Y$ holds and $\theta_1 \ge \theta_2$, then $X \to_{\theta_2} Y$ holds.

(2) Reflexive rule for ffds:

If $X \supseteq Y$, then $X \to_\theta Y$ holds for all $\theta \in [0, 1]$.

(3) Augmentation rule for ffds:

Whenever $r$ satisfies $X \to_\theta Y$, it also satisfies $XZ \to_\theta YZ$.

(4) Transitivity rule for ffds:

Whenever $r$ satisfies $X \to_{\theta_1} Y$ and $Y \to_{\theta_2} Z$, it also satisfies $X \to_{\min(\theta_1, \theta_2)} Z$.

By successive application of the above inference rules, additional inference rules for the ffds can be stated:

(1) Union rule for ffds:

Whenever $X \to_{\theta_1} Y$ and $X \to_{\theta_2} Z$ are satisfied by $r$, $X \to_{\min(\theta_1, \theta_2)} YZ$ is also satisfied.

(2) Pseudotransitivity rule for ffds:

Whenever $r$ satisfies $X \to_{\theta_1} Y$ and $WY \to_{\theta_2} Z$, it also satisfies $WX \to_{\min(\theta_1, \theta_2)} Z$.

(3) Decomposition rule for ffds:

If $X \to_\theta Y$ holds and $Z \subseteq Y$, then $X \to_\theta Z$ holds.
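To see why the strengths combine through a minimum, the following sketch checks the transitivity rule directly against the definition of the ffd. It is an informal illustration for a fixed pair of tuples $t_1, t_2$, not the proof given in Ref. 17, and it uses only the fact that $\min$ is monotone in each argument.

```latex
\begin{align*}
C(Y[t_1, t_2]) &\ge \min(\theta_1,\; C(X[t_1, t_2]))
  && \text{since } X \to_{\theta_1} Y, \\
C(Z[t_1, t_2]) &\ge \min(\theta_2,\; C(Y[t_1, t_2]))
  && \text{since } Y \to_{\theta_2} Z, \\
               &\ge \min(\theta_2,\; \min(\theta_1,\; C(X[t_1, t_2]))) \\
               &= \min(\min(\theta_1, \theta_2),\; C(X[t_1, t_2])).
\end{align*}
```

The last line is exactly the condition for $X \to_{\min(\theta_1, \theta_2)} Z$.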

3.2. Fuzzy Keys

Like their classical relational counterparts, fuzzy normal forms are based on the concept of ffd and the concept of fuzzy key. Therefore, we define the fuzzy key concept in this section.

A primary key is a special case of functional dependency in classical relational database models. In the functional dependency $X \to Y$, the role of $X$ is played by the attributes in the key, and the set of all other attributes in the relation plays the role of $Y$. That is, a key $K$, a subset of $U$, of a relation schema $R$ means that the values of $U$ are determined from $K$ values for all tuples of any relation of $R$. In the classical relational data model, identical $K$ values lead to identical $U$ values. In the fuzzy relational data model, the concept of being identical again gives way to similarity (or closeness). The determination is reflected by the relationship that identical $K$ values lead to identical $U$ values, and close $K$ values lead to close $U$ values to a certain extent. In fuzzy relational databases, the classical primary key is extended to a fuzzy key with strength $\theta$, where $\theta$ is the extent mentioned above. A more formal definition can be given as follows.

Definition. Let $K, S \subseteq U$, and $F$ be a set of ffds for $R$: $K$ is called a fuzzy key of $R$ with strength $\theta$ if and only if $K \to_{\theta_i} U \in F$ and $K \to_{\theta_i} U$ is not a partial ffd, where $\theta = \min \theta_i$ and $\theta > 0$.

Example 3. As a symbolic example, let $R = (A, B, C, D)$ be a relation with ffds $A \to_{0.7} B$ and $A \to_{0.9} CD$; then $A$ is called the fuzzy key of the relation with strength 0.7, because $B$ values are determined by $A$ to the degree 0.7, and $C$ and $D$ values are determined by $A$ to the degree 0.9. The $\theta_i$ values are $\theta_1 = 0.7$ and $\theta_2 = 0.9$, and $\theta$ is then the minimum of $\{0.7, 0.9\}$, that is, 0.7.

A fuzzy key can have the values that an ordinary attribute can take. It can have multiple values such as $\{a, b\}$, where $a$ and $b$ are similar to each other to a certain degree. The only restriction on the values of a fuzzy key, like the values of other attributes, is that the values should not be AND-combined, as will be explained later.

3.2.1. Transitive Closure of Fuzzy Functional Dependencies

Given a set of ffds for a relation, the fuzzy key of that relation can be found utilizing the concept of transitive closure. Chen, Kerre, and Vandenbulcke18 studied the ffd transitive closure and axiomatization of fuzzy functional dependence. Transitive closure comes into play when we want to know whether a given ffd can be derived using the ffd set $F$ of a relation and the inference rules for ffds. However, it is not a simple task to compute the set of all ffds that are derived from $F$ using the inference rules, because the set is infinite. Instead of computing this whole set, the algorithm below finds all attributes that are fuzzy functionally dependent on attribute(s) $X$, and the maximal degree to which the dependencies hold, namely the transitive closure of $X$.

Algorithm. Transitive Closure Computation Algorithm. Let $X$ be a set of $k$ attributes, $X = X_1 X_2 \ldots X_k$:

(1) Initially construct the closure list of $X$, XList, with the attributes in $X$, each with the maximal degree, 1:

XList = $\{(X_1, 1), (X_2, 1), \ldots, (X_k, 1)\}$

The domain, Dom, contains the attributes in XList, initially $X_1, X_2, \ldots, X_k$. BList is a temporary closure list, initialized to empty.

(2) For each ffd $V \to_\alpha W$ in $F$: if the left-hand side of the ffd is a subset of the domain, $V \subseteq$ Dom,

• Find the minimum strength, minstrength, among the elements of XList whose attributes are the elements of $V$.

• Set $\varphi$ as the minimum of $\alpha$ and the strength found in the previous step, $\varphi = \min(\alpha, \text{minstrength})$.

• For each attribute $W_j$ of the right-hand side $W$, add the entry $(W_j, \varphi)$ to BList.

(3) Combine BList into XList using the fuzzy union operation.

(4) If there is a change in XList, reset BList, adjust the domain, Dom, according to the new elements of XList, and go to step 2. Otherwise stop; XList is the transitive closure of $X$.

Example 4. Consider the relation in Example 3: the relation $R$ has the attribute set $\{A, B, C, D\}$ and ffds $A \to_{0.7} B$ and $A \to_{0.9} CD$. Let us compute the transitive closure of attribute $A$.

Initially,

XList = $\{(A, 1)\}$, Dom = $\{A\}$, BList = $\emptyset$

For the first ffd, $A \to_{0.7} B$:

minstrength = 1, $\varphi = \min(1, 0.7) = 0.7$

BList = $\{(B, 0.7)\}$

For the second ffd, $A \to_{0.9} CD$:

minstrength = 1, $\varphi = \min(1, 0.9) = 0.9$

BList = $\{(B, 0.7), (C, 0.9), (D, 0.9)\}$

Combining BList into XList, XList = $\{(A, 1), (B, 0.7), (C, 0.9), (D, 0.9)\}$. Because XList has changed, we reset BList, and the new domain is Dom = $\{A, B, C, D\}$. The two ffds are then considered again in the same way. This time there is no change in XList, and hence the transitive closure of $A$ is $\{(A, 1), (B, 0.7), (C, 0.9), (D, 0.9)\}$.
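The closure computation can be sketched in code as follows. This is an illustrative reading of the algorithm, not the authors' implementation: the function name `transitive_closure` and the representation of ffds as `(lhs, rhs, strength)` triples are assumptions, and the fuzzy union of BList into XList is interpreted as keeping the maximum strength per attribute.

```python
def transitive_closure(attrs, ffds):
    """Return {attribute: maximal strength} for the closure of `attrs`.

    `ffds` is a list of (lhs, rhs, strength) triples with lhs and rhs as sets;
    left-hand sides are assumed to be nonempty.
    """
    xlist = {a: 1.0 for a in attrs}  # step 1: each starting attribute has degree 1
    changed = True
    while changed:
        blist = {}  # temporary closure list, reset on every pass
        for lhs, rhs, alpha in ffds:  # step 2
            if set(lhs) <= set(xlist):  # V is a subset of Dom
                phi = min([alpha, *(xlist[a] for a in lhs)])
                for w in rhs:
                    blist[w] = max(blist.get(w, 0.0), phi)
        changed = False
        for a, phi in blist.items():  # step 3: fuzzy union, read as max per attribute
            if phi > xlist.get(a, 0.0):
                xlist[a] = phi
                changed = True  # step 4: repeat while XList keeps changing
    return xlist

# Example 4: A ->0.7 B and A ->0.9 CD
F = [({"A"}, {"B"}, 0.7), ({"A"}, {"C", "D"}, 0.9)]
print(transitive_closure({"A"}, F))

# A chained case (hypothetical): A ->0.7 B and B ->0.9 C gives C the degree 0.7
F_chain = [({"A"}, {"B"}, 0.7), ({"B"}, {"C"}, 0.9)]
print(transitive_closure({"A"}, F_chain))
```

The chained case shows why Dom must be adjusted and the ffds revisited: $C$ only becomes reachable after $B$ has entered the closure, and its degree is limited by the weakest link on the path.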

3.2.2. Finding the Fuzzy Key of a Relation

To find the fuzzy key of a relation, the concept of transitive closure for ffds is used. The exhaustive way is to analyze the transitive closures of all the combinations of the attributes in the relation and check whether the transitive closures found include all the attributes. If a closure does, the attribute combination determines all the attributes in the relation to the respective degrees in the closure list, and the minimum of these strength values is the strength of the fuzzy key. However, there is no need to consider the transitive closures of all attributes, because for an attribute to be a part of a fuzzy key, it should belong to the left-hand side of one of the ffds, or it should not appear in any of the ffds in the relation. That is, to find a fuzzy key, the attributes that appear only on the right-hand sides of the ffds in the relation need not be considered in finding the transitive closures. Below is the algorithm to find the fuzzy keys of a given relation with a set of ffds $F$.

Algorithm. Fuzzy Key Finding Algorithm. Let F be the set of ffds of R:

(1) Find all the left-hand side attributes of ffds in F.
(2) Find the attributes not contained in any of the ffds of F.
(3) Get the union of the two sets found in the first two steps above into AttributeList.
(4) Beginning with the single attribute combinations, for all the ascending combinations of attributes in AttributeList (say comb for the combination):
• If comb contains a key found before, continue with another combination.
• Find the transitive closure of the comb.
• If the transitive closure found contains all the attributes of the relation, set $\alpha$ to the minimum of the strengths in the transitive closure, and add comb to the key list with the degree $\alpha$.

With this algorithm, all the candidate keys can be found. The first check in the fourth step of the algorithm ensures the full fuzzy functional dependence of the attributes of the relation on the fuzzy key.

Example 5. Let us consider Example 3 again. To find all the fuzzy keys of the relation $R = (A, B, C, D)$ with ffds $A \to_{0.7} B$ and $A \to_{0.9} CD$, we apply the algorithm above. The set of left-hand-side attributes of the ffds of R is $\{A\}$. There is no attribute not contained in any of the ffds, so AttributeList = $\{A\}$. Because there is only one attribute in AttributeList, only one transitive closure, that of attribute A, needs to be computed. The transitive closure of A is $\{(A, 1), (B, 0.7), (C, 0.9), (D, 0.9)\}$. Because the transitive closure contains all the attributes in the relation, A is the fuzzy (candidate) key of the relation with strength 0.7, that is, the minimum of (1, 0.7, 0.9).
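A minimal Python sketch of the key-finding steps follows. It is an illustration under stated assumptions rather than the paper's implementation: ffds are represented as hypothetical `(lhs, rhs, strength)` tuples of frozensets, the closure helper repeats the procedure of Example 4, and `fuzzy_keys` is a name introduced here.

```python
from itertools import combinations
from typing import Dict, FrozenSet, Iterable, List, Tuple

FFD = Tuple[FrozenSet[str], FrozenSet[str], float]  # (lhs, rhs, strength)


def fuzzy_closure(attrs: Iterable[str], ffds: List[FFD]) -> Dict[str, float]:
    """Transitive closure of attrs as {attribute: strength}."""
    xlist = {a: 1.0 for a in attrs}
    changed = True
    while changed:
        changed = False
        for lhs, rhs, theta in ffds:
            if lhs.issubset(xlist):
                w = min(min(xlist[a] for a in lhs), theta)
                for b in rhs:
                    if w > xlist.get(b, 0.0):
                        xlist[b] = w
                        changed = True
    return xlist


def fuzzy_keys(attributes: Iterable[str], ffds: List[FFD]) -> Dict[FrozenSet[str], float]:
    """Return {key: strength} for every fuzzy candidate key."""
    all_attrs = frozenset(attributes)
    lhs_attrs = frozenset().union(*(lhs for lhs, _, _ in ffds))          # step (1)
    in_ffds = frozenset().union(*(lhs | rhs for lhs, rhs, _ in ffds))
    attribute_list = sorted(lhs_attrs | (all_attrs - in_ffds))           # steps (2)-(3)
    keys: Dict[FrozenSet[str], float] = {}
    for size in range(1, len(attribute_list) + 1):                       # step (4)
        for comb in combinations(attribute_list, size):
            comb_set = frozenset(comb)
            if any(key <= comb_set for key in keys):
                continue                      # contains a key found before
            closure = fuzzy_closure(comb_set, ffds)
            if all_attrs.issubset(closure):
                keys[comb_set] = min(closure.values())
    return keys


F = [(frozenset("A"), frozenset("B"), 0.7), (frozenset("A"), frozenset("CD"), 0.9)]
print(fuzzy_keys("ABCD", F))   # the single key {A} with strength 0.7
```

Skipping every combination that contains an already found key is what keeps the reported keys minimal, which mirrors the first check of step (4).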

3.2.3. Fuzzy Prime and Nonprime Attributes

To be able to state the condition for the fuzzy 2NF, it is also necessary to define fuzzy prime and fuzzy nonprime attributes for a relation.

Definition. Let $A \in U$, $X \subseteq U$, and K be a fuzzy key set of R. A is called a fuzzy prime attribute if and only if $A \in K$; X is called a fuzzy prime if and only if $X \subseteq K$. Those attributes that are not fuzzy prime are called fuzzy nonprime.

For an attribute to be a fuzzy prime attribute, it should be a part of at least one of the fuzzy candidate keys of the relation. Similarly, for an attribute to be a fuzzy nonprime attribute, it should not appear in any of the fuzzy candidate keys of the relation. In Example 5, the attribute A is a prime attribute with a degree of 0.7.

3.3. Fuzzy First Normal Form

The first of the classical normal forms to be extended and generalized within the framework of the similarity-based fuzzy relational model is the 1NF.

Definition. Let $D_k$ be the domain of attribute $A_k$. A relation schema R is said to be in fuzzy 1NF if and only if, for any relation r in R, none of the attributes has multivalued (AND-combined) values.

When a relation schema is not in fuzzy 1NF, the algorithm below can be used to normalize the relation into fuzzy 1NF.

Algorithm. Fuzzy 1NF Decomposition Algorithm.

• When the relation is not in fuzzy 1NF, remove the tuple whose attributes violate fuzzy 1NF.
• Place these attributes in separate tuples along with the other attributes to achieve the fuzzy 1NF.

Example 6. Consider a relation schema R, and let its attributes be NAME, AGE, and LANGUAGE-SPOKEN. A relation r of R consists of four tuples given as

t1 = (Kelly, 35, English)

t2 = (Jerry, [very young, young], {English, French})

t3 = (Matthew, middle-aged, an oriental language)

t4 = (Sandra, 60, German)

In r, t1 means Kelly is 35 years old and speaks English, t2 means that Jerry, quite young, speaks English and French, t3 means Matthew, who is middle-aged, speaks Japanese, and t4 means Sandra, aged 60, speaks German.

This schema does not satisfy fuzzy 1NF because of the second tuple. In this tuple, Jerry speaks two languages, and this is an example of multivalued (AND-combined) data. When we apply the algorithm to bring the relation into fuzzy 1NF, the tuples become

t1 = (Kelly, 35, English)

t2 ⇒ t5 = (Jerry, [very young, young], English)

t6 = (Jerry, [very young, young], French)

t3 = (Matthew, middle-aged, Japanese)

t4 = (Sandra, 60, German)

where the relation is now in fuzzy 1NF.
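The tuple splitting of Example 6 can be sketched as follows. The representation is an assumption made for illustration: an AND-combined multivalued entry is modeled as a `frozenset`, while an imprecise value such as [very young, young] is modeled as a tuple and treated as a single value. The function name `to_fuzzy_1nf` is hypothetical.

```python
from itertools import product
from typing import Any, Dict, List

Row = Dict[str, Any]


def to_fuzzy_1nf(rows: List[Row]) -> List[Row]:
    """Split every tuple that holds AND-combined (frozenset) values."""
    result: List[Row] = []
    for row in rows:
        names = list(row)
        # A frozenset stands for AND-combined data; anything else is one value.
        choices = [
            sorted(row[n]) if isinstance(row[n], frozenset) else [row[n]]
            for n in names
        ]
        for combo in product(*choices):
            result.append(dict(zip(names, combo)))
    return result


r = [
    {"NAME": "Kelly", "AGE": 35, "LANGUAGE-SPOKEN": "English"},
    {"NAME": "Jerry", "AGE": ("very young", "young"),
     "LANGUAGE-SPOKEN": frozenset({"English", "French"})},
    {"NAME": "Sandra", "AGE": 60, "LANGUAGE-SPOKEN": "German"},
]
for t in to_fuzzy_1nf(r):
    print(t)
```

Jerry's tuple is replaced by two tuples, one per language, while the fuzzy age value is carried over unchanged into both.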

3.4. Fuzzy Second Normal Form

The fuzzy second normal form, fuzzy 2NF, is based on the concept of the full ffd. By using the concepts of fuzzy key and partial fuzzy functional dependence, we can define the fuzzy 2NF.

Definition. Let F be the set of ffds for schema R and K be a fuzzy key of R with strength q. R is said to be in fuzzy 2NF if and only if none of the fuzzy nonprime attributes is partially fuzzy functionally dependent on the fuzzy key, K.

Example 7. Let us consider a symbolic example, where a relation schema is $R = (A, B, C, D)$, and the ffds are $AB \to_{0.8} D$ and $A \to_{0.9} C$. Then AB is the fuzzy key with strength 0.8. Because a fuzzy nonprime attribute, C, is partially fuzzy functionally dependent on the fuzzy key of R, AB, R is not in fuzzy 2NF.

3.4.1. Fuzzy Second Normal Form Control

Because the definition of fuzzy 2NF involves the control of partial ffd of fuzzy nonprime attributes on the fuzzy key of R, an algorithm is used to control partial fuzzy functional dependence, and it is given below.

Algorithm. Partial Dependency Control Algorithm. Let the ffd to be investigated for being partial be $X \to_{\alpha} Y$.

(1) If the left-hand side of the ffd, X, contains a single attribute, the test need not be applied at all; the ffd is not partial. Otherwise,
(2) Beginning with the single attribute combinations, for all the ascending combinations of the attributes of X, except for the combination containing all the attributes:
• Find the transitive closure of the combination.
• If the transitive closure contains all the attributes of the right-hand side of the ffd, Y, and the corresponding strengths are greater than or equal to $\alpha$, then the ffd is partial.

The algorithm above is based on the fact that, if a proper subset of the left-hand side attributes of an ffd fuzzy functionally determines the right-hand side to a degree greater than or equal to the strength of the ffd, then the ffd is partial.
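The partial dependency test can be sketched in Python as below. This is a hedged illustration: the `(lhs, rhs, strength)` tuple representation and the names `fuzzy_closure` and `is_partial` are assumptions introduced here, and the closure helper follows the procedure of Example 4.

```python
from itertools import combinations
from typing import Dict, FrozenSet, Iterable, List, Tuple

FFD = Tuple[FrozenSet[str], FrozenSet[str], float]  # (lhs, rhs, strength)


def fuzzy_closure(attrs: Iterable[str], ffds: List[FFD]) -> Dict[str, float]:
    """Transitive closure of attrs as {attribute: strength}."""
    xlist = {a: 1.0 for a in attrs}
    changed = True
    while changed:
        changed = False
        for lhs, rhs, theta in ffds:
            if lhs.issubset(xlist):
                w = min(min(xlist[a] for a in lhs), theta)
                for b in rhs:
                    if w > xlist.get(b, 0.0):
                        xlist[b] = w
                        changed = True
    return xlist


def is_partial(lhs: Iterable[str], rhs: Iterable[str], alpha: float,
               ffds: List[FFD]) -> bool:
    """True if some proper subset of lhs determines rhs with strength >= alpha."""
    x, y = frozenset(lhs), frozenset(rhs)
    if len(x) == 1:
        return False                          # step (1): single attribute
    for size in range(1, len(x)):             # step (2): proper subsets only
        for comb in combinations(sorted(x), size):
            closure = fuzzy_closure(comb, ffds)
            if all(closure.get(a, 0.0) >= alpha for a in y):
                return True
    return False


# Example 7: R = (A, B, C, D), AB ->_0.8 D and A ->_0.9 C, key AB (strength 0.8).
F = [(frozenset("AB"), frozenset("D"), 0.8), (frozenset("A"), frozenset("C"), 0.9)]
print(is_partial("AB", "C", 0.8, F))   # True: A alone determines C with 0.9
print(is_partial("AB", "D", 0.8, F))   # False
```

An attribute that is missing from a closure is treated as having strength 0, so it can never satisfy the threshold $\alpha$.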

To determine whether a given relation is in fuzzy 2NF, all the fuzzy nonprime attributes of the relation should be checked to see whether they are partially fuzzy functionally dependent on any of the fuzzy keys of the relation. The algorithm below is developed for the fuzzy 2NF control for a given relation.

Algorithm. Fuzzy 2NF Control Algorithm. Let K be the set of fuzzy keys of relation R.

For each candidate key $K_i$ of the relation,

• If the fuzzy key contains a single attribute, it has no partial ffd; continue with another candidate key.
• For each nonprime attribute $A_j$ of the relation,
  • Let the ffd be $K_i \to_{\alpha_i} A_j$, where $\alpha_i$ is the strength of $K_i$.
  • Apply the partial dependency control algorithm to find out whether the ffd is a partial ffd. If so, stop; the relation is not in fuzzy 2NF.

3.4.2. Decomposition into Fuzzy Second Normal Form

If a relation schema is not in fuzzy 2NF, it can be normalized into a number of smaller relations in fuzzy 2NF by the following algorithm.

Algorithm. Fuzzy 2NF Decomposition Algorithm: If the relation is not in fuzzy 2NF, using the Fuzzy 2NF control algorithm, find the partial fuzzy keys and their dependent fuzzy nonprime attributes.

• Decompose and set up a new relation for each partial fuzzy key with its dependent attributes.
• Extract the fuzzy nonprime attributes that are partially fuzzy functionally dependent on any fuzzy key of the relation from the original relation and set up a new relation with the remaining attributes.

Example 8. If we consider Example 7 again, the relation was $R = (A, B, C, D)$, the ffds were $AB \to_{0.8} D$ and $A \to_{0.9} C$, and AB was the fuzzy key of the relation with strength 0.8. The second ffd, $A \to_{0.9} C$, contains a part of the fuzzy key as its left-hand side, so we have to decompose the relation. According to our algorithm, the decomposition is $R_1 = (A, C)$ and $R_2 = (A, B, D)$, where A is the fuzzy key of the first relation with strength 0.9 and AB is the fuzzy key of the second relation with strength 0.8.

3.5. An Example Application: Leasing Risk Assessment

To automate the risk assessment evaluation for car leasing contracts, a fuzzy enhanced score card system is developed. There are three different customer types: private, self-employed, and corporate customers. For modeling private customers, factors such as age, marital status, length of time at present address, and so forth are used; that is, the attributes are generally crisp. On the other hand, corporate customers have more input variables that are somewhat more complicated and contain fuzzy data. Attributes of the relation Leasing Risk Assessment are (Capital, Revenue, Workforce, CompAge, LegalType, FinanBack, CompStruct, IlliquidRisk, CreditRating), where

Capital: Company’s capital basis
Revenue: Company’s annual revenue
Workforce: Number of employees
CompAge: Age of the company
LegalType: Legal status of the company
FinanBack: Financial background evaluation
CompStruct: Company structure evaluation
IlliquidRisk: Evaluation of the risk of company becoming illiquid
CreditRating: Credit rating for the current leasing contract

with the ffds specified below:

FFD1: Company’s capital basis and annual revenue generally determine its financial background.

{Capital, Revenue} $\to_{0.8}$ FinanBack

FFD2: Number of employees, age of the company, and its legal status together more or less determine the structure of the company.

{Workforce, CompAge, LegalType} $\to_{0.7}$ CompStruct

FFD3: Financial background and structure of the company mostly determine the risk of the company becoming illiquid.

{FinanBack, CompStruct} $\to_{0.9}$ IlliquidRisk

FFD4: Evaluation of the risk of the company becoming illiquid more or less determines the credit rating of the company.

IlliquidRisk $\to_{0.7}$ CreditRating

In this relation (Capital, Revenue, Workforce, CompAge, LegalType) is the fuzzy key with strength 0.7. FFD1 contains a part of the fuzzy key as its left-hand side, that is, FinanBack is partially fuzzy functionally dependent on the fuzzy key. Also in FFD2, CompStruct is partially fuzzy functionally dependent on the fuzzy key.

So the relation is not in fuzzy 2NF; it should be decomposed. The decomposition is as follows:

R1 = (Capital, Revenue, FinanBack)

where {Capital, Revenue} is the fuzzy key with strength 0.8, and its ffd is

{Capital, Revenue} $\to_{0.8}$ FinanBack

R2 = (Workforce, CompAge, LegalType, CompStruct)

where {Workforce, CompAge, LegalType} is the fuzzy key with strength 0.7, and its ffd is

{Workforce, CompAge, LegalType} $\to_{0.7}$ CompStruct

We must also make sure to keep a relation with the remaining attributes, removing FinanBack and CompStruct from the relation. So, we have a third relation in which (Capital, Revenue, Workforce, CompAge, LegalType) is the fuzzy key with strength 0.7. This last relation is

R3 = (Capital, Revenue, Workforce, CompAge, LegalType, IlliquidRisk, CreditRating)

and the corresponding ffds are

{Capital, Revenue, Workforce, CompAge, LegalType} $\to_{0.9}$ IlliquidRisk

IlliquidRisk $\to_{0.7}$ CreditRating

At this point, all three relations, R1, R2, and R3 are in fuzzy 2NF.

3.6. Fuzzy Third Normal Form

The normalization process takes a relation schema through a series of tests to certify whether it satisfies a certain normal form. The process proceeds in a top-down fashion. In a database design satisfying the fuzzy 3NF, insertion, deletion, and update anomalies are minimized.

Definition. Let F be the set of ffds for R, and K be the fuzzy key of R with strength q. R is said to be in fuzzy 3NF if and only if R is in fuzzy 2NF and for any $X \to_{\alpha} A$ in F where A is not in X, either X contains the fuzzy key or A is fuzzy prime.

3.6.1. Fuzzy Third Normal Form Control

The definition of fuzzy 3NF can be used directly to control whether a given relation is in fuzzy 3NF. All of the ffds should be checked against the following conditions. If the left-hand side attributes contain all the attributes of the right-hand side, that ffd does not violate fuzzy 3NF. Similarly, if the left-hand side contains any of the fuzzy keys of the relation, fuzzy 3NF is not violated. Finally, if the right-hand side attributes of the ffd are all fuzzy prime attributes, fuzzy 3NF is also not violated. These conditions are combined in the algorithm below.

Algorithm. F3NF Control Algorithm. Let K be the fuzzy key set of relation R.

(1) For every ffd $X \to_{\alpha} Y$ in the relation,
• If $X \supseteq Y$, fuzzy 3NF is not violated; otherwise,
• If $X \supseteq K_i$ for any $K_i \in K$, fuzzy 3NF is not violated; otherwise,
• Let P be the set of fuzzy prime attributes of R. If $Y \subseteq P$, fuzzy 3NF is also not violated.
(2) If none of the above conditions is satisfied for at least one of the ffds in the relation, the relation is not in fuzzy 3NF.
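The three conditions of the control algorithm translate almost directly into set comparisons. The sketch below is illustrative only: the function name `fuzzy_3nf_violations` and the `(lhs, rhs, strength)` representation are assumptions, and the fuzzy keys are taken as given (for instance from a key-finding step) rather than computed here.

```python
from typing import FrozenSet, List, Tuple

FFD = Tuple[FrozenSet[str], FrozenSet[str], float]  # (lhs, rhs, strength)


def fuzzy_3nf_violations(ffds: List[FFD], keys: List[FrozenSet[str]]) -> List[FFD]:
    """Return the ffds for which none of the three conditions holds."""
    prime = frozenset().union(*keys)          # P: all fuzzy prime attributes
    violations = []
    for lhs, rhs, strength in ffds:
        if rhs <= lhs:                        # X contains Y
            continue
        if any(key <= lhs for key in keys):   # X contains some fuzzy key K_i
            continue
        if rhs <= prime:                      # Y consists of prime attributes
            continue
        violations.append((lhs, rhs, strength))
    return violations


# Third relation of the Leasing Risk Assessment example.
key = frozenset({"Capital", "Revenue", "Workforce", "CompAge", "LegalType"})
F = [
    (key, frozenset({"IlliquidRisk"}), 0.9),
    (frozenset({"IlliquidRisk"}), frozenset({"CreditRating"}), 0.7),
]
for lhs, rhs, strength in fuzzy_3nf_violations(F, [key]):
    print(sorted(lhs), "->", sorted(rhs), strength)
# ['IlliquidRisk'] -> ['CreditRating'] 0.7
```

An empty result means that the ffds do not violate the three conditions; the fuzzy 2NF requirement of the definition would still have to be checked separately.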

Example 9. For a symbolic example, let $R = (A, B, C, D, E)$ and the ffds be $AB \to_{0.9} C$, $AC \to_{0.8} D$, and $C \to_{0.6} E$. The first ffd has the fuzzy key as its left-hand side and so does not violate the fuzzy 3NF. But the second and third ffds, $AC \to_{0.8} D$ and $C \to_{0.6} E$, violate the fuzzy 3NF definition: their left-hand sides do not contain the fuzzy key, AB, and D and E are not fuzzy prime. Hence R is not in fuzzy 3NF.

3.6.2. Decomposition into Fuzzy Third Normal Form

The normalization process based on ffds uses a number of decompositions while normalizing the relations. But normal forms do not always guarantee a good database design. Generally it is not sufficient to check only that each relation schema in the database is in one of the fuzzy normal forms, fuzzy 3NF or fuzzy Boyce-Codd Normal Form (BCNF). The normalization process should also confirm the existence of two additional desirable properties, the dependency preservation property and the lossless join property. Decomposition algorithms having these properties are given in the following sections.

3.6.2.1. Minimal Cover. In the next two sections, algorithms are given for both the dependency-preserving and the lossless join decompositions. But for the decompositions to possess the two desired properties, the initial ffd set should be a minimal cover and it should be free of partial ffds. A minimal cover of a set of dependencies F is a set of dependencies that is equivalent to F with no redundancies. A set of ffds F is minimal if the following conditions hold: (1) every dependency in F has a single attribute for its right-hand side, (2) we cannot replace any $X \to_{\theta} A$ with $Y \to_{\alpha} A$, where Y is a proper subset of X and $\alpha \ge \theta$, and still have a set of ffds equivalent to F, and (3) we cannot remove any dependency from F and still have a set of ffds equivalent to F.

Partial ffd free means that there is no partial ffd in the set of ffds. The algorithm below finds the minimal cover of a given ffd set and makes it partial ffd free.

Algorithm. Minimal Cover Algorithm: Let F be the set of ffds, and assign F to G, G := F.

(1) Replace each ffd $X \to_{q_i} \{A_1, A_2, \ldots, A_n\}$ in G by the n ffds $X \to_{q_i} A_1$, $X \to_{q_i} A_2$, ..., $X \to_{q_i} A_n$.

(2) For each ffd $X \to_{q_i} A_k$ in G
  For each attribute $B \in X$
    If $(G - \{X \to_{q_i} A_k\}) \cup \{(X - \{B\}) \to_{\alpha} A_k\}$, where $\alpha \ge q_i$, is equivalent to G
    Then replace $X \to_{q_i} A_k$ with $(X - \{B\}) \to_{\alpha} A_k$ in G.

(3) For each remaining ffd $X \to_{q_i} A_k$ in G
  If $G - \{X \to_{q_i} A_k\}$ is equivalent to G, then remove $X \to_{q_i} A_k$ from G.

3.6.2.2. Dependency Preserving Decomposition into Fuzzy Third Normal Form. In fuzzy databases, as in their classical counterparts, it is important to preserve the dependencies while decomposing the relations, because each dependency in the fuzzy database represents a constraint in the database. If one of the dependencies is not represented in some individual relation $R_i$, we have to join two or more relations in the decomposition before the dependency can be checked, and that is inefficient and impractical. The dependency preservation property ensures that each ffd is represented in some individual relation resulting after decomposition.

Now, we give the algorithm that creates a dependency-preserving decomposition of a relation R based on a set of ffds, F, such that each relation in the decomposition is in fuzzy 3NF.

Algorithm. Dependency Preserving Decomposition into Fuzzy 3NF Algorithm.

• Find the minimal cover G for F, and make it partial ffd free by using the Minimal Cover Algorithm above.
• Place any attributes that have not been included in any of the ffds of G in a separate relation schema, and eliminate them from R.
• If any of the ffds in G involves all the attributes of R, then the decomposition is R.
• Else, for each left-hand side X of ffds in G, create a new relation schema in D with attributes $\{X \cup \{A_1\} \cup \ldots \cup \{A_k\}\}$, where $X \to_{q_1} A_1$, $X \to_{q_2} A_2$, ..., $X \to_{q_k} A_k$ are the ffds in G, and X is the fuzzy key of this new relation with strength $q_{i\min}$, the minimum of the $q_i$.

Example 10. Let $R = (A, B, C, D, E)$ and the ffds be $CD \to_{0.7} A$, $CD \to_{0.7} B$, $AD \to_{0.5} E$, $CD \to_{0.7} E$, $A \to_{0.8} B$, and $B \to_{0.6} E$. Hence CD is the fuzzy key of the relation with strength 0.7.

First of all, the minimal cover algorithm is applied. G is initialized to the set of ffds, F, that is, G := F. All the ffds are in the form $X \to_{q_i} A_i$, meaning that every ffd has a single attribute on its right-hand side. In the third step, for the ffd $AD \to_{0.5} E$ and the attribute $D \in \{A, D\}$, $(G - \{AD \to_{0.5} E\}) \cup \{A \to_{0.6} E\}$ is equivalent to G because $0.6 \ge 0.5$. In this step, $A \to_{0.6} E$ is obtained from the two ffds $A \to_{0.8} B$ and $B \to_{0.6} E$ using the transitivity property. So $AD \to_{0.5} E$ is replaced with $A \to_{0.6} E$ in G. In the last step, the ffd $A \to_{0.6} E$, obtained in the previous step, is removed because it can be obtained from the last two ffds, $A \to_{0.8} B$ and $B \to_{0.6} E$. Then, the minimal cover G is

$CD \to_{0.7} A$, $CD \to_{0.7} B$, $CD \to_{0.7} E$, $A \to_{0.8} B$, and $B \to_{0.6} E$.

For the second step of the dependency-preserving decomposition algorithm, a relation schema is created for each left-hand side of the ffds. For CD, the fuzzy key of the relation with strength 0.7, a relation schema is created with attributes A, B, C, D, and E whose ffds are $CD \to_{0.7} A$, $CD \to_{0.7} B$, and $CD \to_{0.7} E$, with CD as the fuzzy key with strength 0.7. Then for the remaining ffds, $A \to_{0.8} B$ and $B \to_{0.6} E$, two separate relation schemas are created, one with attributes A and B, the other with attributes B and E. At the end, after the dependency-preserving decomposition, three relation schemas are obtained. The first one is $R_1 = (A, B, C, D, E)$ with fuzzy functional dependencies $CD \to_{0.7} A$, $CD \to_{0.7} B$, and $CD \to_{0.7} E$, the second one is $R_2 = (A, B)$ with $A \to_{0.8} B$, and the third one is $R_3 = (B, E)$ with $B \to_{0.6} E$.

3.6.2.3. Lossless Join Decomposition into Fuzzy Third Normal Form. Another desired property of a decomposition is the lossless join property. If a decomposition does not have the lossless join property, then we may get spurious tuples after joining the relations in that decomposition. These spurious tuples represent erroneous information. Therefore, this property is critical and must be achieved. The lossless join property guarantees that the spurious tuple generation problem does not occur with respect to the relation schemas created after decomposition. The algorithm below provides a lossless join decomposition into fuzzy 3NF.

Algorithm. Lossless Join Decomposition into Fuzzy 3NF Algorithm.

• Find the minimal cover G for F, and make it partial ffd free.
• Place any attributes that have not been included in any of the ffds of G in a separate relation schema, and eliminate them from R.
• If any of the ffds in G involves all the attributes of R, then the decomposition is R.
• Else, for each left-hand side X of ffds in G, create a new relation schema in D with attributes $\{X \cup \{A_1\} \cup \ldots \cup \{A_k\}\}$, where $X \to_{q_1} A_1$, $X \to_{q_2} A_2$, ..., $X \to_{q_k} A_k$ are the ffds in G, and X is the fuzzy key of this new relation with strength $q_{i\min}$, the minimum of the $q_i$.
• If none of the relation schemas contains the fuzzy key of R, create one more relation schema that contains attributes that form the fuzzy key of R.

The testing algorithms for these two properties, the dependency preserving and lossless join properties, are presented in the following sections after fuzzy BCNF.

Example 11. The lossless join decomposition algorithm adds one step at the end of the dependency-preserving decomposition: it creates a new relation schema for the fuzzy key of the relation. If we consider the relation in Example 10 again, in order to get a lossless join decomposition, we must go through all the steps of the dependency-preserving decomposition again and at the end we must create a new relation for the fuzzy key, CD, if it is not contained in any of the decomposed relations. But, in our case, the fuzzy key CD is already contained in one of the decomposed relations, so there is no need to create a new relation. Then, after a lossless join decomposition into fuzzy 3NF, we have three relations, $R_1 = (A, B, C, D, E)$, $R_2 = (A, B)$, and $R_3 = (B, E)$, as in Example 10.
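The grouping step shared by the two decomposition algorithms, together with the extra key relation of the lossless join variant, can be sketched as follows. This is an illustration under several assumptions, not the authors' code: the minimal cover G and the fuzzy key are supplied by the caller, the name `decompose_3nf` and the `(attributes, key, strength)` result triples are introduced here, and the strength 1.0 assigned to relations that hold only leftover or key attributes is a placeholder, since the text does not specify it.

```python
from typing import Dict, FrozenSet, Iterable, List, Tuple

FFD = Tuple[FrozenSet[str], FrozenSet[str], float]        # (lhs, rhs, strength)
Schema = Tuple[FrozenSet[str], FrozenSet[str], float]     # (attributes, key, strength)


def decompose_3nf(attributes: Iterable[str], cover: List[FFD],
                  fuzzy_key: Iterable[str], lossless: bool = True) -> List[Schema]:
    """Group the ffds of a minimal cover by left-hand side into relation schemas."""
    all_attrs = frozenset(attributes)
    key = frozenset(fuzzy_key)
    used = frozenset().union(*(lhs | rhs for lhs, rhs, _ in cover))
    schemas: List[Schema] = []
    leftover = all_attrs - used               # attributes in no ffd of G
    if leftover:
        schemas.append((leftover, leftover, 1.0))   # strength 1.0 is an assumption
    remaining = all_attrs - leftover
    whole = [(lhs, q) for lhs, rhs, q in cover if lhs | rhs == remaining]
    if whole:                                 # an ffd involves all attributes of R
        lhs, q = whole[0]
        schemas.append((remaining, lhs, q))
    else:
        groups: Dict[FrozenSet[str], List[Tuple[FrozenSet[str], float]]] = {}
        for lhs, rhs, q in cover:
            groups.setdefault(lhs, []).append((rhs, q))
        for lhs, deps in groups.items():
            attrs = lhs.union(*(rhs for rhs, _ in deps))
            schemas.append((attrs, lhs, min(q for _, q in deps)))
    # Extra step of the lossless join variant: make sure the key is present.
    if lossless and not any(key <= attrs for attrs, _, _ in schemas):
        schemas.append((key, key, 1.0))       # strength 1.0 is an assumption
    return schemas


# Minimal cover G of Example 10, with fuzzy key CD.
G = [(frozenset("CD"), frozenset("A"), 0.7), (frozenset("CD"), frozenset("B"), 0.7),
     (frozenset("CD"), frozenset("E"), 0.7), (frozenset("A"), frozenset("B"), 0.8),
     (frozenset("B"), frozenset("E"), 0.6)]
for attrs, k, q in decompose_3nf("ABCDE", G, "CD"):
    print(sorted(attrs), "key", sorted(k), "strength", q)
```

For Example 10 the sketch yields the three schemas (A, B, C, D, E), (A, B), and (B, E); no additional key relation is created because CD is already contained in the first schema.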

3.6.2.4. An Example Application: Leasing Risk Assessment. The Leasing Risk Assessment relation analyzed above can be further analyzed for fuzzy 3NF. When the conditions for the fuzzy 3NF are considered, R1 and R2 in the decomposed relation R do not violate the fuzzy 3NF, because in each relation there is only one ffd, and their left-hand sides are the fuzzy keys of the corresponding relations. But in the third relation, the ffds are

{Capital, Revenue, Workforce, CompAge, LegalType} $\to_{0.9}$ IlliquidRisk

IlliquidRisk $\to_{0.7}$ CreditRating

According to the fuzzy 3NF control algorithm, the first ffd satisfies the second condition, so it does not violate the fuzzy 3NF. In the second ffd, however, none of the conditions is met: the left-hand side does not contain the right-hand side, it does not contain the key, and CreditRating is not a fuzzy prime attribute. Consequently, the third relation is not in fuzzy 3NF, and it must be decomposed. Applying the Dependency-Preserving Decomposition into fuzzy 3NF algorithm, we get the decomposed relations as follows:

R4 = (Capital, Revenue, Workforce, CompAge, LegalType, IlliquidRisk)

where {Capital, Revenue, Workforce, CompAge, LegalType} is the fuzzy key with strength 0.9, and its ffd is

{Capital, Revenue, Workforce, CompAge, LegalType} $\to_{0.9}$ IlliquidRisk

R5 = (IlliquidRisk, CreditRating)

where IlliquidRisk is the fuzzy key with strength 0.7, and its ffd is

IlliquidRisk $\to_{0.7}$ CreditRating

The Lossless Join Decomposition into fuzzy 3NF Algorithm has only one additional step with respect to the dependency-preserving decomposition algorithm: If none of the relation schemas contains the fuzzy key of R3, create one more relation schema that contains attributes that form the fuzzy key of R3. But in our case, relation R4 has the fuzzy key of R3; hence the decomposition is also a lossless join decomposition.

3.7. Fuzzy Boyce Codd Normal Form

Like its classical counterpart, fuzzy Boyce-Codd normal form (fuzzy BCNF) is a stricter form of fuzzy 3NF. Fuzzy BCNF ensures that there is no redundancy that can be detected using ffd information alone. It is the most desirable normal form from the point of view of redundancy. The formal definition of the fuzzy BCNF can be given as follows.

Definition. Let F be the set of ffds for schema R, and K be the fuzzy key of R with strength q. R is said to be in fuzzy BCNF if and only if R is in fuzzy 3NF and for any $X \to_{p} A$ in F, either A is in X or X is a fuzzy superkey of R, that is, $X \supseteq K$.

To check whether a given relation is in fuzzy BCNF, all of the ffds in the relation should be checked against the specified two conditions. If the left-hand side of the ffd contains all the attributes of the right-hand side or any of the fuzzy keys of the relation, that ffd does not violate the fuzzy BCNF. The algorithm for the decomposition into fuzzy BCNF is given below. The algorithm ensures that the decomposition is a lossless join decomposition.

Algorithm. Decomposition into Fuzzy BCNF Algorithm: Let the ffd that violates fuzzy BCNF be $X \to_{p} A$, where $A, X \subseteq R$ and A is a single attribute.

• Decompose R into two relation schemas $R - A$ and $XA$.
• Recursively apply the previous step for all the ffds that violate the fuzzy BCNF, until there is no ffd in the relation violating fuzzy BCNF.

The ffds being checked against fuzzy BCNF are already in fuzzy 3NF and their right-hand sides consist of single attributes, because of the fuzzy 3NF decomposition algorithm.

Example 12. Consider a relation schema $R = (A, B, C, D, E, F, G)$ with ffds $CE \to_{0.7} A$, $BD \to_{0.6} E$, and $C \to_{0.9} B$, where A is the fuzzy key of the relation with strength 0.8, that is, $A \to_{0.8} BCDEFG$. The relation schema is in fuzzy 2NF because there is no partial dependence (the fuzzy key of the relation, A, is a single attribute). Now, we have to check whether the relation is in fuzzy 3NF. To be in fuzzy 3NF, either the left-hand side of each ffd should contain the fuzzy key, A, or the right-hand side should be fuzzy prime, that is, a part of the fuzzy key. In our example, the second ffd violates this constraint, so the relation is not in fuzzy 3NF and consequently not in fuzzy BCNF. According to our algorithm, we decompose the relation into two: one with attributes $\{B, D, E\}$ and ffd $BD \to_{0.6} E$, with BD as the fuzzy key with strength 0.6, and the other with attributes $\{A, B, C, D, F, G\}$ and ffd $C \to_{0.9} B$, still with A as the fuzzy key with strength 0.8. In this decomposition, the second relation schema is still not in fuzzy BCNF because of the ffd $C \to_{0.9} B$. So we decompose it again into two new relations, the first one with attributes $\{A, C, D, F, G\}$ and A as the fuzzy key with strength 0.8, and the second one with attributes $\{B, C\}$ and ffd $C \to_{0.9} B$, with C as the fuzzy key with strength 0.9. Thus each of the schemas BDE, BC, and ACDFG is in fuzzy BCNF.

In the “Leasing Risk Assessment” example, all the decomposed relations are in fuzzy BCNF.

3.8. Dependency Preservation Property Testing in Decompositions

In the discussion of fuzzy 3NF, two algorithms were given for the decomposition into fuzzy 3NF, one achieving the dependency preservation property, and the other also having the lossless join property. The algorithm for normalization into fuzzy BCNF also ensures the lossless join property. The dependency preservation property of decompositions in the fuzzy relational data model is studied extensively in Ref. 23. In this section, an algorithm is presented to test the dependency preservation property of decompositions.

Algorithm. Dependency Preservation Testing Algorithm: For every ffd $X \to_{\alpha} Y$, where $X = X_1 X_2 \ldots X_m$,

(1) Construct a transitive closure list, ZList, initially for all the attributes of the left-hand side of the ffd, X, with maximum strengths.

ZList = $\{(X_1, 1), (X_2, 1), \ldots, (X_m, 1)\}$

(2) While (true)

i. ZList2 $\leftarrow$ ZList.
ii. Reset domain.
iii. For each decomposed relation $R_i$ (i = 1 to k),
• Reset domain.
• For each element of ZList2, if the attribute of this element is in $U_i$, where $U_i$ is the attribute set of $R_i$, add the attribute to the domain.
• Find the transitive closure of the domain, $\mathit{ZList}_i$.
• For each element of $\mathit{ZList}_i$, if the attribute of this element is in $U_i$, add the element to $\mathit{TList}_i$.
• Combine $\mathit{TList}_i$ into ZList2 using the fuzzy union operation.
iv. If ZList = ZList2, break.
v. Else ZList $\leftarrow$ ZList2.

(3) If all the attributes in Y occur in ZList with the strength $\alpha$ or greater, then the dependency preserving property is not violated; continue with the next ffd.

(4) Else the decomposition is not dependency preserving; break.

Example 13. Let the attribute set for a relation R be $\{A, B, C\}$, the decomposed relations be $R_1 = (A, B)$ and $R_2 = (B, C)$, and the ffds be $A \to_{0.9} B$ and $B \to_{0.7} C$. For the first ffd, $A \to_{0.9} B$, the transitive closure of A is initially ZList = $\{(A, 1)\}$.

ZList2 = $\{(A, 1)\}$

$R_1$: domain = $\{A\}$

$\mathit{ZList}_1 = \{(A, 1), (B, 0.9), (C, 0.7)\}$

$\mathit{TList}_1 = \{(A, 1), (B, 0.9)\}$

ZList2 = $\{(A, 1), (B, 0.9)\}$

$R_2$: domain = $\{B\}$

$\mathit{ZList}_2 = \{(B, 1), (C, 0.7)\}$

$\mathit{TList}_2 = \{(B, 1), (C, 0.7)\}$

ZList2 = $\{(A, 1), (B, 0.9), (C, 0.7)\}$

Because ZList $\neq$ ZList2,

ZList $\leftarrow$ $\{(A, 1), (B, 0.9), (C, 0.7)\}$

In the second pass, ZList = ZList2, so the loop is exited. We see that the right-hand-side attribute B occurs in ZList with strength 0.9, so the dependency preservation property is not violated and we continue with the second ffd, $B \to_{0.7} C$. Similarly, at the end we find ZList = $\{(B, 1), (C, 0.7)\}$, and because attribute C occurs in ZList with strength 0.7, the dependency preservation property is not violated. Hence the decomposition is dependency preserving.

3.9. Lossless Join Property Testing in Decompositions

Chen, Kerre, and Vandenbulcke impose a restriction on the extended algebraic operations in their study (Ref. 22). In accordance with the design issues and to achieve a complete information reconstruction, they restricted the eight algebraic operations, namely product, union, intersection, natural join, projection, selection, minus, and division, so that they are performed for base relations only on identical elements or tuples, not on close ones. That means that whenever tuple merging is of concern, it refers to identical elements. Raju and Majumdar (Ref. 6) restrict the fuzzy resemblance relation and name the class of ffds in which the fuzzy resemblance relation is restricted the restricted ffds. With these restrictions, both Chen et al. (Ref. 22) and Raju and Majumdar (Ref. 6) use a classic algorithm to test the lossless join decomposition of a fuzzy relation with ffds.

In this article, we also utilize the classic algorithm to test whether a decomposition has the lossless join property. The logic in using the classical testing algorithm in the similarity-based fuzzy relational database model is as follows. The table created during the application of the algorithm is used only to determine whether there is a joining attribute between the decomposed relations. Fuzziness is taken into consideration after this point. If there is a joining attribute, the decision whether they can be joined or not is made according to their similarity levels and a predefined threshold. The tuples should also satisfy all the ffds of the relation; that is, for every pair of tuples and for each ffd $X \to_{q} Y$, $C(Y[t_1, t_2]) \ge \min(q, C(X[t_1, t_2]))$.

Algorithm. Lossless Join Testing Algorithm: Let the relation schema be R with the attributes $A_1, A_2, \ldots, A_n$, F be the ffds, and $\rho = \{R_1, R_2, \ldots, R_k\}$ be the decomposition.

(1) Create an initial table T with one row i for each relation $R_i$ in the decomposition and one column j for each attribute $A_j$ in the relation being decomposed, R.

(2) Put $b_{ij}$ in every cell of the table.

(3) For each row i and column j, if $A_j$ is in the attribute domain, $U_i$, of $R_i$, then set $T_{ij} = a_j$.

(4) Repeat until there are no changes in T:
For each ffd $X \to_{\alpha} Y$ in F,
• For all rows in T, look for those rows which have the same symbols in all columns corresponding to attributes in X.
• For any two such rows, make the symbols in all columns for the attributes in Y the same as follows: if any of the symbols is an “a” symbol, set the other to that same “a” symbol.

(5) At the end, if a row consists entirely of “a” symbols, then the decomposition has the lossless join property. Otherwise, it is not a lossless join decomposition.
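The table-based test can be sketched in Python as shown below. This is a hedged illustration of the classical procedure the section refers to, with assumptions stated plainly: the function name `is_lossless` is introduced here, ffd strengths are ignored because the table is used only to detect joining attributes, and when two rows that agree on X both hold “b” symbols in a Y column, the sketch equates them as the classical algorithm does, a case the steps above do not spell out.

```python
from typing import FrozenSet, List, Sequence, Tuple

FFD = Tuple[FrozenSet[str], FrozenSet[str], float]  # (lhs, rhs, strength)


def is_lossless(attributes: Sequence[str], ffds: List[FFD],
                decomposition: List[FrozenSet[str]]) -> bool:
    """Classical table test for the lossless join property."""
    cols = list(attributes)
    index = {col: j for j, col in enumerate(cols)}
    # Steps (1)-(3): b_ij everywhere, a_j where attribute A_j belongs to R_i.
    table = [
        [f"a{j + 1}" if col in rel else f"b{i + 1}{j + 1}"
         for j, col in enumerate(cols)]
        for i, rel in enumerate(decomposition)
    ]
    changed = True
    while changed:                            # step (4)
        changed = False
        for lhs, rhs, _strength in ffds:      # the strength is not used here
            x_cols = [index[a] for a in lhs]
            y_cols = [index[a] for a in rhs]
            for r1 in table:
                for r2 in table:
                    if r1 is r2 or any(r1[c] != r2[c] for c in x_cols):
                        continue
                    for c in y_cols:
                        if r1[c] == r2[c]:
                            continue
                        # "a..." sorts before "b...", so an "a" symbol is kept.
                        keep, drop = sorted((r1[c], r2[c]))
                        for row in table:
                            if row[c] == drop:
                                row[c] = keep
                        changed = True
    # Step (5): lossless if some row consists entirely of "a" symbols.
    return any(all(s.startswith("a") for s in row) for row in table)


F = [(frozenset("A"), frozenset("B"), 0.6), (frozenset("C"), frozenset("DE"), 0.5),
     (frozenset("AC"), frozenset("F"), 0.8)]
print(is_lossless("ABCDEF", F, [frozenset("BE"), frozenset("ACDEF")]))   # False
```

Each change merges two distinct symbols in one column, so the number of distinct symbols strictly decreases and the loop terminates.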

Example 14. Let the relation schema be R = (A, B, C, D, E, F) and the ffds be $A \to_{0.6} B$, $C \to_{0.5} DE$, and $AC \to_{0.8} F$. Here AC is the fuzzy key of the relation with strength 0.5. Suppose we decompose R into two relations R1 = (B, E) and R2 = (A, C, D, E, F), and then test for the lossless join. The initial table T has i = 2 rows for relations R1 and R2, and j = 6 columns for the attributes A, B, C, D, E, and F. For the second step, we initialize each cell with $b_{ij}$. The initial table can be seen in Table IV.

For the first row, T12 and T15 are set to a2 and a5, respectively, because relation R1 contains the attributes A2 = B and A5 = E. Similarly, for the second row, entries T21, T23, T24, T25, and T26 are set to a1, a3, a4, a5, and a6, respectively, because R2 contains the attributes A, C, D, E, and F, as in Table V.

For the first ffd $A \to_{0.6} B$, R1 and R2 do not have the same symbols in the first column, the column for the attribute A, so there is no change in the B column. Considering the second ffd, $C \to_{0.5} DE$, again R1 and R2 do not have the same symbols in the column for C, and there is no change in the table. The situation is the same for the last ffd, $AC \to_{0.8} F$. At the end, because there is no row consisting entirely of “a” symbols, the decomposition is not a lossless join decomposition.
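For a decomposition into exactly two relations, the outcome of Example 14 can be cross-checked with a different, classical criterion: the join is lossless if and only if the common attributes functionally determine all of R1 or all of R2. The sketch below applies this crisp criterion through an attribute closure and ignores the ffd strengths, which is a simplifying assumption; it is not the table-based algorithm given above, and the function names are hypothetical.

```python
from typing import Iterable, Sequence, Set, Tuple

FD = Tuple[str, str]  # (X, Y) written with single-letter attribute names


def closure(attrs: Iterable[str], fds: Sequence[FD]) -> Set[str]:
    """Attributes reachable from attrs by repeatedly applying the dependencies."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def binary_lossless(r1: str, r2: str, fds: Sequence[FD]) -> bool:
    """Classical test for a two-way split: (R1 ∩ R2) must determine R1 or R2."""
    reach = closure(set(r1) & set(r2), fds)
    return set(r1) <= reach or set(r2) <= reach


FDS = [("A", "B"), ("C", "DE"), ("AC", "F")]

# Decomposition of Example 14: the only common attribute is E.
example_14 = binary_lossless("BE", "ACDEF", FDS)

# Hypothetical alternative split that shares A instead; A determines B.
alternative = binary_lossless("AB", "ACDEF", FDS)

print(example_14, alternative)  # False True
```

The common attribute E of Example 14 determines nothing but itself, which agrees with the negative result obtained from the table. Splitting on A instead, as in the hypothetical second call, would pass the test because A determines B.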

Table IV. Initial table for relation R = (A, B, C, D, E, F).

T     A     B     C     D     E     F
R1    b11   b12   b13   b14   b15   b16
R2    b21   b22   b23   b24   b25   b26

Table V. Table after applying the third step of the lossless join testing algorithm to R.

T     A     B     C     D     E     F
R1    b11   a2    b13   b14   a5    b16
R2    a1    b22   a3    a4    a5    a6

Example 15. Now we give a lossless join decomposition example. Let the relation schema be R = (A, B, C, D, E, F, G) and the ffds be $ABC \to_{0.7} D$, $ABC \to_{0.8} E$, $DE \to_{0.7} F$, and $F \to_{0.6} G$. Here ABC is the fuzzy key of the relation with strength 0.7. Suppose we decompose R into three relations R1 = (A, B, C, D, E), R2 = (D, E, F), and R3 = (F, G), and then test for the lossless join. The initial table T has i = 3 rows for relations R1, R2, and R3, and j = 7 columns for the attributes A, B, C, D, E, F, and G. For the second step, we initialize each entry with $b_{ij}$ (Table VI).

For the first row, T11, T12, T13, T14, and T15 are set to a1, a2, a3, a4, and a5, respectively, because relation R1 contains the attributes A1 = A, A2 = B, A3 = C, A4 = D, and A5 = E. Similarly, for the second row, entries T24, T25, and T26 are set to a4, a5, and a6, respectively, because R2 contains the attributes D, E, and F. Finally, entries T36 and T37 are set to a6 and a7, respectively, because R3 contains the attributes F and G, and the table becomes Table VII.

For the first and second ffds, $ABC \to_{0.7} D$ and $ABC \to_{0.8} E$, no two of R1, R2, and R3 have the same symbols in the columns for the attributes A, B, and C, so there is no change in the D and E columns. Considering the third ffd, $DE \to_{0.7} F$, R1 and R2 have the same symbols in the columns for D and E, so the entry for attribute F in the row of R1, b16, is changed into a6. Then we get Table VIII.

Finally, for the last ffd $F \to_{0.6} G$, R1, R2, and R3 have the same symbols in the column for F, so the entries for attribute G in the rows of R1 and R2 are changed into a7, and the table becomes the one in Table IX.

At the end, because there is a row consisting entirely of “a” symbols, that is, the first row, the decomposition is a lossless join decomposition.

Table VI. Initial table for relation R = (A, B, C, D, E, F, G).

T     A     B     C     D     E     F     G
R1    b11   b12   b13   b14   b15   b16   b17
R2    b21   b22   b23   b24   b25   b26   b27
R3    b31   b32   b33   b34   b35   b36   b37

Table VII. Result of the third step of the lossless join testing algorithm for R.

T     A     B     C     D     E     F     G
R1    a1    a2    a3    a4    a5    b16   b17
R2    b21   b22   b23   a4    a5    a6    b27
R3    b31   b32   b33   b34   b35   a6    a7

Table VIII. Result of the fourth step of the lossless join testing algorithm for the first three ffds of R.

T     A     B     C     D     E     F     G
R1    a1    a2    a3    a4    a5    a6    b17
R2    b21   b22   b23   a4    a5    a6    b27
R3    b31   b32   b33   b34   b35   a6    a7

Table IX. Table for R = (A, B, C, D, E, F, G) at the end of the lossless join testing algorithm.

T     A     B     C     D     E     F     G
R1    a1    a2    a3    a4    a5    a6    a7
R2    b21   b22   b23   a4    a5    a6    a7
R3    b31   b32   b33   b34   b35   a6    a7

3.10. An Example Application: Fraud Detection

An increasing number of transactions are carried out remotely and electronically in today’s financial world. Thus, with the complexity of the system, the opportunities for criminals to conduct fraudulent transactions rise. Credit cards are one of the areas where fraudulent behavior is extremely important for financial institutions. Fraudulent behavior can arise in different ways. In one of these, the criminals are individuals; they steal credit cards and then use them for purchases. In another, criminal groups steal new credit cards and duplicate them. There is also customer-induced fraud, in which customers claim that their credit card was stolen after making some expensive purchases. Most credit card companies use sophisticated systems to detect fraudulent behavior, because various opportunities for fraud still exist although most credit card purchases are electronically verified before the actual transaction. These systems have to work with very little significant data; they know only the past customer history and the current transaction information. On the other hand, they should not decline nonfraudulent transactions too easily, so as not to make the customers dissatisfied.

At this point, the companies are unwilling to disclose system details or even the fact that they use fuzzy logic fraud detection systems. Our case concerns a financial service provider. The company offers its customers both banking and insurance services, and the system is used for the detection of insurance fraud. Each insurance claim in the field of home insurance is evaluated to assess the likelihood of fraudulent behavior. The company wanted to implement a fraud detection system that looks at multiple factors in every insurance claim and selects only those claims that have a certain degree of likelihood of fraud.

All information about the customers is held in a database. By using the system, the insurance claim is evaluated, and if the assessed likelihood of fraud is lower than a certain predefined threshold, the claim is immediately paid out to the customer. If the result is higher than the threshold, then the claim is passed on to a claims auditor together with the reason for the result. After the auditor's manual review, final decisions on further steps are made.

We have the following attributes for the system: number of claims in the last 12 months, amount of the current claim, time with insurance, average balance on all banking accounts over the last 12 months, number of overdrafts over the last 12 months, annual income of the customer, recent changes in status, insurance history evaluation, banking history evaluation, personal evaluation, fraud likelihood, and fraud reason explanation. The first three attributes give information about the insurance contract and the claim itself, the next two attributes describe the banking background of the customer, and the sixth and seventh attributes provide the personal background. Then our relation schema and the fuzzy functional dependencies are as follows:

R: (NumClaim, Amount, CustSince, AvgAmnt, NumOvr, Income, StatChng, HistIns, HistBank, Personal, Fraud, Reason)

FFD1: Number of claims in the last 12 months, amount of current claim, and time with insurance mostly determine the insurance history evaluation.

$\{\text{NumClaim}, \text{Amount}, \text{CustSince}\} \to_{0.8} \text{HistIns}$

FFD2: Average balance on all banking accounts over the last 12 months and number of overdrafts over the last 12 months generally determine the banking history evaluation.

$\{\text{AvgAmnt}, \text{NumOvr}\} \to_{0.7} \text{HistBank}$

FFD3: Annual income of the customer and recent changes in status more or less determine the personal evaluation.

$\{\text{Income}, \text{StatChng}\} \to_{0.6} \text{Personal}$

FFD4: Insurance history evaluation, banking history evaluation, and personal evaluation mostly determine the fraud reason explanation.

$\{\text{HistIns}, \text{HistBank}, \text{Personal}\} \to_{0.8} \text{Reason}$

FFD5: Fraud reason explanation more or less determines fraud likelihood.

$\text{Reason} \to_{0.6} \text{Fraud}$

The attributes can be briefly explained as follows. NumClaim gives an indication of how often the customer has used the insurance in the past year. Amount expresses how significant the current claim is. CustSince takes into account how long the insurance contract has been in existence. HistIns indicates how much the customer has exercised their insurance contract in the past and present. AvgAmnt is the average total balance on all banking accounts of the customer. NumOvr is the number of overdrafts on checking accounts. HistBank evaluates the banking history of the customer and its relevance to the insurance claim. Personal assesses the customer’s basic situation and detects possible motives within the customer’s lifestyle that could motivate fraudulent behavior. StatChng indicates whether a fundamental change in the customer’s life has occurred over the past four months.

The normalization process begins with the fuzzy 1NF, but because there are no tuples at the beginning, we continue with the fuzzy 2NF. Analyzing the ffds, the fuzzy key is {NumClaim, Amount, CustSince, AvgAmnt, NumOvr, Income, StatChng} with a degree of 0.6, because the transitive closure of this attribute set contains all the attributes of the relation. In this case, HistIns, HistBank, Personal, Fraud, and Reason are fuzzy nonprime attributes. For the relation to be in fuzzy 2NF, none of these fuzzy nonprime attributes may be partially fuzzy functionally dependent on the fuzzy key. In our case, this restriction is violated, so the relation is not in fuzzy 2NF, and it should be normalized into a number of smaller relations that are in fuzzy 2NF. Using the decomposition algorithm 3.4.2.1, R is decomposed into four new relations, R1 through R4.
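The two observations in this paragraph, the key degree of 0.6 and the violation of fuzzy 2NF, can be reproduced with a short sketch. It assumes that strengths combine by taking the minimum along a chain of ffds (the usual transitivity rule for ffds) and that a partial dependency shows up as an ffd whose left-hand side is a proper subset of the fuzzy key. Both the helper names and this way of detecting partial dependencies are illustrative assumptions, not the paper's algorithms.

```python
from typing import Dict, FrozenSet, Iterable, List, Tuple

FFD = Tuple[FrozenSet[str], FrozenSet[str], float]  # (X, Y, strength)

KEY = frozenset({"NumClaim", "Amount", "CustSince", "AvgAmnt",
                 "NumOvr", "Income", "StatChng"})
R = KEY | {"HistIns", "HistBank", "Personal", "Fraud", "Reason"}
FFDS: List[FFD] = [
    (frozenset({"NumClaim", "Amount", "CustSince"}), frozenset({"HistIns"}), 0.8),
    (frozenset({"AvgAmnt", "NumOvr"}), frozenset({"HistBank"}), 0.7),
    (frozenset({"Income", "StatChng"}), frozenset({"Personal"}), 0.6),
    (frozenset({"HistIns", "HistBank", "Personal"}), frozenset({"Reason"}), 0.8),
    (frozenset({"Reason"}), frozenset({"Fraud"}), 0.6),
]


def fuzzy_closure(attrs: Iterable[str], ffds: List[FFD]) -> Dict[str, float]:
    """Best strength with which each attribute is reachable from attrs.

    Assumption: a derived attribute gets min(ffd strength, strengths of the
    attributes on the left-hand side); the starting attributes get 1.0.
    """
    strength = {a: 1.0 for a in attrs}
    changed = True
    while changed:
        changed = False
        for lhs, rhs, theta in ffds:
            if all(a in strength for a in lhs):
                s = min([theta] + [strength[a] for a in lhs])
                for b in rhs:
                    if strength.get(b, 0.0) < s:
                        strength[b] = s
                        changed = True
    return strength


reach = fuzzy_closure(KEY, FFDS)
covers_all = set(reach) == set(R)
key_degree = min(reach.values())

# ffds whose left-hand side is a proper part of the key and whose right-hand
# side is nonprime: these are the partial dependencies that break fuzzy 2NF.
partial = [(sorted(x), sorted(y)) for x, y, _ in FFDS if x < KEY and y.isdisjoint(KEY)]

print(covers_all, key_degree)  # True 0.6
for x, y in partial:
    print(x, "->", y)
```

The three dependencies reported as partial are FFD1, FFD2, and FFD3, which are exactly the ones that give rise to R1, R2, and R3 in the decomposition that follows.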

R1: (NumClaim, Amount, CustSince, HistIns) with the fuzzy functional dependency

$\{\text{NumClaim}, \text{Amount}, \text{CustSince}\} \to_{0.8} \text{HistIns}$

R2: (AvgAmnt, NumOvr, HistBank) with the fuzzy functional dependency

$\{\text{NumOvr}, \text{AvgAmnt}\} \to_{0.7} \text{HistBank}$

R3: (Income, StatChng, Personal) with the fuzzy functional dependency

$\{\text{Income}, \text{StatChng}\} \to_{0.6} \text{Personal}$

and a relation with the remaining attributes, obtained after removing the fuzzy nonprime attributes that are partially fuzzy functionally dependent on the fuzzy key of the original relation,

R4: (NumClaim, Amount, CustSince, AvgAmnt, NumOvr, Income, StatChng, Fraud, Reason) with the fuzzy functional dependencies

$\{\text{NumClaim}, \text{Amount}, \text{CustSince}, \text{AvgAmnt}, \text{NumOvr}, \text{Income}, \text{StatChng}\} \to_{0.6} \text{Reason}$

$\text{Reason} \to_{0.6} \text{Fraud}$

After achieving the fuzzy 2NF, the conditions for the fuzzy 3NF should be tested. For a relation to be in fuzzy 3NF, it should already be in fuzzy 2NF, and additionally, for each ffd in the relation, either the left-hand side contains the fuzzy key of the relation or the right-hand side consists of fuzzy prime attributes. For the first three relations, the left-hand sides of the ffds contain the respective fuzzy keys of the relations. But in the fourth relation, in the second ffd, neither does the left-hand side contain the fuzzy key, that is, {NumClaim, Amount, CustSince, AvgAmnt, NumOvr, Income, StatChng}, nor is the right-hand side attribute “Fraud” fuzzy prime. So the last relation should be decomposed into fuzzy 3NF. To make a lossless join decomposition into fuzzy 3NF, a minimal cover of the ffds of R4 should first be found. After applying the minimal cover algorithm, we find the minimal cover for R4 as shown below:

$\{\text{NumClaim}, \text{Amount}, \text{CustSince}, \text{AvgAmnt}, \text{NumOvr}, \text{Income}, \text{StatChng}\} \to_{0.6} \text{Reason}$,

$\text{Reason} \to_{0.6} \text{Fraud}$

Then, by using the algorithm for lossless join decomposition into fuzzy 3NF, R4 is decomposed into two new relations, R5 and R6.

R5: (NumClaim, Amount, CustSince, AvgAmnt, NumOvr, Income, StatChng, Reason) with the fuzzy functional dependency

$\{\text{NumClaim}, \text{Amount}, \text{CustSince}, \text{AvgAmnt}, \text{NumOvr}, \text{Income}, \text{StatChng}\} \to_{0.6} \text{Reason}$

R6: (Reason, Fraud) with the fuzzy functional dependency

$\text{Reason} \to_{0.6} \text{Fraud}$
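The fuzzy 3NF condition used in this step (for every ffd, the left-hand side contains the fuzzy key or the right-hand side consists of fuzzy prime attributes) can be checked mechanically. The sketch below is illustrative only: the function name is hypothetical, the fuzzy key and the prime attributes of each relation are supplied by hand, and the strengths are omitted because the condition as stated does not use them.

```python
from typing import FrozenSet, List, Sequence, Tuple

FFD = Tuple[FrozenSet[str], FrozenSet[str]]  # (X, Y); strengths omitted


def fuzzy_3nf_violations(
    ffds: Sequence[FFD], key: FrozenSet[str], prime: FrozenSet[str]
) -> List[FFD]:
    """ffds whose left side lacks the key and whose right side is not all prime."""
    return [(x, y) for x, y in ffds if not (key <= x or y <= prime)]


KEY = frozenset({"NumClaim", "Amount", "CustSince", "AvgAmnt",
                 "NumOvr", "Income", "StatChng"})
KEY_TO_REASON: FFD = (KEY, frozenset({"Reason"}))
REASON_TO_FRAUD: FFD = (frozenset({"Reason"}), frozenset({"Fraud"}))

# R4 carries both ffds; its fuzzy key (and its only prime attributes) is KEY.
r4 = fuzzy_3nf_violations([KEY_TO_REASON, REASON_TO_FRAUD], KEY, KEY)

# After the split: R5 keeps KEY as its key, R6 is keyed by Reason.
r5 = fuzzy_3nf_violations([KEY_TO_REASON], KEY, KEY)
reason = frozenset({"Reason"})
r6 = fuzzy_3nf_violations([REASON_TO_FRAUD], reason, reason)

print(len(r4), len(r5), len(r6))  # 1 0 0
```

The single violation reported for R4 is the ffd from Reason to Fraud, and neither R5 nor R6 reports any, which matches the decomposition described above.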

At this point, all the relations are also in fuzzy BCNF. Applying the Dependency Preservation Testing Algorithm, we see that the decomposition has the property of dependency preservation. We can also check whether the decomposition of R into R1, R2, R3, R5, and R6 has the lossless join property by using the lossless join property testing algorithm. The table has five rows, one for each decomposed relation, and 12 columns, one for each attribute. The table obtained after initializing the entries with respect to the decomposed relations is shown in Table X. Then the table should be processed for each fuzzy functional dependency. The ffds to be processed are

$\{\text{NumClaim}, \text{Amount}, \text{CustSince}\} \to_{0.8} \text{HistIns}$,

$\{\text{AvgAmnt}, \text{NumOvr}\} \to_{0.7} \text{HistBank}$,

$\{\text{Income}, \text{StatChng}\} \to_{0.6} \text{Personal}$,

$\{\text{HistIns}, \text{HistBank}, \text{Personal}\} \to_{0.8} \text{Reason}$,

$\text{Reason} \to_{0.6} \text{Fraud}$

The result of this step is shown in Table XI. Because there is a row, namely the fourth row, made up entirely of “a” symbols, the decomposition satisfies the lossless join property.
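The outcome for the fraud detection schema can be reproduced with a compact variant of the test. Since all “b” symbols are initially distinct and the algorithm as stated propagates only “a” symbols, it is enough to record, for each row, the set of columns that currently hold an “a”. The sketch below relies on this observation; the function and variable names are assumptions. Because the classical test additionally equates “b” symbols, this shortcut can confirm losslessness but might miss it in other schemas.

```python
from typing import Dict, FrozenSet, List, Sequence, Set, Tuple

FFD = Tuple[FrozenSet[str], FrozenSet[str]]  # (X, Y); strengths are not needed

KEY = ["NumClaim", "Amount", "CustSince", "AvgAmnt", "NumOvr", "Income", "StatChng"]
ATTRS = KEY + ["HistIns", "HistBank", "Personal", "Fraud", "Reason"]
DECOMPOSITION: Dict[str, List[str]] = {
    "R1": ["NumClaim", "Amount", "CustSince", "HistIns"],
    "R2": ["AvgAmnt", "NumOvr", "HistBank"],
    "R3": ["Income", "StatChng", "Personal"],
    "R5": KEY + ["Reason"],
    "R6": ["Reason", "Fraud"],
}
FFDS: List[FFD] = [
    (frozenset({"NumClaim", "Amount", "CustSince"}), frozenset({"HistIns"})),
    (frozenset({"AvgAmnt", "NumOvr"}), frozenset({"HistBank"})),
    (frozenset({"Income", "StatChng"}), frozenset({"Personal"})),
    (frozenset({"HistIns", "HistBank", "Personal"}), frozenset({"Reason"})),
    (frozenset({"Reason"}), frozenset({"Fraud"})),
]


def a_columns_after_test(
    decomposition: Dict[str, List[str]], ffds: Sequence[FFD]
) -> Dict[str, Set[str]]:
    """For each row, the columns holding an "a" symbol once the table is stable."""
    rows = {name: set(attrs) for name, attrs in decomposition.items()}
    changed = True
    while changed:
        changed = False
        for lhs, rhs in ffds:
            # Rows agree on X only if all of them hold "a" symbols in X.
            agreeing = [row for row in rows.values() if lhs <= row]
            if len(agreeing) < 2:
                continue
            # Y columns in which at least one agreeing row already has "a".
            known = set().union(*agreeing) & rhs
            for row in agreeing:
                if not known <= row:
                    row |= known
                    changed = True
    return rows


final = a_columns_after_test(DECOMPOSITION, FFDS)
all_a_rows = [name for name, cols in final.items() if cols == set(ATTRS)]
print(all_a_rows)  # ['R5']
```

R5 is the fourth row of the table: it picks up HistIns, HistBank, and Personal from R1, R2, and R3 through the first three ffds and Fraud from R6 through the last one, while the other rows stay as they were initialized.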

Table X. Initial table for relation R = (NumClaim, Amount, CustSince, AvgAmnt, NumOvr, Income, StatChng, HistIns, HistBank, Personal, Fraud, Reason) after setting the entries with respect to decomposed relations.

R   NumClaim  Amount    CustSince AvgAmnt   NumOvr    Income    StatChng  HistIns   HistBank  Personal  Fraud     Reason
R1  a1        a2        a3        b14       b15       b16       b17       a8        b19       b1,10     b1,11     b1,12
R2  b21       b22       b23       a4        a5        b26       b27       b28       a9        b2,10     b2,11     b2,12
R3  b31       b32       b33       b34       b35       a6        a7        b38       b39       a10       b3,11     b3,12
R5  a1        a2        a3        a4        a5        a6        a7        b48       b49       b4,10     b4,11     a12
R6  b51       b52       b53       b54       b55       b56       b57       b58       b59       b5,10     a11       a12

Table XI. Table for relation R = (NumClaim, Amount, CustSince, AvgAmnt, NumOvr, Income, StatChng, HistIns, HistBank, Personal, Fraud, Reason) at the end of the lossless join testing algorithm.

R   NumClaim  Amount    CustSince AvgAmnt   NumOvr    Income    StatChng  HistIns   HistBank  Personal  Fraud     Reason
R1  a1        a2        a3        b14       b15       b16       b17       a8        b19       b1,10     b1,11     b1,12
R2  b21       b22       b23       a4        a5        b26       b27       b28       a9        b2,10     b2,11     b2,12
R3  b31       b32       b33       b34       b35       a6        a7        b38       b39       a10       b3,11     b3,12
R5  a1        a2        a3        a4        a5        a6        a7        a8        a9        a10       a11       a12
R6  b51       b52       b53       b54       b55       b56       b57       b58       b59       b5,10     a11       a12

4. CONCLUSION

Like classical databases, fuzzy databases that are not properly designed suffer from the problems of data redundancy and update anomalies. To provide a good fuzzy relational database design, the concept of ffd is used to define the fuzzy normal forms and the dependency-preserving and lossless join properties.

In this article, we begin with the first step of the normalization process and define the Fuzzy 1NF. Then the concept of fuzzy key is introduced. It constitutes a base for the remaining fuzzy normal forms, Fuzzy 2NF, Fuzzy 3NF, and Fuzzy BCNF. To state the conditions for the fuzzy normal forms, the definitions of fuzzy prime and fuzzy nonprime attributes are introduced. We also discuss the two desirable properties of decompositions, namely the dependency preservation property and the lossless join property, which are both used by the design algorithms to achieve desirable decompositions. Normal forms are insufficient on their own as criteria for a good database design. The relations must collectively satisfy these two additional properties to qualify as a good design. The situation is the same when we deal with fuzzy data and fuzzy normal forms. We illustrate by examples how these fuzzy normal forms can be used to decompose an unnormalized relation into a set of normalized relations.

We have implemented a system (using Borland C++ 4.0) within this framework. The implementation consists of two main parts. The first part defines the attributes and their properties and provides an interface to accept tuples and check their conformance. The second part consists of the normalization procedures, which determine the level of the normal forms and decompose the relation into various normal forms with the dependency preservation and lossless join properties.

Further study involving fuzzy multivalued dependencies, fuzzy join dependencies, fuzzy inclusion dependencies, and related normal forms is ongoing.

References

1. Codd E. A relational model for large shared data banks. Commun ACM 1970;13:377–387.

2. Chen G, Kerre EE, Vandenbulcke J. Normalization based on ffd in a fuzzy relational data model. Inform Syst 1996;21:299–310.

3. Imelinski T, Lipski W. Incomplete information in relational databases. J ACM 1984;31:701–791.

4. Medina J, Pons O, Vila M. GEFRED: A generalized model to implement fuzzy relational databases. Inform Sci 1994;47:234–254.

5. Petry FE. Fuzzy databases: Principles and applications. Boston: Kluwer Academic Publishers; 1996.

6. Raju KVSVN, Majumdar AK. Fuzzy functional dependencies and lossless join decomposition of fuzzy relational database systems. ACM Trans Database Syst 1988;13:129–166.

7. Umano M, Freedom O. A fuzzy database system. In: E. Sanchez, M. M. Gupta, editors. Fuzzy Information and Decision Processes. Amsterdam: North Holland; 1982. pp 339–347.

8. Yazıcı A, George R. Fuzzy database modeling. Heidelberg: Physica-Verlag; 1999.

9. Zadeh L. Similarity relations and fuzzy orderings. Inform Sci 1971;3:177–206.

10. Buckles PB, Petry FE. A fuzzy representation of data for relational databases. Fuzzy Set Syst 1982;7:213–226.

11. Prade H, Testemale C. Representation of soft constraints and fuzzy attribute values by means of possibility distributions in databases. In: James Bezdek, editor. Analysis of Fuzzy Information: Vol. II, Artificial Intelligence and Decision Systems. Boca Raton, FL: CRC Press; 1987. pp 213–229.

12. Rundensteiner E, Hawkes L, Bandler W. On nearness measures in fuzzy relational data models. Int J Approx Reason 1989;3:267–298.

13. Codd E. Further normalization of the database relational model. In: Rustin, editor. Database systems. New York: Prentice-Hall; 1972. pp 33–64.

14. Elmasri R, Navathe SB. Fundamentals of database systems. New York: Benjamin Cummings Publishing Co.; 2000.

15. Shenoi S, Melton A, Fan LT. Functional dependencies and normal forms in fuzzy relational database model. Inform Sci 1992;60:1–28.

16. Liu W-Y. Fuzzy data dependencies and implication of fuzzy data dependencies. Fuzzy Set Syst 1997;92:341–348.

17. Yazıcı A, Sözat MI. A complete axiomatization for fuzzy functional and multivalued dependencies in fuzzy database relations. Fuzzy Set Syst 2001;117:161–181.

18. Chen G, Kerre EE, Vandenbulcke J. A computational algorithm for the FFD transitive closure and a complete axiomatization of fuzzy functional dependence (FFD). Int J Intell Syst 1994;9:421–439.

19. Nakata M, Murai T. Updating under integrity constraints in fuzzy databases. In: Proc Sixth IEEE Conf on Fuzzy Systems (FUZZ-IEEE’97). Barcelona: IEEE; 1997. pp 713–719.

20. Yazıcı A, Sözat MI. The integrity constraints for similarity-based fuzzy relational databases. Int J Intell Syst 1998;13:641–660.

21. Saxena PC, Tyagi BK. Fuzzy functional dependencies and independencies in extended fuzzy relational database models. Fuzzy Set Syst 1995;69:65–89.

22. Chen G, Kerre EE, Vandenbulcke J. On the lossless join decomposition of relation scheme(s) in a fuzzy relational data model. In: Bilal M. Ayyub, editor. Proc ISUMA ’93, Second International Symposium on Uncertainty Modeling and Analysis. Los Alamitos, CA: IEEE Computer Society Press; 1993. pp 440–446.

23. Chen G, Kerre EE, Vandenbulcke J. The dependency preserving decomposition and a testing algorithm in a fuzzy relational data model. Fuzzy Set Syst 1995;72:27–37.

24. Kerre E, Zenner R, De Clauwe R. The use of fuzzy set theory in information retrieval and databases: A survey. J Am Soc Inform Sci 1986;37:341–345.