Extraction of TimeER Model from a Relational Database

Feb 20, 2023

Hoang Quang, Nguyen Van Toan
Hue University of Sciences

Abstract. In temporal database design, a relational target model can be derived from a TimeER model. The reverse task, extracting the TimeER model from a relational model, is called reverse engineering of the relational model. Solving it makes it easier to upgrade a temporal information system, because it recovers the conceptual model from which the temporal relational model was designed. Our extraction approach is based on the characteristics of the set of attributes, the primary key, and the set of foreign keys of each relational schema in the temporal relational model. On this basis, we propose conversion algorithms that identify the corresponding components of the TimeER model.

Keywords: Database reverse engineering; Temporal database design; Temporal conceptual schema; Relational model.

1 Introduction

To solve the problem of temporal conceptual schema design, the research community has developed many temporal ER models, such as TERM, RAKE, MOTAR, TEER, STEER, ERT, and TimeER [6], [7]. The TimeER model (Time-Extended-EER) extends the EER model with built-in support for capturing temporal aspects more fully than the other models. On that basis, we can design temporal logical data models. For the TimeER model, methods have been proposed for converting it to the relational target model [1], [2], [5].

A related issue is that upgrading an information system requires modifying the TimeER model (the conceptual model) to match the requirements of the real world. Suppose, however, that the TimeER model is not available for some reason. We then need to recover the TimeER model that was used to design the temporal relational model (the logical model). Extraction of the conceptual model from the logical model is called reverse engineering of the logical model [9].

In addition, solving this problem facilitates the conversion of the relational model to other database models, in particular the data model for temporal XML documents. One technique for extracting XML documents from a relational model is to use the TimeER model as an intermediate conversion result.

Thus, this paper focuses on the development of an algorithm to extract the TimeER model from a temporal relational model. The extraction approach is based on the characteristics of the set of attributes, the primary key, and the set of foreign keys of each relational schema in the temporal relational model. On this basis, we propose conversion algorithms that identify the corresponding components of the TimeER model.

This paper is structured as follows. Section 2 gives an overview of the components of the TimeER model. Section 3 provides a mapping algorithm that converts the TimeER model to a lexically-based relational target model; it is the basis for the theory proposed in Section 4, which discusses the method to extract the TimeER model from a relational model. Finally, Section 5 gives a conclusion and a discussion of future work.

2 An overview of the TimeER model

The TimeER model is developed from the EER model [5]. This model provides built-in support for capturing the following temporal aspects: the lifespan of an entity (denoted LS), the valid time of a fact (denoted VT), and the transaction time of an entity or a fact (denoted TT).

As defined, the temporal aspects of the entities in the database can be the lifespan (LS), the transaction time (TT), or both the lifespan and the transaction time (LT). The temporal aspects of the attributes of an entity can be the valid time (VT), the transaction time (TT), or both the valid time and the transaction time (BT). Moreover, because a relationship type can be seen as an entity type or as an attribute, the designer can define the temporal aspects supported for a relationship type if necessary.

Components of the TimeER model

- Entity types. An entity type is represented graphically by a rectangle, and a weak entity type by a double rectangle. If the lifespan, the transaction time, or both are captured for the entity type, this is indicated by placing LS, TT, or LT, respectively, after the entity type name.

- Attributes. A single-valued attribute is represented by an oval, and a multi-valued attribute by a double oval. Unlike a simple attribute, a composite attribute is represented by an oval connected directly to the components of the composite attribute.

We assume that each single-valued composite attribute is replaced with a set of simple attributes. Therefore, any attribute of an entity type or of a composite attribute is of one of the following types: single-valued simple attribute, multi-valued simple attribute, or multi-valued composite attribute.

If the valid time, the transaction time, or both are captured, this is indicated by placing VT, TT, or BT (BiTemporal), respectively, after the attribute name.

- Relationship types. A relationship type is represented by a diamond. For each relationship type, the database designer decides whether or not to capture the temporal aspects of its relationships. If some temporal aspect is captured for a relationship type, we term it temporal; otherwise, it is called non-temporal.

- Superclass/subclass relations. As in the EER model, the TimeER model offers support for specifying superclass/subclass relations. It is not possible to change the temporal support of the inherited attributes, but it is possible to add attributes and to further expand the inherited temporal support of the class itself.
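The components listed above can be sketched as a small data model. The following Python sketch is only an illustration: the class and field names (`EntityType`, `Attribute`, `label`, and so on) are assumptions made here and are not part of the TimeER definition, and the sample entity is hypothetical rather than taken from Fig. 1.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional


class EntityTime(Enum):
    """Temporal support that an entity type may carry."""
    LS = "lifespan"
    TT = "transaction time"
    LT = "lifespan and transaction time"


class AttributeTime(Enum):
    """Temporal support that an attribute may carry."""
    VT = "valid time"
    TT = "transaction time"
    BT = "valid time and transaction time"


@dataclass
class Attribute:
    name: str
    multi_valued: bool = False
    # A non-empty component list marks a composite attribute.
    components: List["Attribute"] = field(default_factory=list)
    time: Optional[AttributeTime] = None  # None means non-temporal

    @property
    def is_composite(self) -> bool:
        return bool(self.components)


@dataclass
class EntityType:
    name: str
    attributes: List[Attribute] = field(default_factory=list)
    weak: bool = False
    time: Optional[EntityTime] = None  # None means non-temporal
    superclass: Optional["EntityType"] = None

    def label(self) -> str:
        """Name as drawn in a diagram: the temporal code follows the name."""
        return self.name if self.time is None else f"{self.name} {self.time.name}"


# Hypothetical example: an entity type with lifespan support and one
# attribute with valid-time support.
employee = EntityType(
    "Employee",
    [Attribute("Salary", time=AttributeTime.VT)],
    time=EntityTime.LS,
)
```

Keeping the temporal support as an optional field mirrors the convention that a component without LS, TT, LT, VT, or BT after its name is non-temporal.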

Fig. 1. TimeER diagram of a company database [5]

3 A mapping algorithm from the TimeER model to the relational model

This section presents the seven-step mapping algorithm that transforms the components of a TimeER model into relations, primary key constraints, and foreign key constraints. An advantage of this mapping algorithm is that it extends to nested temporal multi-valued composite attributes of an entity type in the TimeER model [1].

Step 1. Mapping of entity types not participating in a superclass/subclass relationship

For each entity type E, not participating in a superclass/subclass relationship and having non-temporal single-valued attributes A1, A2, …, An, we consider the following cases.

a) Mapping of regular entity type: If E is a regular entity type whose key is ID(E) (we assume that |ID(E)| = 1), then create a relation R(E), called the primary relation representing the entity type E, which includes the attributes ID(E) ∪ {A1, A2, …, An}. The primary key of R(E) is ID(E).

b) Mapping of weak entity type: Let E be the weak entity type of the identifying relationship S with the owner entity type E′, and suppose that E has the partial key X ⊂ {A1, A2, …, An}. We then create the primary relation R(E) that includes the attributes FK ∪ {A1, A2, …, An}, where FK is the foreign key referencing the relation R(E′). The primary key of R(E) is FK ∪ X.

Foreign keys of relations are indicated by the symbol “f.k.” following the attribute names.

If E is a temporal entity type (lifespan and/or transaction time), we create a new relation called the time relation representing the entity type E, denoted TR(E), which includes the attributes FK′ ∪ T, where FK′ is the foreign key referencing the relation R(E). Here T is the set of timestamp attributes that depends on the temporal support of E, indicated by an asterisk (*) in Table 1.

Table 1. Abbreviation used for temporal support of entity types and relationship types

Let T′ ⊂ T be the set of underlined attributes in Table 1. The primary key of TR(E) is then FK′ ∪ T′.

Figure 2. Mapping of entity types not participating in a superclass/subclass relationship
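Step 1 for a regular entity type can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the timestamp column names in `ENTITY_TIMESTAMPS` and the choice of which of them enter the primary key are hypothetical stand-ins for Table 1, and the naming scheme for TR(E) is likewise an assumption. Weak entity types are not covered.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

# Hypothetical stand-in for Table 1: support -> (T, T'), where T' is the part
# of T that enters the primary key. The real column names are defined in the
# paper's Table 1.
ENTITY_TIMESTAMPS: Dict[str, Tuple[List[str], List[str]]] = {
    "LS": (["LSs", "LSe"], ["LSs"]),
    "TT": (["TTs", "TTe"], ["TTs"]),
    "LT": (["LSs", "LSe", "TTs", "TTe"], ["LSs", "TTs"]),
}


@dataclass
class Relation:
    name: str
    attributes: List[str]
    primary_key: List[str]
    foreign_keys: Dict[str, str]  # column -> referenced relation


def map_regular_entity(
    name: str, key: str, attrs: List[str], support: Optional[str] = None
) -> List[Relation]:
    """Return R(E) and, for a temporal entity type, also TR(E)."""
    relations = [Relation(name, [key] + attrs, [key], {})]
    if support is not None:
        t, t_key = ENTITY_TIMESTAMPS[support]
        relations.append(
            Relation(
                f"{name}_{support}",      # hypothetical name for TR(E)
                [key] + t,                # FK' ∪ T
                [key] + t_key,            # FK' ∪ T'
                {key: name},              # FK' references R(E)
            )
        )
    return relations
```

For example, `map_regular_entity("Employee", "ID", ["Name"], "LS")` yields a primary relation and one time relation whose foreign key points back to the primary relation.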

Step 2. Mapping of entity types participating in a superclass/subclass relationship

For each superclass/subclass relationship where the superclass E has subclasses S1, S2, …, Sn, we create the primary relation R(E) to represent the superclass E. Suppose each subclass Si has a set of added non-temporal single-valued attributes Xi. We then create n new relations SR(Si) (i = 1..n), called sub relations representing the entity types Si, each of which includes the attributes FK ∪ Xi and has the primary key FK, where FK is the foreign key referencing the relation R(E).

As in Step 1, if E or S1, S2, …, Sn are the temporal entity types, we then create the new time relations representing these entity types.
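The sub relations of Step 2 can be sketched as below. The function name and the dictionary layout are assumptions made for this illustration; the superclass key column is reused as the foreign key FK of every sub relation.

```python
from typing import Dict, List


def map_subclasses(
    super_name: str, super_key: str, subclasses: Dict[str, List[str]]
) -> List[dict]:
    """Build one sub relation SR(Si) per subclass.

    `subclasses` maps a subclass name to its added non-temporal
    single-valued attributes Xi.
    """
    relations = []
    for sub_name, added in subclasses.items():
        relations.append(
            {
                "name": sub_name,
                "attributes": [super_key] + added,       # FK ∪ Xi
                "primary_key": [super_key],              # FK
                "foreign_keys": {super_key: super_name}, # FK references R(E)
            }
        )
    return relations
```

Because the primary key of each sub relation is itself a foreign key to the superclass relation, this pattern is what the extraction step can later recognise as a subclass.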

Step 3. Mapping of temporal single-valued attributes of an entity type

For each temporal single-valued attribute A of E, with the temporal support of A indicated by an asterisk (*) in Table 2, we create the time relation TRA(E) representing attribute A of E, which includes the attributes FK ∪ A ∪ T, where FK is the foreign key referencing the relation R(E), and T is the set of timestamp attributes corresponding to the asterisk (*) in Table 2.

Table 2. Abbreviation used for temporal support of attributes and relationship types

Let T′ ⊂ T be the set of underlined attributes in Table 2. The primary key of TRA(E) is then FK ∪ T′.
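As a sketch of Step 3, the function below builds the time relation for one temporal single-valued attribute. The timestamp columns in `ATTRIBUTE_TIMESTAMPS` are hypothetical stand-ins for Table 2, and the relation naming scheme is an assumption.

```python
from typing import Dict, List, Tuple

# Hypothetical stand-in for Table 2: support -> (T, T').
ATTRIBUTE_TIMESTAMPS: Dict[str, Tuple[List[str], List[str]]] = {
    "VT": (["VTs", "VTe"], ["VTs"]),
    "TT": (["TTs", "TTe"], ["TTs"]),
    "BT": (["VTs", "VTe", "TTs", "TTe"], ["VTs", "TTs"]),
}


def map_temporal_single_valued(entity: str, key: str, attr: str, support: str) -> dict:
    """Build TRA(E) with attributes FK ∪ A ∪ T and primary key FK ∪ T'."""
    t, t_key = ATTRIBUTE_TIMESTAMPS[support]
    return {
        "name": f"{entity}_{attr}",       # naming scheme is an assumption
        "attributes": [key, attr] + t,
        "primary_key": [key] + t_key,
        "foreign_keys": {key: entity},
    }
```

Note that the attribute A itself is not part of the primary key: at any one timestamp a single-valued attribute has at most one value.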

Step 4. Mapping of multi-valued attributes

For each multi-valued attribute A of the entity type E in PNF (Partitioned Normal Form), or similarly for each multi-valued attribute A of a composite attribute B, let R′ be the relation representing the entity type E (or the composite attribute B). The multi-valued attribute A is then mapped to the corresponding relation recursively by considering the following cases.

a) A is a simple attribute. Consider the following possibilities:

• If A is a non-temporal attribute, we create a new relation representing the attribute A, denoted RA(E) (or RA(B)), which includes the attributes FK ∪ A′, where FK is the foreign key referencing the relation R′, and A′ is the attribute used to store values of the multi-valued attribute A, referred to as the attribute corresponding to A. The primary key of RA(E) (or RA(B)) is then FK ∪ A′.

• If A is a temporal attribute, we create a temporal relation representing the attribute A, denoted TRA(E) (or TRA(B)), which includes the attributes FK ∪ A′ ∪ T and has the primary key FK ∪ A′ ∪ T′, where FK is the foreign key referencing the relation R′, and A′ is the attribute corresponding to A. T and T′ are defined as in Step 3.

b) A is a composite attribute. If A has the set of non-temporal single-valued attributes X and the partial key K, we create a new relation representing the attribute A, denoted RA(E) (or RA(B)), which includes the attributes FK ∪ X and has the primary key FK ∪ K, where FK is the foreign key referencing the relation R′.

If A is a temporal attribute, we add a time relation TRA(E) (or TRA(B)), which includes the attributes FK′ ∪ T and has the primary key FK′ ∪ T′, where FK′ is the foreign key referencing the relation RA(E) (or RA(B)). T and T′ are defined as in Step 3.

If the composite attribute A has temporal single-valued attributes, then for each temporal single-valued attribute C we add a time relation TRC(A) representing the attribute C, which includes the attributes FK″ ∪ C ∪ T and has the primary key FK″ ∪ T′, where FK″ is the foreign key referencing the relation RA(E) (or RA(B)). T and T′ are defined as in Step 3.
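Case a) of Step 4 can be illustrated with the sketch below, which covers a multi-valued simple attribute with or without temporal support. The function name, the relation naming scheme, and the timestamp column names used in the example are assumptions; composite attributes (case b) are not covered here.

```python
from typing import List, Optional, Tuple


def map_multivalued_simple(
    owner: str,
    owner_key: str,
    attr: str,
    timestamps: Optional[Tuple[List[str], List[str]]] = None,
) -> dict:
    """Build RA(E) or, when `timestamps` = (T, T') is given, TRA(E).

    `owner` is the relation R' representing the entity type or composite
    attribute that owns the multi-valued attribute.
    """
    columns = [owner_key, attr]        # FK ∪ A'
    primary_key = [owner_key, attr]    # FK ∪ A'
    if timestamps is not None:
        t, t_key = timestamps
        columns = columns + t          # FK ∪ A' ∪ T
        primary_key = primary_key + t_key  # FK ∪ A' ∪ T'
    return {
        "name": f"{owner}_{attr}",     # naming scheme is an assumption
        "attributes": columns,
        "primary_key": primary_key,
        "foreign_keys": {owner_key: owner},
    }
```

Unlike the single-valued case of Step 3, the value column A′ is part of the primary key here, which is what later distinguishes the two kinds of time relation during extraction.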

Step 5. Mapping of non-temporal relationship types

Non-temporal relationship types between entity types are mapped as in the normal EER-to-relational mapping [3].

Step 6. Mapping of temporal binary relationship types without attribute

Consider a temporal binary relationship type S between E1 and E2 that does not have its own attributes. We create the time relation representing S, denoted TR(S), which includes the attributes FK1 ∪ FK2 ∪ T, where FK1 and FK2 are the foreign keys referencing the relations R(E1) and R(E2), respectively. T is defined in Table 1 or Table 2, depending on the temporal support specified for the relationship type S. The primary key of TR(S) is ID(S) ∪ T′, where T′ ⊂ T is as defined in Table 1 or Table 2. Depending on the structural constraints (min, max) on the participation of the entity types in S, ID(S) is defined as follows:

• If S is a 1-1 relationship, then ID(S) = FK1 or ID(S) = FK2
• If S is a 1-many relationship, then ID(S) = FK2
• If S is a many-1 relationship, then ID(S) = FK1
• If S is a many-many relationship, then ID(S) = FK1 ∪ FK2
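The four cases for ID(S) can be written as a small lookup. The sketch below reads the primary key of TR(S) as ID(S) ∪ T′; the function names, the cardinality strings, and the column names in the examples are assumptions made for illustration.

```python
from typing import List


def relationship_id(cardinality: str, fk1: List[str], fk2: List[str]) -> List[str]:
    """Return ID(S) for a binary relationship S between E1 (fk1) and E2 (fk2)."""
    if cardinality == "1-1":
        return list(fk1)              # fk2 would be an equally valid choice
    if cardinality == "1-many":
        return list(fk2)
    if cardinality == "many-1":
        return list(fk1)
    if cardinality == "many-many":
        return list(fk1) + list(fk2)
    raise ValueError(f"unknown cardinality: {cardinality}")


def time_relation_key(
    cardinality: str, fk1: List[str], fk2: List[str], t_key: List[str]
) -> List[str]:
    """Primary key of TR(S): ID(S) followed by the key timestamps T'."""
    return relationship_id(cardinality, fk1, fk2) + list(t_key)
```

For a 1-many relationship only the foreign key on the "many" side is needed, since each entity on that side takes part in at most one relationship instance at a given time.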

Step 7. Mapping of temporal binary relationship types with attribute

Consider a temporal binary relationship type S between two entity types E1 and E2 with its own attributes X. We create two relations as follows:

• A relation representing the binary relationship type S, denoted R(S), which includes the attributes FK1 ∪ FK2 ∪ X, where FK1 and FK2 are the foreign keys referencing the relations R(E1) and R(E2), respectively. The primary key of R(S) is ID(S), defined as in Step 6.

• A time relation representing the temporal aspect of the relationship type S, denoted TR(S), which includes the attributes FK ∪ T and has the primary key FK ∪ T′, where FK is the foreign key referencing the relation R(S). T is the set of timestamp attributes that depends on the temporal support of the relationship S; T and T′ are defined in Table 1 or Table 2.

If the relationship S has temporal attributes, then for each temporal attribute A we create a time relation, denoted TRA(S), which includes the attributes FK′ ∪ A ∪ TA and has the primary key FK′ ∪ TA′, where FK′ is the foreign key referencing the relation R(S). TA is the set of timestamp attributes corresponding to the temporal support specified for the attribute A; TA and TA′ are described in Table 2.

Note that mapping of other temporal relationship types which have or do not have their own attribute, such as recursive relationship type, n-ary relationship type, is performed similarly to the mapping in Step 6 or Step 7.

4 Extraction of the TimeER model from a relational model

The algorithm for extracting the TimeER model from a relational model is defined as follows.

Input: A temporal relational model DB, that is, a set of relations in which, for each R ∈ DB, the set of attributes UR, the primary key PKR, and the set of foreign keys FKR are defined.

Output: A TimeER model. The extracted TimeER model is said to satisfy the reverse engineering if applying the algorithm of Section 3 to it yields the set of relations DB.

The TimeER model to be extracted is assumed to contain no weak entity types. This assumption is acceptable because a weak entity type can be mapped to a multi-valued composite attribute of its owner entity type, provided the weak entity type is not involved in any other relationship in the TimeER model.

The extraction algorithms apply rules in turn to identify the components of the TimeER model, based on the characteristics of the set of attributes, the primary key, and the set of foreign keys of each relation in the temporal relational model. The components are: entity types, temporal aspects of entity types, temporal attributes of entity types, relationships between entity types, temporal aspects of relationships, and temporal and non-temporal attributes of relationships.

Each rule for identifying a component follows a general principle: the condition for building a component is chosen so that it is satisfied by the relations that the conversion algorithm of Section 3 produces for that component, but not by the relations produced for the other components. This allows us to prove the correctness of these rules by the method of exclusion.

To ease the construction of the identification rules, we first partition the relations in DB into two subgroups. DB1 is the set of all relations R ∈ DB that do not contain a timestamp attribute, and DB2 is the set of all relations R ∈ DB that contain a timestamp attribute. We have DB1 ∪ DB2 = DB.
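The split into DB1 and DB2 can be sketched as follows. How timestamp attributes are recognised is not spelled out in this section; the sketch assumes they can be detected by name, and the names in `TIMESTAMP_NAMES` are hypothetical.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Tuple

# Hypothetical timestamp column names; the actual names come from Tables 1 and 2.
TIMESTAMP_NAMES: FrozenSet[str] = frozenset(
    {"LSs", "LSe", "VTs", "VTe", "TTs", "TTe"}
)


@dataclass(frozen=True)
class Relation:
    name: str
    attributes: FrozenSet[str]


def partition(db: List[Relation]) -> Tuple[List[Relation], List[Relation]]:
    """Split DB into DB1 (no timestamp attribute) and DB2 (some timestamp attribute)."""
    db1 = [r for r in db if not (r.attributes & TIMESTAMP_NAMES)]
    db2 = [r for r in db if r.attributes & TIMESTAMP_NAMES]
    return db1, db2
```

Every relation lands in exactly one of the two lists, so the two subgroups are disjoint and together cover DB.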

Next, we convert each relation R ∈ DB into a corresponding temporary entity type with the same name and the same set of attributes UR, denoted E(R).

Justification: According to the algorithm in Section 3, each entity type in the TimeER model has a corresponding relation R containing all the non-temporal single-valued attributes of that entity type. Thus, in the reverse engineering, each relation R is mapped to a temporary entity type. We use the term "temporary entity type" because some relations in DB1 ∪ DB2 will later be identified as other components of the TimeER model (such as relationships or multi-valued attributes).

Method: Extraction of the TimeER model from DB1 and DB2 is then implemented through the use of algorithms in turn as follows.

Algorithm 4.1. (Extraction of the TimeER model from the relations without timestamp attribute)¹

For each relation R ∈ DB1 do
    If: PKR ⊆ ⋃_{FK ∈ FKR} FK and ∃R′ ∈ DB2, ∃FK′ ∈ FKR′: FK′ references R;
    Then: R is identified to represent the temporal relationship type with attribute;
    Else:
        If: |PKR| = 1;
        Then:
            R is identified to represent the entity type;
            If: R has the foreign key FK ∈ FKR referencing R;
            Then: R is identified to represent the non-temporal recursive 1-1/1-many relationship type;
            Else If: PKR ∈ FKR referencing R′ ∈ DB1 and R′ ≠ R;
                Then: E(R) is the subclass of E(R′);
                Else:
                    If: R has the foreign key FK ≠ PKR referencing R′ ∈ DB1;
                        If: FK has unique constraint;
                        Then: R is identified to represent the non-temporal binary 1-1 relationship type;
                        Else: R is identified to represent the non-temporal binary 1-many relationship type;
        Else {|PKR| > 1}:
            If: PKR = ⋃_{FK ∈ FKR} FK;
            Then: R is identified to represent the non-temporal relationship type of degree n (n = 1: recursive many-many relationship type; n = 2: binary many-many relationship type; and n > 2: n-ary relationship type);
            Else:
                If: PKR ⊃ FK, FK is the only foreign key of R;
                Then: R is identified to represent the multi-valued attribute;
Endfor;
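The rules of Algorithm 4.1 can be sketched in Python as below. This is an illustration under stated assumptions, not the authors' implementation: the `Relation` and `ForeignKey` classes and the label strings are hypothetical, the condition in the branch |PKR| > 1 is read as "PKR equals the union of all foreign keys of R", and the degree n is taken to be the number of distinct referenced relations.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Tuple


@dataclass(frozen=True)
class ForeignKey:
    columns: FrozenSet[str]
    references: str        # name of the referenced relation
    unique: bool = False   # True if a UNIQUE constraint covers the columns


@dataclass(frozen=True)
class Relation:
    name: str
    attributes: FrozenSet[str]
    primary_key: FrozenSet[str]
    foreign_keys: Tuple[ForeignKey, ...] = ()


def classify_db1(r: Relation, db1: List[Relation], db2: List[Relation]) -> List[str]:
    """Return the TimeER components that a relation without timestamps represents."""
    fk_union = frozenset().union(*(fk.columns for fk in r.foreign_keys))
    db1_names = {x.name for x in db1}
    referenced_from_db2 = any(
        fk.references == r.name for x in db2 for fk in x.foreign_keys
    )
    if r.primary_key <= fk_union and referenced_from_db2:
        return ["temporal relationship type with attribute"]

    labels: List[str] = []
    if len(r.primary_key) == 1:
        labels.append("entity type")
        if any(fk.references == r.name for fk in r.foreign_keys):
            labels.append("non-temporal recursive 1-1/1-many relationship type")
        else:
            parent = next(
                (fk for fk in r.foreign_keys
                 if fk.columns == r.primary_key
                 and fk.references in db1_names
                 and fk.references != r.name),
                None,
            )
            if parent is not None:
                labels.append(f"subclass of {parent.references}")
            else:
                # The rule names a single foreign key; looping over all of
                # them is a generalisation made for this sketch.
                for fk in r.foreign_keys:
                    if fk.columns != r.primary_key and fk.references in db1_names:
                        kind = "1-1" if fk.unique else "1-many"
                        labels.append(
                            f"non-temporal binary {kind} relationship type "
                            f"with {fk.references}"
                        )
    elif r.foreign_keys and r.primary_key == fk_union:
        degree = len({fk.references for fk in r.foreign_keys})
        labels.append(f"non-temporal relationship type of degree {degree}")
    elif len(r.foreign_keys) == 1 and r.foreign_keys[0].columns < r.primary_key:
        labels.append(f"multi-valued attribute of {r.foreign_keys[0].references}")
    return labels
```

Returning a list rather than a single label reflects the rule that a relation with a one-column primary key is an entity type and may, in addition, carry a relationship through one of its foreign keys.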

Algorithm 4.2. (Extraction of the TimeER model from the relations with timestamp attribute)

For each relation R ∈ DB2 do
    If: R has the foreign key FK referencing R′ ∈ DB1 such that R′ has been identified as the relation representing the temporal relationship type with attribute;
        If: UR = FK ∪ T, where T is the set of timestamp attributes;
        Then: R is identified to represent the temporal aspects of the relationship S(R′). The temporal aspects of the relationship depend on the attribute names in T;
        Else {UR ≠ FK ∪ T}: R is identified to represent the temporal attribute of the relationship S(R′). The temporal aspects of this attribute depend on the attribute names in T.
    Else {R has no foreign key referencing an R′ ∈ DB1 that has been identified as the relation representing the temporal relationship type with attribute}:
        If: |FKR| > 1;
        Then: R is identified to represent the temporal relationship S(R) which does not have its own attribute. The temporal aspects of the relationship depend on the attribute names in T.
        Else {|FKR| ≤ 1}:
            If: R has the only foreign key FK, referencing R′;
                If: UR = FK ∪ T, where T is the set of timestamp attributes;
                Then: R is identified to represent the temporal aspects of the temporary entity type E(R′). The temporal aspects of the temporary entity type E(R′) depend on the attribute names in T;
                Else {UR ≠ FK ∪ T}:
                    If: PKR = FK ∪ T′, where T′ ⊂ T;
                    Then: R is identified to represent the temporal single-valued attribute of E(R′). The temporal aspects of this attribute depend on the attribute names in T;
                    Else {PKR ≠ FK ∪ T′}: R is identified to represent the temporal multi-valued simple attribute of E(R′). The temporal aspects of this attribute depend on the attribute names in T.
Endfor;

¹ Indentation style applies to If-Then-Else and If-Then statements.
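A Python sketch of the rules of Algorithm 4.2 follows. It is an illustration only: the classes, the label strings, and the `unclassified` fallback for a relation without any foreign key are assumptions, and the set of timestamp attribute names is passed in by the caller because its actual contents come from Tables 1 and 2.

```python
from dataclasses import dataclass
from typing import FrozenSet, Set, Tuple


@dataclass(frozen=True)
class ForeignKey:
    columns: FrozenSet[str]
    references: str  # name of the referenced relation


@dataclass(frozen=True)
class Relation:
    name: str
    attributes: FrozenSet[str]
    primary_key: FrozenSet[str]
    foreign_keys: Tuple[ForeignKey, ...] = ()


def classify_db2(r: Relation, timestamps: FrozenSet[str], temporal_rels: Set[str]) -> str:
    """Classify a relation with timestamp attributes.

    `temporal_rels` holds the names of DB1 relations already identified as
    temporal relationship types with attribute.
    """
    t = r.attributes & timestamps
    link = next((fk for fk in r.foreign_keys if fk.references in temporal_rels), None)
    if link is not None:
        if r.attributes == link.columns | t:
            return f"temporal aspects of relationship {link.references}"
        return f"temporal attribute of relationship {link.references}"
    if len(r.foreign_keys) > 1:
        return "temporal relationship type without attribute"
    if len(r.foreign_keys) == 1:
        fk = r.foreign_keys[0]
        if r.attributes == fk.columns | t:
            return f"temporal aspects of entity type {fk.references}"
        # PKR = FK ∪ T': the key holds the foreign key plus timestamps only.
        if fk.columns <= r.primary_key and r.primary_key - fk.columns <= t:
            return f"temporal single-valued attribute of {fk.references}"
        return f"temporal multi-valued simple attribute of {fk.references}"
    return "unclassified"  # no rule covers a relation without foreign keys
```

The single-valued and multi-valued cases differ only in whether a non-timestamp value column appears in the primary key, which matches the keys produced in Steps 3 and 4 of Section 3.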

5 Conclusion

In this paper we have proposed a method for extracting the TimeER model from a relational model, based on the characteristics of the set of attributes, the primary key, and the set of foreign keys of each relational schema in the temporal relational model. This approach is practical for existing temporal relational databases, since it relies only on the metadata defined by the data definition language of the relational model (CREATE TABLE and ALTER TABLE statements in SQL).
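To illustrate that the required input is available from schema metadata alone, the sketch below reads UR, PKR, and FKR for every table of a database. It uses SQLite through Python's standard `sqlite3` module purely as a convenient stand-in; this is not the system used by the authors, and the function name `read_schema` and the sample tables are assumptions.

```python
import sqlite3


def read_schema(conn: sqlite3.Connection) -> dict:
    """Collect attributes, primary key, and foreign keys of every user table."""
    schema = {}
    tables = [
        row[0]
        for row in conn.execute(
            "SELECT name FROM sqlite_master "
            "WHERE type = 'table' AND name NOT LIKE 'sqlite_%'"
        )
    ]
    for table in tables:
        # table_info rows: (cid, name, type, notnull, dflt_value, pk)
        cols = conn.execute(f"PRAGMA table_info('{table}')").fetchall()
        # foreign_key_list rows: (id, seq, table, from, to, on_update, on_delete, match)
        fks = conn.execute(f"PRAGMA foreign_key_list('{table}')").fetchall()
        groups = {}
        for fk_id, _seq, ref_table, from_col, *_rest in fks:
            groups.setdefault((fk_id, ref_table), set()).add(from_col)
        schema[table] = {
            "attributes": {c[1] for c in cols},
            "primary_key": {c[1] for c in cols if c[5] > 0},
            "foreign_keys": [
                (frozenset(columns), ref) for (_id, ref), columns in groups.items()
            ],
        }
    return schema


# Hypothetical two-table schema: a primary relation and its time relation.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE Employee (ID INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE Employee_LS (
        ID INTEGER REFERENCES Employee(ID),
        LSs TEXT,
        LSe TEXT,
        PRIMARY KEY (ID, LSs)
    );
    """
)
schema = read_schema(conn)
```

On another database system the same three pieces of information would typically be read from its catalog or information schema views instead of SQLite pragmas.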

We have designed this extraction method and implemented it successfully on the SQL 2005 database management system.

However, the rules of the algorithms in Section 4 are complete only if the conversion method presented in Section 3 is used.

Relaxing the assumptions of the reverse engineering is bound to be affected by the non-uniqueness (the "not only" property) of the conversion in Section 3. For example, extracting a weak entity type of the TimeER model is not taken into account, because a weak entity type can essentially also be seen as a multi-valued composite attribute of its owner entity type. Such assumptions are unavoidable, and they strengthen our ability to automate the extraction algorithm. Logically, the extraction method is nevertheless acceptable, because we can prove that for any input relational database there is a corresponding database in the TimeER model.

Our ongoing research concerns applying the extraction of the TimeER model from a relational model to the conversion of the temporal relational model into other models, especially temporal XML documents, using the TimeER model as an intermediate conversion result.

References

1. Quang, H., Thanh, H.T.: Extension of Method for Converting TimeER Model to Relational Model. Journal of Computer Science and Cybernetics, vol. 25, no. 3, pp. 246-257 (2009)

2. Quang, H., Thanh, H.T.: A Mapping Algorithm from TimeER Model to Relational Model. The Second Hanoi Forum on Information - Communication Technology, Proceedings, Hanoi, December 11-13, pp. 37-45 (2008)

3. Elmasri, R., Navathe, S.B.: Fundamentals of Database Systems. Addison Wesley, 5th Edition (2007)

4. Jensen, C.S., Snodgrass, R.T.: Temporal Data Management. IEEE Transactions on Knowledge and Data Engineering, vol. 11, no. 1, pp. 36-44 (1999)

5. Jensen, C.S.: Temporal Database Management. Dr.techn. thesis, Aalborg University (2000) http://www.cs.auc.dk/~csj/Thesis/

6. Gregersen, H., Jensen, C.S.: Temporal Entity-Relationship Models - a Survey. IEEE Transactions on Knowledge and Data Engineering, vol. 11, no. 3, pp. 464-497 (1999)

7. Gregersen, H., Jensen, C.S.: Conceptual Modeling of Time-varying Information. TIMECENTER Technical Report TR-35 (1998)

8. Torp, K., Snodgrass, R.T., Jensen, C.S.: Effective Timestamping in Databases. VLDB Journal, vol. 8, no. 4, pp. 267-288 (2000)

9. Chiang, R.H.L., Barron, T.M., Storey, V.C.: Reverse Engineering of Relational Databases: Extraction of an EER Model from a Relational Database. Data & Knowledge Engineering, vol. 12, pp. 107-142 (1994)