© Pearson Education Limited, 2004. Chapter 19 Current and Emerging Trends Transparencies


Dec 21, 2015

Chapter 19 – Objectives

Requirements for advanced database applications.

Why RDBMSs are currently not well suited to supporting these.

Main concepts of DDBMSs.

Main concepts of database replication.

Main concepts of OODBMSs and ORDBMSs.

Main concepts of data warehousing.

Main concepts of OLAP and data mining.

Approaches for integrating databases into the Web environment.

Advanced Database Applications

Computer-Aided Design (CAD)

Computer-Aided Manufacturing (CAM)

Office Information Systems (OIS) and Multimedia Systems

Geographic Information Systems (GIS)

Interactive and Dynamic Web Sites

Advanced Database Applications

Computer-Aided Design (CAD)

Stores data relating to mechanical and electrical design, for example, buildings, airplanes, and integrated circuit chips.

Designs of this type have some common characteristics:

Data has many types, each with a small number of instances.

Designs may be very large.

Advanced Database Applications

Design is not static but evolves through time.

Updates are far-reaching.

Involves version control and configuration management.

Cooperative engineering.

Computer-Aided Manufacturing (CAM)

Stores data similar to CAD, plus data about discrete production.

Office Information Systems (OIS) and Multimedia Systems

Stores data relating to computer control of information in a business, including electronic mail, documents, invoices, etc.

Modern systems now handle free-form text, photographs, diagrams, audio and video sequences.

Documents may have a specific structure, perhaps described using a mark-up language such as SGML, HTML, or XML.

Geographic Information Systems (GIS)

A GIS database stores spatial and temporal information, such as that used in land management and underwater exploration.

Much of the data is derived from surveys and satellite photographs, and tends to be very large.

Searches may involve identifying features based on shape, color, texture, using advanced pattern-recognition techniques.

Interactive and Dynamic Web Sites

Consider an online catalog for selling clothes. The Web site maintains preferences for previous visitors to the site and allows a visitor to:

obtain a 3D rendering of any item based on color, size, fabric, etc.;

modify the rendering to account for movement, illumination, backdrop, occasion, etc.;

select accessories to go with the outfit, from items presented in a sidebar.

Interactive and Dynamic Web Sites

Such sites need to handle multimedia content and to interactively modify the display based on user preferences and user selections.

They also have the added complexity of providing 3D rendering.

Weaknesses of RDBMSs

Poor Representation of “Real World” Entities

Normalization leads to relations that do not correspond to entities in the “real world”.

Semantic Overloading

The relational model has only one construct for representing data and data relationships: the table.

The relational model is therefore semantically overloaded.

Weaknesses of RDBMSs

Poor Support for Business Rules

Limited Operations

RDBMSs have only a fixed set of operations, which cannot be extended.

Difficulty Handling Recursive Queries

It is extremely difficult to produce recursive queries.

An extension proposed to relational algebra to handle this type of query is the unary transitive (recursive) closure operation.
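The transitive closure operation can be sketched as a repeated self-join of a relation that stops once no new pairs can be derived. This is a minimal Python illustration, assuming the relation is a plain set of pairs; the staff/manager data is hypothetical and no DBMS is involved.

```python
def transitive_closure(pairs):
    """Return the transitive closure of a binary relation given as a set of (a, b) pairs."""
    closure = set(pairs)
    while True:
        # Join the relation with itself: (a, b) and (b, d) together imply (a, d).
        derived = {(a, d) for (a, b) in closure for (c, d) in closure if b == c}
        if derived <= closure:      # fixpoint reached: nothing new can be derived
            return closure
        closure |= derived


# Hypothetical (staffNo, managerStaffNo) pairs: each staff member's direct manager.
reports_to = {("S4", "S3"), ("S3", "S2"), ("S2", "S1")}

for pair in sorted(transitive_closure(reports_to)):
    print(pair)
```

Each staff member ends up paired with every direct and indirect manager, e.g. ('S4', 'S1'). The number of join passes depends on the depth of the hierarchy, which is why a fixed relational algebra expression cannot express this query in general.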

Weaknesses of RDBMSs

Impedance Mismatch

Most DMLs lack computational completeness.

To overcome this, SQL can be embedded in a high-level 3GL.

This produces an impedance mismatch: the mixing of different programming paradigms.
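The mismatch shows up even in a small program that embeds SQL. In this sketch, which uses Python's standard `sqlite3` module and a hypothetical `Staff` table, the SQL statement is declarative and produces a whole set of rows. The host language, by contrast, must step through that set one row at a time and copy each row into its own data structures.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Staff (staffNo TEXT PRIMARY KEY, salary REAL)")
conn.executemany("INSERT INTO Staff VALUES (?, ?)",
                 [("SG37", 12000.0), ("SG14", 18000.0), ("SL21", 30000.0)])

# SQL side: declarative and set-at-a-time; one statement describes the whole result.
cursor = conn.execute(
    "SELECT staffNo, salary FROM Staff WHERE salary > ? ORDER BY staffNo", (15000,))

# Host-language side: procedural and record-at-a-time; the program loops over the cursor
# and copies each SQL row into its own data structure (here, a Python dict).
raises = {}
for staff_no, salary in cursor:
    raises[staff_no] = round(salary * 1.05, 2)   # a 5% raise computed in Python, not SQL

conn.close()
print(raises)   # {'SG14': 18900.0, 'SL21': 31500.0}
```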

DDBMSs - Concepts

Distributed Database: A logically interrelated collection of shared data (and a description of this data), physically distributed over a computer network.

Distributed DBMS: The software system that permits the management of the distributed database and makes the distribution transparent to users.

DDBMSs - Concepts

Collection of logically-related shared data.

Data split into fragments.

Fragments may be replicated.

Fragments/replicas allocated to sites.

Sites linked by a communications network.

Data at each site is under the control of a DBMS.

DBMSs handle local applications autonomously.

Each DBMS participates in at least one global application.
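Fragmentation, replication, allocation and distribution transparency can be sketched with plain Python data structures. The relation, site names and allocation plan below are hypothetical. In a real DDBMS each fragment would be held by a separate DBMS, and queries would be shipped over the network.

```python
# Hypothetical global relation Staff(staffNo, branchCity, salary).
STAFF = [
    ("SG37", "Glasgow", 12000),
    ("SG14", "Glasgow", 18000),
    ("SL21", "London", 30000),
    ("SA9", "Aberdeen", 9000),
]


def fragment_by_city(rows):
    """Horizontal fragmentation: one fragment per branchCity value."""
    fragments = {}
    for row in rows:
        fragments.setdefault(row[1], []).append(row)
    return fragments


fragments = fragment_by_city(STAFF)

# Hypothetical allocation: each site stores its local fragment, and the London
# fragment is also replicated at the Glasgow site.
sites = {
    "glasgow_site": {"Glasgow": fragments["Glasgow"], "London": list(fragments["London"])},
    "london_site": {"London": fragments["London"]},
    "aberdeen_site": {"Aberdeen": fragments["Aberdeen"]},
}


def global_select(predicate):
    """Query the global relation without naming sites or fragments (distribution
    transparency). Each fragment is read once, even when it is replicated."""
    fragments_read = set()
    result = []
    for site_data in sites.values():
        for fragment_name, rows in site_data.items():
            if fragment_name not in fragments_read:
                fragments_read.add(fragment_name)
                result.extend(row for row in rows if predicate(row))
    return sorted(result)


print(global_select(lambda row: row[2] > 10000))
```

The user's query mentions only the global relation. Deciding which fragments and which replicas to read is left to `global_select`, which stands in for the DDBMS.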

DDBMS

Distributed Processing

Centralized database that can be accessed over a computer network.

Advantages of DDBMSs

Reflects organizational structure

Improved shareability and local autonomy

Improved availability

Improved reliability

Improved performance

Economics

Modular growth

Disadvantages of DDBMSs

Complexity

Cost

Security

Integrity control more difficult

Lack of standards

Lack of experience

Database design more complex

Replication Servers

Replication: The process of generating and reproducing multiple copies of data at one or more sites.

Provides users with access to current data where and when they need it.

Provides a number of benefits, including improved performance when centralized resources get overloaded, increased reliability and data availability, and support for mobile computing and data warehousing.

Synch vs Asynch Replication

Synchronous: updates to replicated data are part of the enclosing transaction.

If one or more sites that hold replicas are unavailable, the transaction cannot complete.

A large number of messages is required to coordinate synchronization.

Asynchronous: the target database is updated after the source database has been modified.

The delay in regaining consistency may range from a few seconds to several hours or even days.
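The two modes can be contrasted in a small in-memory sketch. `Site`, `synchronous_update` and `AsyncReplicator` are hypothetical names, and a site failure is simulated with an `available` flag. A real replication server would use network messages and a commit protocol instead.

```python
from collections import deque


class Site:
    """A site holding one replica of the data; `available` simulates a site or network failure."""
    def __init__(self, name, available=True):
        self.name = name
        self.available = available
        self.data = {}


def synchronous_update(sites, key, value):
    """Update every replica as part of one enclosing transaction (all or nothing)."""
    if not all(site.available for site in sites):
        return False                      # one replica is unreachable: the transaction cannot complete
    for site in sites:
        site.data[key] = value
    return True


class AsyncReplicator:
    """Commit at the source at once; ship the change to each target later."""
    def __init__(self, source, targets):
        self.source = source
        self.targets = targets
        self.pending = {t.name: deque() for t in targets}   # per-target queue of unsent changes

    def update(self, key, value):
        self.source.data[key] = value     # the local transaction completes immediately
        for queue in self.pending.values():
            queue.append((key, value))

    def propagate(self):
        """Run periodically: apply queued changes to every target that is currently reachable."""
        for target in self.targets:
            if not target.available:
                continue                  # leave the changes queued for a later run
            queue = self.pending[target.name]
            while queue:
                key, value = queue.popleft()
                target.data[key] = value


a, b, c = Site("A"), Site("B"), Site("C", available=False)
print(synchronous_update([a, b, c], "x", 1))   # False: C is down, so no replica changes

rep = AsyncReplicator(a, [b, c])
rep.update("x", 2)
print(a.data, b.data)     # {'x': 2} {}  (B is temporarily inconsistent)
rep.propagate()
print(b.data, c.data)     # {'x': 2} {}  (C is still unreachable)
c.available = True
rep.propagate()
print(c.data)             # {'x': 2}
```

The synchronous path refuses the update while any replica is down. The asynchronous path always accepts it, but the targets stay out of date until propagation reaches them.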

Replication - Functionality

At a basic level, a replication service has to be able to copy data from one database to another (synchronously or asynchronously).

Other functions include:

Scalability.

Mapping and transformation.

Object replication.

Specification of a replication schema.

Subscription mechanism.

Initialization mechanism.

Replication - Data Ownership

Ownership relates to which site has the privilege to update the data.

The main types of ownership are:

Master/slave (or asymmetric replication);

Workflow;

Update-anywhere (or peer-to-peer or symmetric replication).

Replication - Master/Slave Ownership

Asynchronously replicated data is owned by one (master) site, and can be updated by only that site.

Using the ‘publish-and-subscribe’ metaphor, the master site makes data available.

Other sites ‘subscribe’ to the data owned by the master site, receiving read-only copies.

Potentially, each site can be the master site for non-overlapping data sets, but because only one site can update any given data set, update conflicts cannot occur.
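A minimal publish-and-subscribe sketch of master/slave ownership follows. It assumes in-memory sites and a hypothetical `publish()` step that a scheduler would run periodically, which makes the refresh asynchronous. The class and method names are illustrative and are not taken from any replication product.

```python
from types import MappingProxyType


class ReadOnlyReplica:
    """Subscriber site: receives copies of the master's data but cannot change them."""
    def __init__(self):
        self._copy = {}

    def receive(self, changes):
        # Called only by the master site when it publishes.
        self._copy.update(changes)

    @property
    def data(self):
        # Local applications get a read-only view of the replica.
        return MappingProxyType(self._copy)


class MasterSite:
    """Owner of a data set: the only site allowed to update it."""
    def __init__(self):
        self._data = {}
        self._subscribers = []
        self._unpublished = {}

    def subscribe(self, replica):
        self._subscribers.append(replica)
        replica.receive(dict(self._data))      # initial copy of the current data

    def update(self, key, value):
        self._data[key] = value
        self._unpublished[key] = value         # captured now, shipped on the next publish

    def publish(self):
        """Run periodically, so subscribers are refreshed asynchronously."""
        for replica in self._subscribers:
            replica.receive(dict(self._unpublished))
        self._unpublished.clear()


master, branch = MasterSite(), ReadOnlyReplica()
master.subscribe(branch)
master.update("price:P1", 100)
print(branch.data.get("price:P1"))   # None: not yet published
master.publish()
print(branch.data["price:P1"])       # 100
try:
    branch.data["price:P1"] = 90       # subscribers cannot update the data
except TypeError:
    print("replica is read-only")
```

The read-only guarantee in this sketch applies only to the public `data` view. A replication server would enforce it at the DBMS level.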

Replication - Workflow Ownership

Avoids update conflicts, while providing a more dynamic ownership model.

Allows the right to update replicated data to move from site to site.

However, at any one moment, there is only ever one site that may update that particular data.

An example is an order processing system, which follows a series of steps, such as order entry, credit approval, invoicing, shipping, and so on.
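Workflow ownership can be sketched as an update right that travels with the order from step to step. The step names mirror the example above. The `Order` class and its methods are hypothetical, and the replication of the order's data between sites is left out.

```python
# Hypothetical order-processing steps; each is performed at a different site.
WORKFLOW = ["order_entry", "credit_approval", "invoicing", "shipping"]


class Order:
    """A replicated order whose update right moves from site to site."""
    def __init__(self, order_no):
        self.order_no = order_no
        self.step = 0              # index into WORKFLOW: whose turn it is
        self.fields = {}

    @property
    def owner(self):
        return WORKFLOW[self.step]

    def update(self, site, key, value):
        # Only the site that currently owns the order may change it, so no conflicts arise.
        if site != self.owner:
            raise PermissionError(f"{site} does not own order {self.order_no}")
        self.fields[key] = value

    def pass_ownership(self, site):
        """The current owner finishes its step and hands the update right to the next site."""
        if site != self.owner:
            raise PermissionError(f"{site} cannot pass on ownership it does not hold")
        if self.step == len(WORKFLOW) - 1:
            raise ValueError(f"order {self.order_no} has completed the workflow")
        self.step += 1


order = Order("O-1001")
order.update("order_entry", "quantity", 5)
order.pass_ownership("order_entry")
print(order.owner)                          # credit_approval
try:
    order.update("order_entry", "quantity", 6)
except PermissionError as err:
    print(err)                              # order_entry does not own order O-1001
order.update("credit_approval", "approved", True)
```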

Replication - Update-Anywhere Ownership

Creates a peer-to-peer environment where multiple sites have equal rights to update replicated data.

Allows local sites to function autonomously, even when other sites are not available.

Shared ownership can lead to conflict scenarios, so the system has to detect conflicts and resolve them.
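Conflict detection can be sketched with a version number per row. Each propagated change records the version it was based on, and a change based on a version that is no longer current is a conflict. Site priority serves as the resolution rule here purely as an example. The site names, priorities and function names are hypothetical, and a real system might instead resolve conflicts by timestamp, by application-specific rules, or manually.

```python
from dataclasses import dataclass


@dataclass
class Change:
    site: str            # peer site that made the update
    value: object        # new value for the row
    base_version: int    # row version the site had when it made the update


class Row:
    def __init__(self, value):
        self.value = value
        self.version = 0


# Hypothetical resolution policy: lower number = higher-priority site.
SITE_PRIORITY = {"london": 0, "glasgow": 1, "aberdeen": 2}


def reconcile(row, changes):
    """Apply changes propagated from peer sites, detecting and resolving conflicts.

    A change conflicts if it was based on a version that is no longer current,
    i.e. another site's change was applied first. Higher-priority sites are
    applied first, so they win; conflicting changes are rejected and reported."""
    decisions = {}
    for change in sorted(changes, key=lambda ch: SITE_PRIORITY[ch.site]):
        if change.base_version == row.version:
            row.value = change.value
            row.version += 1
            decisions[change.site] = "applied"
        else:
            decisions[change.site] = "conflict: rejected"
    return decisions


stock = Row(100)
# Both sites updated version 0 of the row while disconnected from each other.
print(reconcile(stock, [Change("glasgow", 120, 0), Change("london", 110, 0)]))
print(stock.value, stock.version)   # 110 1
```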

OODBMSs

There is no single agreed object data model. One definition:

Object-Oriented Data Model (OODM): A data model that captures the semantics of objects supported in object-oriented programming.

Object-Oriented Database (OODB): A persistent and sharable collection of objects defined by an OODM.

Object-Oriented DBMS (OODBMS): The manager of an OODB.

Origins of the OODM

Advantages of OODBMSs

Enriched Modeling Capabilities.

Extensibility.

Removal of Impedance Mismatch.

More Expressive Query Language.

Support for Schema Evolution.

Support for Long Duration Transactions.

Applicability to Advanced Database Applications.

Improved Performance.

Disadvantages of OODBMSs

Lack of Experience.

Lack of Standards.

Competition from RDBMSs.

Complexity.

Lack of Support for Views.

Lack of Support for Security.

ORDBMSs

Vendors of RDBMSs are conscious of the threat and promise of OODBMSs.

They agree that RDBMSs are not currently suited to advanced database applications, and that added functionality is required.

However, they reject the claim that ORDBMSs will not provide sufficient functionality or will be too slow to cope adequately with the new complexity.

They argue that the shortcomings of the relational model can be remedied by extending the model with OO features.

ORDBMSs - Features

OO features being added include:

user-extensible types;

encapsulation;

inheritance;

polymorphism;

dynamic binding of methods;

complex objects, including non-1NF objects;

object identity.

ORDBMSs - Features

However, there is no single extended relational model.

All models share the basic relational tables and query language, and all have some concept of ‘object’. Some can also store methods (or procedures or triggers).

Advantages of ORDBMSs

Resolves many of the known weaknesses of RDBMSs.

Reuse and sharing: reuse comes from the ability to extend the server to perform standard functionality centrally; this gives rise to increased productivity for both the developer and the end-user.

Preserves the significant body of knowledge and experience that has gone into developing relational applications.

Disadvantages of ORDBMSs

Complexity.

Increased costs.

Proponents of the relational approach believe the simplicity and purity of the relational model are lost.

Some believe the RDBMS is being extended for what will be a minority of applications.

OO purists are not attracted by the extensions either.

Evolution of Data Warehousing

Since the 1970s, organizations have gained competitive advantage through systems that automate business processes to offer more efficient and cost-effective services to the customer.

This resulted in the accumulation of growing amounts of data in operational databases.

The focus is now on ways to use operational data to support decision-making, as a means of gaining competitive advantage.

Evolution of Data Warehousing

Operational systems were never designed to support such business activities, so using such systems may not be an easy solution.

Businesses typically have numerous operational systems with overlapping and sometimes contradictory definitions (such as data types).

The challenge is to turn archives of data into a source of knowledge, so that a single integrated/consolidated view of the organization’s data is presented to the user.

The Evolution of Data Warehousing

The data warehouse was deemed the solution to meet the requirements of a system capable of supporting decision-making, receiving data from multiple operational data sources.

Data Warehousing Concepts

A consolidated/integrated view of corporate data drawn from disparate operational data sources, and a range of end-user access tools capable of supporting simple to highly complex queries to support decision-making.

The data is described as being subject-oriented, integrated, time-variant, and non-volatile (Inmon, 1993).

Subject-Oriented Data

The warehouse is organized around the major subjects of the enterprise (e.g. customers, products, sales) rather than the major application areas (e.g. customer invoicing, stock control, product sales).

This is reflected in the need to store decision-support data rather than application-oriented data.

Integrated Data

The data warehouse integrates corporate application-oriented data from different source systems, which often includes data that is inconsistent.

The integrated data source must be made consistent to present a unified view of the data to the users.
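Making inconsistent sources consistent usually comes down to mapping each source's codes and formats onto one agreed representation before loading. This sketch assumes two hypothetical source systems that differ in key format, gender coding and date format. The field names and mapping rules are illustrative only.

```python
from datetime import datetime

# Two hypothetical operational sources that describe the same kind of data inconsistently.
sales_system = [{"cust_id": "C1", "sex": "M", "joined": "07/11/2003"}]              # day/month/year
billing_system = [{"customer": 2, "gender": "female", "start_date": "2002-03-05"}]  # ISO dates


def from_sales(rec):
    """Map a sales-system record onto the warehouse's agreed customer format."""
    return {
        "customer_id": rec["cust_id"],
        "gender": {"M": "Male", "F": "Female"}[rec["sex"]],
        "joined": datetime.strptime(rec["joined"], "%d/%m/%Y").date(),
    }


def from_billing(rec):
    """Map a billing-system record onto the same format."""
    return {
        "customer_id": f"C{rec['customer']}",
        "gender": rec["gender"].capitalize(),
        "joined": datetime.strptime(rec["start_date"], "%Y-%m-%d").date(),
    }


warehouse_customers = [from_sales(r) for r in sales_system] + [from_billing(r) for r in billing_system]
for row in warehouse_customers:
    print(row)
```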

Time-Variant Data

Data in the warehouse is only accurate and valid at some point in time or over some time interval.

Time-variance is also shown in the extended time that data is held, the implicit or explicit association of time with all data, and the fact that the data represents a series of snapshots.

Non-Volatile Data

Data in the warehouse is not updated in real-time but is refreshed from operational systems on a regular basis.

New data is always added as a supplement to the database, rather than a replacement.
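Non-volatile, time-variant storage can be sketched as an append-only list of dated snapshots. Each refresh adds rows, earlier rows are never overwritten, and queries ask what was true at a given date. The stock data and function names are hypothetical.

```python
from datetime import date

# Warehouse rows are only ever appended, each stamped with the date of its snapshot.
warehouse = []


def refresh(snapshot_date, operational_stock):
    """Periodic refresh: add a new snapshot of the operational data; earlier rows are never updated or deleted."""
    for product, stock in operational_stock.items():
        warehouse.append({"snapshot": snapshot_date, "product": product, "stock": stock})


def stock_as_of(product, when):
    """Time-variant query: the value that was valid at `when` (latest snapshot on or before it)."""
    rows = [r for r in warehouse if r["product"] == product and r["snapshot"] <= when]
    return max(rows, key=lambda r: r["snapshot"])["stock"] if rows else None


refresh(date(2004, 1, 31), {"P1": 40, "P2": 10})
refresh(date(2004, 2, 29), {"P1": 25, "P2": 12})   # P1 changed in the operational system; history is kept

print(stock_as_of("P1", date(2004, 2, 15)))   # 40
print(stock_as_of("P1", date(2004, 3, 1)))    # 25
```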

Typical Architecture of a DW

Typical Architecture of a DW

Operational data: Supplied from mainframes, proprietary file systems, private workstations and servers, and external systems such as the Internet.

Operational data store (ODS): A repository of current and integrated operational data used for analysis.

It is often structured and supplied with data in the same way as the data warehouse.

It may act simply as a staging area for data to be moved into the warehouse.

Typical Architecture of a DW

Load Manager: Performs all operations associated with the extraction and loading of data into the warehouse.

Warehouse Manager: Performs all operations associated with the management of data in the warehouse, such as merging data sources.

Query Manager: Performs all operations associated with the management of user queries.

Typical Architecture of a DW

Detailed data: Not stored online but made available by summarizing the data to the next level of detail. However, detailed data is regularly added to the warehouse to supplement the summarized data.

Lightly and highly summarized data: Predefined and generated by the warehouse manager and stored in the warehouse.

The purpose is to speed up the performance of queries.

These summaries are updated continuously as new data is loaded into the warehouse.
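Keeping predefined summaries current as data is loaded can be sketched as incremental aggregation. Each newly loaded detail row is folded into a lightly summarized table (per month and branch) and a highly summarized one (per month). The data and table layout are hypothetical, and a real warehouse manager would maintain summary tables inside the DBMS.

```python
from collections import defaultdict

# Predefined summaries kept by a (hypothetical) warehouse manager.
lightly_summarized = defaultdict(float)   # total sales per (month, branch)
highly_summarized = defaultdict(float)    # total sales per month


def load(detail_rows):
    """Load new detailed rows (month, branch, product, amount) and fold them into
    the summaries incrementally, so the summaries stay current."""
    for month, branch, _product, amount in detail_rows:
        lightly_summarized[(month, branch)] += amount
        highly_summarized[month] += amount


load([("2004-01", "Glasgow", "P1", 100.0), ("2004-01", "London", "P1", 250.0)])
load([("2004-01", "Glasgow", "P2", 50.0), ("2004-02", "London", "P2", 75.0)])

# A monthly-totals query now reads a few summary rows instead of scanning every detail row.
print(dict(highly_summarized))   # {'2004-01': 400.0, '2004-02': 75.0}
```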

Typical Architecture of a DW

Meta-data (data about data): Used by all processes in the warehouse.

End-user access tools: The principal purpose of data warehousing is to provide information to business users for strategic decision-making.

Users interact with the warehouse using end-user access tools.

The warehouse must efficiently support ad hoc and routine analysis.

Tools include executive information system (EIS), OLAP, and data mining tools.

Data Mart

A subset of a data warehouse that supports the requirements of a particular department or business function.

Characteristics include:

Holds a subset of the data in the warehouse, in summary form.

Focuses on the requirements of one department or business function.

Can be stand-alone or linked to the warehouse.

Popular because it is less complex than a warehouse.

Architecture of a Data Mart

Can be two-tier or three-tier database applications:

The data warehouse is the optional first tier.

The data mart is the second tier.

The end-user workstation is the third tier.

Data is distributed among the tiers.

Reasons for Creating a Data Mart

Give users access to the data they need to analyze most often.

Provide data in a form that matches the collective view of the data by a group of users in a department or business area.

Improve end-user response time due to reduction in volume of data to be accessed.

Provide appropriately structured data as dictated by requirements of end-user access tools.

Reasons for Creating a Data Mart

Simpler to build compared with establishing a corporate data warehouse.

Cost of implementation is normally less than that required to establish a data warehouse.

Potential users are more clearly defined and can be more easily targeted to obtain support for a data mart project rather than a corporate data warehouse project.

Introducing OLAP

Dynamic synthesis, analysis, and consolidation of large volumes of multi-dimensional data.

Describes a technology that uses a multi-dimensional view of aggregate data to provide quick access to strategic information for purposes of advanced analysis.

Introducing OLAP

Enables users to gain a deeper understanding and knowledge about various aspects of their corporate data through fast, consistent, interactive access to a wide variety of possible views of the data.

Allows users to view corporate data in such a way that it is a better model of the true dimensionality of the enterprise.

Introducing OLAP

OLAP can easily answer ‘who?’ and ‘what?’ questions; however, it is the ability to answer ‘what if?’ and ‘why?’ questions that distinguishes OLAP from general-purpose query tools.

The types of analysis range from basic navigation and browsing (slicing and dicing), to calculations, to more complex analyses such as time series and complex modeling.
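Slicing, dicing and rolling up can be sketched on a tiny cube stored as a Python dict keyed by (year, city, product). The cube contents are hypothetical, and `slice_`, `dice` and `roll_up` are illustrative helpers, not the API of any OLAP server.

```python
from collections import defaultdict

# Hypothetical sales cube: (year, city, product) -> sales amount.
cube = {
    (2003, "Glasgow", "P1"): 120, (2003, "London", "P1"): 300,
    (2003, "Glasgow", "P2"): 80,  (2004, "Glasgow", "P1"): 150,
    (2004, "London", "P2"): 210,
}
DIMS = ("year", "city", "product")


def slice_(cube, dim, value):
    """Slice: fix one dimension to a single value."""
    i = DIMS.index(dim)
    return {k: v for k, v in cube.items() if k[i] == value}


def dice(cube, **allowed):
    """Dice: keep only cells whose coordinates fall in the given sets of values."""
    return {k: v for k, v in cube.items()
            if all(k[DIMS.index(d)] in vals for d, vals in allowed.items())}


def roll_up(cube, keep):
    """Roll up: sum over every dimension not listed in `keep`."""
    idx = [DIMS.index(d) for d in keep]
    totals = defaultdict(int)
    for k, v in cube.items():
        totals[tuple(k[i] for i in idx)] += v
    return dict(totals)


print(slice_(cube, "city", "Glasgow"))
print(dice(cube, year={2003}, product={"P1"}))
print(roll_up(cube, ["year"]))   # {(2003,): 500, (2004,): 360}
```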

Examples of OLAP Applications

OLAP Applications

An essential requirement of all OLAP applications is the ability to provide users with just-in-time (JIT) information, which is needed to make effective decisions about an organization's strategic directions.

JIT information is computed data that usually reflects complex relationships and is often calculated on the fly.

This is practical only if response times are consistently short and the data model is flexible.

OLAP Applications

Although OLAP applications are found in widely divergent functional areas, all have the following key features:

multi-dimensional views of data;

support for complex calculations;

time intelligence.

Time intelligence is a key feature of almost any analytical application, as performance is almost always judged over time.
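Two typical time-intelligence calculations, year-to-date totals and comparison with the same period last year, can be sketched over hypothetical monthly figures. The function names are illustrative only.

```python
# Hypothetical monthly revenue, keyed by (year, month).
revenue = {
    (2003, 1): 100, (2003, 2): 120, (2003, 3): 90,
    (2004, 1): 130, (2004, 2): 110, (2004, 3): 140,
}


def year_to_date(year, month):
    """Sum of revenue from January up to and including the given month."""
    return sum(v for (y, m), v in revenue.items() if y == year and m <= month)


def change_on_last_year(year, month):
    """Fractional change compared with the same month of the previous year."""
    previous = revenue[(year - 1, month)]
    return (revenue[(year, month)] - previous) / previous


print(year_to_date(2004, 3), year_to_date(2003, 3))   # 380 310
print(f"{change_on_last_year(2004, 2):+.1%}")          # -8.3%
```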

OLAP Benefits

Increased productivity of end-users.

Reduced backlog of applications development for IT staff.

Retention of organizational control over the integrity of corporate data.

Reduced query drag and network traffic on OLTP systems or on the data warehouse.

Improved potential revenue and profitability.

Data Mining

Process of extracting valid, previously unknown, comprehensible, and actionable information from large databases and using it to make crucial business decisions.

Involves the analysis of data and the use of software techniques for finding hidden and unexpected patterns and relationships in sets of data.

Data Mining

Focus is to reveal information that is hidden and unexpected.

Patterns and relationships are identified by examining the underlying rules and features in the data.

Tends to work from the data up, and the most accurate results normally require large volumes of data to deliver reliable conclusions.

Data Mining

The process starts by developing an optimal representation of the structure of sample data, during which time knowledge is acquired; this knowledge is then extended to larger sets of data.

Data mining can provide huge paybacks for companies who have made a significant investment in data warehousing.

It is a relatively new technology, but is already used in a number of industries.

Some Applications of Data Mining

Retail / Marketing

Identifying buying patterns of customers.

Finding associations among customer demographic characteristics.

Predicting response to mailing campaigns.

Market basket analysis.
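Market basket analysis is commonly expressed as association rules scored by support and confidence. The brute-force sketch below only considers rules between single items and uses hypothetical baskets and thresholds. Production tools use more scalable algorithms such as Apriori.

```python
from itertools import combinations

# Hypothetical market baskets: the set of items bought in each transaction.
baskets = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "eggs"},
]


def support(itemset):
    """Fraction of baskets containing every item in `itemset`."""
    return sum(itemset <= basket for basket in baskets) / len(baskets)


def pair_rules(min_support=0.5, min_confidence=0.6):
    """Rules 'lhs -> rhs' between single items that meet both thresholds.
    confidence = support({lhs, rhs}) / support({lhs})"""
    items = sorted(set().union(*baskets))
    rules = []
    for a, b in combinations(items, 2):
        both = support({a, b})
        if both < min_support:
            continue
        for lhs, rhs in ((a, b), (b, a)):
            confidence = both / support({lhs})
            if confidence >= min_confidence:
                rules.append((lhs, rhs, both, round(confidence, 2)))
    return rules


for lhs, rhs, sup, conf in pair_rules():
    print(f"{lhs} -> {rhs}: support {sup:.2f}, confidence {conf:.2f}")
```

For example, "butter -> bread" has confidence 1.00 because every basket containing butter also contains bread.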

Some Applications of Data Mining

Banking

Detecting patterns of fraudulent credit card use.

Identifying loyal customers.

Predicting customers likely to change their credit card affiliation.

Determining credit card spending by customer groups.

Some Applications of Data Mining

Insurance

Claims analysis.

Predicting which customers will buy new policies.

Medicine

Characterizing patient behavior to predict surgery visits.

Identifying successful medical therapies for different illnesses.

Data Mining Operations

The four main operations are:

Predictive modeling.

Database segmentation.

Link analysis.

Deviation detection.

There are recognized associations between the applications and the corresponding operations; for example, direct marketing strategies use database segmentation.
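Deviation detection can be illustrated with one of the simplest statistical techniques: flagging values that lie far from the mean in standard-deviation terms. This is a minimal sketch with hypothetical card transactions and threshold. The outlier itself inflates the standard deviation, so more robust methods are usually preferred in practice.

```python
from statistics import mean, stdev

# Hypothetical card transaction amounts for one customer.
amounts = [23.5, 41.0, 18.2, 35.9, 27.4, 30.1, 950.0, 22.8]


def deviations(values, threshold=2.0):
    """Return the values lying more than `threshold` sample standard deviations from the mean."""
    mu, sigma = mean(values), stdev(values)
    return [v for v in values if abs(v - mu) > threshold * sigma]


print(deviations(amounts))   # [950.0]
```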

Data Mining Techniques

Techniques are specific implementations of the data mining operations.

Each operation has its own strengths and weaknesses.

Data mining tools sometimes offer a choice of operations to implement a technique.

Data Mining Techniques

Criteria for selecting a tool include:

Suitability for certain input data types.

Transparency of the mining output.

Tolerance of missing variable values.

Level of accuracy possible.

Ability to handle large volumes of data.

Web-database integration

Just over a decade after its conception in 1989, the Web is arguably the most popular and powerful networked information system to date.

Its growth has been near exponential, and it has started an information revolution that will continue through the next decade.

The combination of the Web and databases now brings many new opportunities for creating advanced database applications.

Web-database integration

The Web is a compelling platform for the delivery and dissemination of data-centric, interactive applications.

Organizations are now rapidly building new database applications or reengineering existing ones to take advantage of the Web as a strategic platform for implementing innovative business solutions, in effect becoming Web-centric organizations.

Static and Dynamic Web Pages

An HTML/XML document stored in a file is a static Web page.

The content of a dynamic Web page is generated each time it is accessed.

Thus, a dynamic Web page can respond to user input from the browser and can be customized by and for each user.

Dynamic pages require hypertext to be generated by servers.

Static and Dynamic Web Pages

Scripts are needed that convert data from different formats into HTML/XML ‘on-the-fly’.

As a database is dynamic, changing as users create, insert, update, and delete data, generating dynamic Web pages is a much more appropriate approach than creating static ones.
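A dynamic page can be sketched as a function that queries the database and renders HTML at request time, so later changes to the data appear on the next request without regenerating any file. This sketch uses Python's `sqlite3` and `html.escape`. The `Staff` table, its rows and `render_staff_page` are hypothetical, and a real deployment would call such a function from a Web server.

```python
import sqlite3
from html import escape


def render_staff_page(conn, branch):
    """Build the page from the database at request time, so it always reflects current data."""
    rows = conn.execute(
        "SELECT staffNo, name FROM Staff WHERE branch = ? ORDER BY staffNo", (branch,)
    ).fetchall()
    items = "\n".join(f"  <li>{escape(no)}: {escape(name)}</li>" for no, name in rows)
    return f"<html><body>\n<h1>Staff at {escape(branch)}</h1>\n<ul>\n{items}\n</ul>\n</body></html>"


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Staff (staffNo TEXT, name TEXT, branch TEXT)")
conn.execute("INSERT INTO Staff VALUES ('SG37', 'Ann Beech', 'Glasgow')")
first_page = render_staff_page(conn, "Glasgow")

# A later insert is visible on the very next request; no static file needs regenerating.
conn.execute("INSERT INTO Staff VALUES ('SG14', 'David Ford', 'Glasgow')")
second_page = render_staff_page(conn, "Glasgow")
print(second_page)
```

`escape` stops data values from being interpreted as markup, which matters once page content comes from a database that users can write to.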

Requirements for Web-DBMS Integration

Ability to access valuable corporate data in a secure manner.

Data- and vendor-independent connectivity to allow freedom of choice in DBMS selection.

Ability to interface to the database independently of any proprietary browser or Web server.

Connectivity solution that takes advantage of all the features of an organization’s DBMS.

Requirements for Web-DBMS Integration

Open architecture to allow interoperability with a variety of systems and technologies.

Cost-effective solution that allows for scalability, growth, and changes in strategic directions, and helps reduce applications development costs.

Support for transactions that span multiple HTTP requests.
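HTTP itself is stateless, so work that spans several requests needs some way to link the requests together. One common approach is a session identifier carried in a cookie. This sketch uses Python's `http.cookies.SimpleCookie` and `secrets`. The in-memory session store, the `handle_request` function and its actions are hypothetical, and no real Web server or DBMS is involved.

```python
import secrets
from http.cookies import SimpleCookie

# Hypothetical server-side store: session id -> items added so far in an unfinished order.
sessions = {}


def handle_request(cookie_header, action, item=None):
    """Handle one stateless HTTP request; return (Set-Cookie value or None, response body)."""
    cookie = SimpleCookie(cookie_header or "")
    set_cookie = None
    if "sid" in cookie and cookie["sid"].value in sessions:
        sid = cookie["sid"].value                # returning client: resume its session
    else:
        sid = secrets.token_hex(16)              # new client: start a session and issue a cookie
        sessions[sid] = []
        out = SimpleCookie()
        out["sid"] = sid
        out["sid"]["httponly"] = True
        set_cookie = out.output(header="").strip()
    pending = sessions[sid]
    if action == "add":
        pending.append(item)
        return set_cookie, f"{len(pending)} item(s) pending"
    if action == "commit":
        # A real application would write `pending` to the DBMS in one database transaction here.
        del sessions[sid]
        return set_cookie, f"committed {pending}"
    return set_cookie, "unknown action"


# A browser would store the cookie and send back just its name=value part.
set_cookie, body = handle_request(None, "add", "P1")
browser_cookie = set_cookie.split(";")[0]
_, body = handle_request(browser_cookie, "add", "P2")
_, body = handle_request(browser_cookie, "commit")
print(body)                                      # committed ['P1', 'P2']
```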

Requirements for Web-DBMS Integration

Support for session- and application-based authentication.

Acceptable performance.

Minimal administration overhead.

Set of high-level productivity tools to allow applications to be developed, maintained, and deployed with relative ease and speed.
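One possible form of session-based authentication is a signed token: after login the server issues a token, and on each later request it checks the signature rather than asking for the password again. The sketch below assumes an HMAC-SHA256 signature over the username; the names `SECRET_KEY`, `issue_token`, and `verify_token` are hypothetical, and a real system would also add an expiry time and send the token only over HTTPS.

```python
import hashlib
import hmac
import secrets
from typing import Optional

# Hypothetical server secret, regenerated at start-up for this sketch;
# a real system would load it from protected configuration.
SECRET_KEY = secrets.token_bytes(32)


def _sign(username: str) -> str:
    return hmac.new(SECRET_KEY, username.encode(), hashlib.sha256).hexdigest()


def issue_token(username: str) -> str:
    """Called once the user has logged in; the token is sent back on later requests."""
    return f"{username}:{_sign(username)}"


def verify_token(token: str) -> Optional[str]:
    """Return the username if the token was issued by this server, else None."""
    username, _, sig = token.rpartition(":")
    # compare_digest avoids leaking information through comparison timing.
    if username and hmac.compare_digest(sig.encode(), _sign(username).encode()):
        return username
    return None
```

A token copied from one user and relabelled with another username fails verification, because its signature no longer matches.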


Approaches to Integrating Web and DBMSs

Scripting Languages.

Common Gateway Interface (CGI).

HTTP Cookies.

Extending the Web Server.

Java, JDBC, SQLJ, Servlets, and JSP.

Vendor-specific solutions such as:

Microsoft Web Solution Platform: ASP and ADO.

Oracle Internet Platform.
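To illustrate two of these approaches together, the sketch below is a CGI-style program: the Web server passes request details in environment variables such as `QUERY_STRING` and `HTTP_COOKIE`, and the program writes header lines, a blank line, and the body. An HTTP cookie is used to remember state between requests. The `visits` cookie and the function name `cgi_response` are hypothetical.

```python
import os
from html import escape
from http.cookies import SimpleCookie


def cgi_response(environ: dict) -> str:
    """Build a CGI response: header lines, a blank line, then the body."""
    # The browser returns previously set cookies in the HTTP_COOKIE variable.
    cookies = SimpleCookie(environ.get("HTTP_COOKIE", ""))
    raw = cookies["visits"].value if "visits" in cookies else "0"
    visits = int(raw) + 1 if raw.isdigit() else 1

    # The cookie lets the server recognize this browser on its next request.
    reply = SimpleCookie()
    reply["visits"] = str(visits)
    headers = ["Content-Type: text/html", reply.output()]  # "Set-Cookie: visits=N"

    query = escape(environ.get("QUERY_STRING", ""))
    body = f"<p>Query: {query}; visit number {visits}</p>"
    return "\n".join(headers) + "\n\n" + body


if __name__ == "__main__":
    # Under a real Web server, these variables are set for each request.
    print(cgi_response(dict(os.environ)))
```

Because CGI starts a new process per request, any state that must survive between requests (here, the visit count) has to come back from the browser or be stored elsewhere, which is one reason cookies are paired with CGI.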


XML (eXtensible Markup Language)

Most documents on the Web are currently stored and transmitted in HTML.

One strength of HTML is its simplicity. However, its simplicity is also one of its weaknesses, with a growing number of users who want tags that simplify some tasks and make HTML documents more attractive and dynamic.


XML

To satisfy this demand, vendors introduced some browser-specific HTML tags, which made it difficult to develop sophisticated, widely viewable Web documents.

The W3C has produced a new standard called XML, designed to preserve the general application independence that makes HTML portable and powerful.


XML

Meta-language (language for describing other languages) that enables designers to create their own customized tags to provide functionality not available with HTML.

Restricted version of SGML (Standard Generalized Markup Language), designed especially for Web documents.
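As a small illustration of designer-defined tags, the document below uses tags that XML itself does not define, yet a generic XML parser reads them without any special configuration. The tags, attributes, and the function `salaries` are hypothetical.

```python
import xml.etree.ElementTree as ET

# The tags <staffList>, <member>, <name>, and <salary> are invented for this
# example; XML itself defines none of them.
DOC = """<staffList branch="B005">
  <member id="SL21"><name>John White</name><salary>30000</salary></member>
  <member id="SG37"><name>Ann Beech</name><salary>12000</salary></member>
</staffList>"""


def salaries(xml_text: str) -> dict[str, int]:
    """Read the custom tags with a generic XML parser."""
    root = ET.fromstring(xml_text)
    return {m.get("id"): int(m.findtext("salary")) for m in root.findall("member")}


if __name__ == "__main__":
    print(salaries(DOC))  # {'SL21': 30000, 'SG37': 12000}
```

Unlike HTML, where the tag set is fixed, the meaning of these tags is agreed between the applications exchanging the document.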


XML

Set to impact every aspect of programming including graphical interfaces, embedded systems, distributed systems, and database management.

Becoming de facto standard for data communication within software industry, and quickly replacing EDI as primary medium for data interchange among businesses.

Some analysts believe it will become language in which most documents are created and stored, both on and off Internet.


XML and Databases

As the amount of data in XML expands, there will be increasing demand to store, retrieve, and query this data.

Two main models are anticipated: data-centric and document-centric.


XML – Data-centric model

The fact that data is stored or transferred as XML is incidental. In this case, data could be stored in an RDBMS, ORDBMS, or OODBMS.

Oracle has completely integrated XML into its Oracle 9i system. XML can be stored as entire documents using the XMLType or CLOB/BLOB data types, or can be decomposed into its constituent elements and stored that way.

Oracle's query language has been extended to permit searching of XML-based content.
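The two storage options can be sketched with SQLite standing in for a relational DBMS; this does not use Oracle's XMLType, and the tables `xml_docs` and `member` and the function `store_both_ways` are hypothetical. The same document is kept once as a whole text value and once decomposed into rows.

```python
import sqlite3
import xml.etree.ElementTree as ET

DOC = """<staffList>
  <member id="SL21"><name>John White</name><salary>30000</salary></member>
  <member id="SG37"><name>Ann Beech</name><salary>12000</salary></member>
</staffList>"""


def store_both_ways(conn: sqlite3.Connection, xml_text: str) -> None:
    conn.execute("CREATE TABLE xml_docs (doc_id INTEGER PRIMARY KEY, body TEXT)")
    conn.execute("CREATE TABLE member (id TEXT PRIMARY KEY, name TEXT, salary INTEGER)")

    # Option 1: keep the entire document as a single text value (compare a CLOB column).
    conn.execute("INSERT INTO xml_docs (body) VALUES (?)", (xml_text,))

    # Option 2: decompose ("shred") the document into rows of an ordinary table.
    root = ET.fromstring(xml_text)
    rows = [(m.get("id"), m.findtext("name"), int(m.findtext("salary")))
            for m in root.findall("member")]
    conn.executemany("INSERT INTO member (id, name, salary) VALUES (?, ?, ?)", rows)
    conn.commit()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    store_both_ways(conn, DOC)
    # The decomposed form can be queried with plain SQL.
    print(conn.execute("SELECT name FROM member WHERE salary > 20000").fetchall())
```

Storing the whole document preserves it exactly but needs XML-aware querying; decomposing it makes ordinary SQL queries easy but loses details such as formatting that the tables do not capture.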


XML – Document-centric model

Documents designed for human consumption (e.g., books, newspapers, email).

Data may be irregular/incomplete, and structure may change rapidly or unpredictably.

Unfortunately, RDBMSs, ORDBMSs, and OODBMSs do not handle data of this nature particularly well.

Content management systems are important tools for handling these types of documents. A native XML database may now be found underlying such a system.


Native XML Database

Defines (logical) data model for an XML document (as opposed to the data in that document) and stores and retrieves documents according to that model.

At a minimum, the model must include elements, attributes, PCDATA, and document order.

The XML document must be the unit of (logical) storage, although it is not restricted by any underlying physical storage model (so traditional DBMSs are not ruled out).
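A toy, in-memory sketch of these ideas follows: each whole document is the unit of storage, and it is held as a parsed tree that keeps elements, attributes, text, and document order. The class `TinyXMLStore` is hypothetical and is not a real native XML database; note that ElementTree's default parser discards comments and processing instructions.

```python
import xml.etree.ElementTree as ET


class TinyXMLStore:
    """Hypothetical toy store: whole documents are stored and retrieved as units,
    kept as parsed trees rather than decomposed into rows."""

    def __init__(self) -> None:
        self._docs: dict[str, ET.Element] = {}

    def put(self, name: str, xml_text: str) -> None:
        self._docs[name] = ET.fromstring(xml_text)

    def get(self, name: str) -> str:
        return ET.tostring(self._docs[name], encoding="unicode")

    def query(self, name: str, path: str) -> list[str]:
        # ElementTree supports a limited XPath subset; matches come back in document order.
        return [el.text or "" for el in self._docs[name].findall(path)]


if __name__ == "__main__":
    store = TinyXMLStore()
    store.put("d", "<list><item n='1'>b</item><item n='2'>a</item></list>")
    print(store.query("d", "item"))          # ['b', 'a']
    print(store.query("d", "item[@n='2']"))  # ['a']
```

The results follow document order ('b' before 'a') rather than any sorted order, which is part of what the logical model must preserve.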


XML – Query Languages

DBMS vendors have extended SQL to handle queries on XML-based content.

The standardization of XML extensions to SQL is known as SQL/XML, and initial work has been submitted to ISO and ANSI.

In addition, the W3C formed an XML Query Working Group to produce:

a data model for XML documents;

a set of query operators on this model;

a query language based on these query operators (called XQuery).


XML – XQuery

Queries operate on single documents or fixed collections of documents.

Can select entire documents or subtrees of documents that match conditions based on document content and structure.

Queries can also construct new documents based on what has been selected.
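The select-then-construct pattern can be approximated in Python with ElementTree; this is not XQuery, where the same query would typically be written as a FLWOR (for, let, where, order by, return) expression. The document, the threshold, and the function `high_earners` are hypothetical.

```python
import xml.etree.ElementTree as ET

DOC = """<staffList>
  <member id="SL21"><name>John White</name><salary>30000</salary></member>
  <member id="SG37"><name>Ann Beech</name><salary>12000</salary></member>
</staffList>"""


def high_earners(xml_text: str, threshold: int) -> str:
    root = ET.fromstring(xml_text)
    result = ET.Element("highEarners")  # the newly constructed document
    for m in root.findall("member"):  # select subtrees by structure...
        if int(m.findtext("salary", "0")) > threshold:  # ...and by content
            ET.SubElement(result, "person", id=m.get("id")).text = m.findtext("name")
    return ET.tostring(result, encoding="unicode")


if __name__ == "__main__":
    print(high_earners(DOC, 20000))
    # <highEarners><person id="SL21">John White</person></highEarners>
```

The output is a new document whose structure differs from the input, which is what distinguishes construction from simple retrieval.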

Ultimately, collections of XML documents will be accessed like databases.

Web technology is highly dynamic, so expect significant developments over the coming years.