PRODUCED BY DATA BLUEPRINT, 10124-C W. BROAD ST, GLEN ALLEN, VA 23060
Outline
1. Data Management Overview
2. Data Management Tools Overview
3. Data Technology Architecture
4. CASE Tools
5. Repositories
6. Profiling/Discovery Tools
7. Data Quality Engineering Tools
8. Data Life Cycle
9. Other Technologies
10. Q&A
Tweeting now: #dataed
Understanding Data Technology Requirements
Need to understand:
• How the technology works
• How it provides value in the context of a particular business
• Requirements of a data technology before determining what technical solution to choose for a particular situation
Suggested questions:
• What problem does this data technology mean to solve?
• What sets this data technology apart from others?
• Are there specific hardware/software/operating systems/storage/network/connectivity requirements?
• Does this technology include data security functionality?
Data Technology Architecture, cont’d
Data technologies to be included in the technology architecture:
• Database management systems (DBMS) software
• Related database management utilities
• Data modeling and model management software
• Business intelligence software for reporting and analysis
• Extract-transform-load (ETL) and other data integration tools
• Data quality analysis and data cleansing tools
• Metadata management software, including metadata
• The technology roadmap for the organization consists of technology objectives as well as reviewed, approved, and published technology architecture components
• This strategic roadmap can be used to inform and direct future data technology research and project work
Computer Aided Software/Systems Engineering Tools
• Scientific application of a set of tools and methods to a software system which is meant to result in high-quality, defect-free, and maintainable software products
• Refers to methods for the development of information systems together with automated tools that can be used in the software development process
• CASE functions include analysis, design, and programming
Source: http://en.wikipedia.org/wiki/
Computer-aided software engineering (CASE) is the scientific application of a set of tools and methods to a software system which is meant to result in high-quality, defect-free, and maintainable software products. It also refers to methods for the development of information systems together with automated tools that can be used in the software development process.
CASE Tools
$150K = in-house support
$ 55K = hardware and software maintenance
$ 60K = ongoing training and misc.
$265K = annual additional investment
× 5 years
$1,325K investment over 5 years
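The arithmetic on this slide can be checked with a few lines of Python. This is only a sketch that re-adds the figures shown above; the variable names are illustrative and not part of the original deck.

```python
# Annual cost components in thousands of dollars, copied from the slide.
annual_costs_k = {
    "in-house support": 150,
    "hardware and software maintenance": 55,
    "ongoing training and misc.": 60,
}
years = 5  # planning horizon stated on the slide

annual_total_k = sum(annual_costs_k.values())
five_year_total_k = annual_total_k * years

print(f"Annual additional investment: ${annual_total_k}K")
print(f"Investment over {years} years: ${five_year_total_k:,}K")
```

Changing any one component or the horizon shows how sensitive the five-year total is to ongoing support costs.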
Repositories have been difficult to "sell"
21 September 1999
Michael Blechar, Lisa Wallace
Management Summary
Most executive and IS managers view an IT metadata repository as an esoteric technology that is not directly related to the business. However, as will be seen, an IT metadata repository can substantially help IS organizations support the applications, which in turn support the business. An IT metadata repository is a pre-built system and reference database where the IS organizations can track and manage the information about the applications and databases they build and maintain; think of it as the inventory and change impact reporting system for IS. These repositories track metadata such as the descriptions of jobs, programs, modules, screens, data and databases, and the interrelationships between them. Metadata differs from the actual data being described. Metadata is information about data. For example, the metadata descriptions in the repository tell one that the field "customer number" appears in Databases A, B and F ...
[From gartner.com]
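The "inventory and change impact reporting" idea in the quote can be illustrated with a toy in-memory repository. This is a minimal sketch, not any vendor's product: the class, its method names, and the program names are hypothetical; only the "customer number" field and Databases A, B and F come from the quoted text.

```python
from collections import defaultdict


class MetadataRepository:
    """Toy repository: tracks which databases and programs use a field."""

    def __init__(self):
        self._field_locations = defaultdict(set)  # field name -> databases
        self._program_fields = defaultdict(set)   # program name -> field names

    def register_field(self, field, database):
        self._field_locations[field].add(database)

    def register_program(self, program, fields):
        self._program_fields[program].update(fields)

    def where_used(self, field):
        """Inventory question: which databases contain this field?"""
        return sorted(self._field_locations.get(field, ()))

    def impacted_programs(self, field):
        """Change-impact question: which programs touch this field?"""
        return sorted(
            program
            for program, fields in self._program_fields.items()
            if field in fields
        )


repo = MetadataRepository()
for database in ("A", "B", "F"):
    repo.register_field("customer number", database)

# Hypothetical programs, invented purely for illustration.
repo.register_program("billing_job", ["customer number", "invoice total"])
repo.register_program("inventory_report", ["part number"])

print(repo.where_used("customer number"))
print(repo.impacted_programs("customer number"))
```

Note that the repository stores only descriptions of fields and programs, never customer records themselves, which is the metadata/data distinction the quote draws.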
"However, due to cost (these tools start at about $150,000, but frequently exceed $1 million) and being slow to market in terms of support for new service-oriented architectures (SOAs), CA and ASG have opened the door to smaller competitors"
Metadata Repositories 2004
IBM AD/Cycle
• Business Goals Model: Defines the mission of the enterprise, its long-range goals, and the business policies and assumptions that affect its operations.
• Business Rules Model: Records rules that govern the operation of the business and the Business Events that trigger execution of Business Processes.
• Enterprise Structure Model: Defines the scope of the enterprise to be modeled. Assigns a name to the model that serves to qualify each component of the model.
• Extension Support Model: Provides for tactical Information Model extensions to support special tool needs.
• Info Usage Model: Specifies which of the Entity-Relationship Model component instances are used by other Information Model components.
• Global Text Model: Supports recording of extended descriptive text for many of the Information Model components.
• DB2 Model: Refines the definition of a Relational Database design to a DB2-specific design.
• IMS Structures Model: Defines the component structures and elements and the application program views of an IMS Database.
• Flow Model: Specifies which of the Entity-Relationship Model component instances are passed between Process Model components.
• Applications Structure Model: Defines the overall scope of an automated Business Application, the components of the application and how they fit together.
• Data Structures Model: Defines the data structures and their elements used in an automated Business Application.
• Application Build Model: Defines the tools, parameters and environment required to build an automated Business Application.
• Derivations/Constraints Model: Records the rules for deriving legal values for instances of Entity-Relationship Model components, and for controlling the use or existence of E-R instances.
• Entity-Relationship Model: Defines the Business Entities, their properties (attributes) and the relationships they have with other Business Entities.
• Organization/Location Model: Records the organization structure and location definitions for use in describing the enterprise.
• Process Model: Defines Business Processes, their sub processes and components.
• Relational Database Model: Describes the components of a Relational Database design in terms common to all SAA relational DBMSs.
• Test Model: Identifies the various files (test procedures, test cases, etc.) affiliated with an automated Business Application for use in testing that application.
• Library Model: Records the existence of non-repository files and the role they play in defining and building an automated Business Application.
• Panel/Screen Model: Identifies the Panels and Screens and the fields they contain as elements used in an automated Business Application.
• Program Elements Model: Identifies the various pieces and elements of application program source that serve as input to the application build process.
• Value Domain Model: Defines the data characteristics and allowed values for information items.
• Strategy Model: Records business strategies to resolve problems, address goals, and take advantage of business opportunities. It also records the actions and steps to be taken.
• Resource/Problem Model: Identifies the problems and needs of the enterprise, the projects designed to address those needs, and the resources required.
[Figure: IBM's AD/Cycle Information Model]
• Data analysis software technologies deliver up to 10X productivity over manual approaches
• Based on a powerful computing technology that allows data engineers to quickly form candidate hypotheses with respect to the existing data structures
• Hypotheses are then presented to the SMEs (both business and technical) who confirm, refine, or deny them
• Allows existing data structures to be inferred at a rate that is an order of magnitude more effective than previous manual approaches
• Pioneers include Evoke->CSI, Metagenix->Ascential->IBM, Sypherlink
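To make the "candidate hypotheses" idea concrete, here is a minimal profiling sketch. It is not how any of the products named above work internally; the function names, the sample records, and the two simple heuristics (uniqueness suggests an identifier, blanks suggest an optional field) are assumptions chosen for illustration.

```python
def profile_columns(rows):
    """Compute simple per-column statistics for a list of dict records."""
    columns = list(rows[0].keys()) if rows else []
    profile = {}
    for col in columns:
        values = [row.get(col) for row in rows]
        non_null = [v for v in values if v not in (None, "")]
        profile[col] = {
            "null_rate": 1 - len(non_null) / len(values),
            "distinct": len(set(non_null)),
            # Fully populated and fully unique: a possible key.
            "candidate_key": len(non_null) == len(values)
            and len(set(non_null)) == len(values),
        }
    return profile


def candidate_hypotheses(profile):
    """Turn statistics into plain-language hypotheses for SME review."""
    hypotheses = []
    for col, stats in profile.items():
        if stats["candidate_key"]:
            hypotheses.append(f"{col} may be a unique identifier")
        if stats["null_rate"] > 0:
            hypotheses.append(f"{col} may be optional")
    return hypotheses


# Hypothetical sample records, invented for this example.
rows = [
    {"cust_no": "1001", "state": "VA", "phone": "8045550100"},
    {"cust_no": "1002", "state": "VA", "phone": ""},
    {"cust_no": "1003", "state": "MD", "phone": "4105550123"},
]

profile = profile_columns(rows)
hypotheses = candidate_hypotheses(profile)
for hypothesis in hypotheses:
    print(hypothesis)
```

The output is deliberately phrased as "may be": the tool proposes, and the business and technical SMEs confirm, refine, or deny.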
• Upon identification of data errors, trigger data rules to transform the flawed data
• Perform standardization and guide rule-based transformations by mapping data values in their original formats and patterns into a target representation
• Parsed components of a pattern are subjected to rearrangement, corrections, or any changes as directed by the rules in the knowledge base
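A rule-based standardization step of this kind might look like the following sketch. The phone-number domain, the three source patterns, and the target format are all assumptions made for the example; a real knowledge base would hold many more rules.

```python
import re

# Hypothetical knowledge base: each rule pairs a recognized source pattern
# with a template that rearranges the parsed components into the target form.
PHONE_RULES = [
    (re.compile(r"^\((\d{3})\)\s*(\d{3})-(\d{4})$"), "{0}-{1}-{2}"),
    (re.compile(r"^(\d{3})\.(\d{3})\.(\d{4})$"), "{0}-{1}-{2}"),
    (re.compile(r"^(\d{3})(\d{3})(\d{4})$"), "{0}-{1}-{2}"),
]


def standardize_phone(raw):
    """Map a phone number in a known source format to NNN-NNN-NNNN.

    Returns None when no rule recognizes the value, so the caller can
    route it to manual review instead of guessing.
    """
    value = raw.strip()
    for pattern, template in PHONE_RULES:
        match = pattern.match(value)
        if match:
            return template.format(*match.groups())
    return None


for raw in ["(804) 555-0100", "804.555.0100", "8045550100", "555-0100"]:
    print(repr(raw), "->", standardize_phone(raw))
```

Keeping the rules as data rather than code means new source patterns can be added to the knowledge base without touching the transformation logic.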
DQ Tools: (4) Identity Resolution & Matching
2 basic approaches to matching:
• Deterministic
– Relies on defined patterns and rules for assigning weights and scores to determine similarity
– Predictable
– Only as good as the anticipations of the rule developers
• Probabilistic
– Relies on statistical techniques for assessing the probability that any pair of records represents the same entity
– Not reliant on rules
– Probabilities can be refined based on experience -> matchers can improve precision as more data is analyzed
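The two approaches can be contrasted in a short sketch. The deterministic scorer uses hand-assigned weights; the probabilistic scorer uses log-likelihood agreement weights in the style of the Fellegi-Sunter model, which is one common statistical technique and not necessarily what any particular tool uses. The field names, weights, and m/u probabilities below are all assumed values for illustration.

```python
import math

# Deterministic: fixed weights chosen by a rule developer (assumed values).
RULE_WEIGHTS = {"last_name": 50, "birth_date": 30, "zip": 20}


def deterministic_score(a, b):
    """Sum the hand-assigned weights of the fields that agree exactly."""
    return sum(w for f, w in RULE_WEIGHTS.items() if a.get(f) == b.get(f))


# Probabilistic: per-field (m, u) estimates, where
#   m = P(field agrees | records are the same entity)
#   u = P(field agrees | records are different entities)
# These numbers are assumptions; in practice they are estimated from data
# and can be refined as more record pairs are reviewed.
M_U = {
    "last_name": (0.95, 0.01),
    "birth_date": (0.97, 0.003),
    "zip": (0.90, 0.05),
}


def probabilistic_weight(a, b):
    """Total log2 likelihood ratio; larger means more likely the same entity."""
    total = 0.0
    for field, (m, u) in M_U.items():
        if a.get(field) == b.get(field):
            total += math.log2(m / u)
        else:
            total += math.log2((1 - m) / (1 - u))
    return total


rec1 = {"last_name": "Smith", "birth_date": "1970-01-01", "zip": "23060"}
rec2 = {"last_name": "Smith", "birth_date": "1970-01-01", "zip": "23233"}
rec3 = {"last_name": "Jones", "birth_date": "1982-05-17", "zip": "23060"}

print(deterministic_score(rec1, rec2), deterministic_score(rec1, rec3))
print(probabilistic_weight(rec1, rec2), probabilistic_weight(rec1, rec3))
```

In the deterministic scorer the weights never change unless someone edits the rules; in the probabilistic scorer, re-estimating m and u from reviewed pairs shifts the weights automatically.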
Other Technologies
Data Integration Definition:
• Pulling together and reconciling dispersed data for analytic purposes that organizations have maintained in multiple, heterogeneous systems. Data needs to be accessed and extracted, moved and loaded, validated and cleaned, standardized and transformed.
• Other tools include:
– Servers
– EII technologies
– Portals
– Conversion tools
Source: http://www.information-management.com
Top 10 Strategic Tech Trends in 2013
1. Mobile Device Battles: By 2013 mobile phones will overtake PCs as the most common Web access device worldwide.
2. Mobile Applications and HTML5: For the next few years, no single tool will be optimal for all types of mobile application, so expect to employ several.
3. Personal Cloud: The personal cloud will gradually replace the PC as the location where individuals keep their personal content.
4. Enterprise App Stores: Enterprises face a complex app store future as some vendors will limit their stores to specific devices and types of apps, forcing the enterprise to deal with multiple stores.
5. The Internet of Things: The Internet of Things (IoT) is a concept that describes how the Internet will expand as physical items such as consumer devices and physical assets are connected to the Internet.
6. Hybrid IT and Cloud Computing: As staffs have been asked to do more with less, IT departments must play multiple roles in coordinating IT-related activities, and cloud computing is now pushing that change to another level.
7. Strategic Big Data: Big Data is moving from a focus on individual projects to an influence on enterprises’ strategic information architecture.
8. Actionable Analytics: Analytics is increasingly delivered to users at the point of action and in context.
9. In-Memory Computing: In-memory computing (IMC) can also provide transformational opportunities.
10. Integrated Ecosystems: The market is undergoing a shift to more integrated systems and ecosystems and away from loosely coupled heterogeneous approaches.
[Adapted from Terry Lanham Designing Innovative Enterprise Portals and Implementing Them Into Your Content Strategies Lockheed Martin’s Compelling Case Study Web Content II: Leveraging Best-of-Breed Content Strategies - San Francisco, CA 23 January 2001]
• Data extraction and conversion software solutions for transforming complex, unstructured data formats into XML for Enterprise Application Integration
– RTF
– HTML
– HL7
– Positional (Offset-Based) reports
– TAB-delimited and other delimited reports
– EDI
• Binary documents are automatically converted to a suitable text for parsing for:
– Microsoft Word documents
– Microsoft Excel documents
– PDF documents
– COBOL programs
Tamino
BizTalk
http://www.itemfield.com/
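As a rough illustration of the simplest case listed above, a TAB-delimited report can be turned into XML with the Python standard library alone. This is a generic sketch, not the vendor's product or API; the function name, tag names, and sample report are hypothetical, and it assumes the column headers are valid XML element names.

```python
import csv
import io
import xml.etree.ElementTree as ET


def delimited_to_xml(text, delimiter="\t", root_tag="report", row_tag="record"):
    """Convert a delimited report with a header row into an XML string.

    Assumes every column header is a valid XML element name.
    """
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    root = ET.Element(root_tag)
    for row in reader:
        record = ET.SubElement(root, row_tag)
        for name, value in row.items():
            ET.SubElement(record, name).text = value
    return ET.tostring(root, encoding="unicode")


# Hypothetical two-row report, invented for this example.
sample = "id\tname\n1\tAcme\n2\tGlobex\n"
xml_text = delimited_to_xml(sample)
print(xml_text)
```

Formats such as HL7, EDI, or positional reports need format-specific parsers in place of `csv.DictReader`, but the overall shape (parse into named components, then emit XML elements) stays the same.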