1. Evolution of Database Technology
C. Mohan, PhD
IBM Fellow & IBM India Chief Scientist
Member, IBM Software Group Asset Architecture & Information Management Architecture Boards
http://www.almaden.ibm.com/u/mohan/
[email_address]
2. Some of Our Database Research Legacy
- Invention of Relational DBMS & SQL
- Starburst Extensible Object-Relational DBMS
- Garlic Heterogeneous DBMS
- Data sharing on DB2 390 Sysplex
- Discovery Link & DB2 Information Integrator
- 6 IBM Fellows from a team of fewer than 50
3. Why We Have Experience with Customers
- Over 2 decades of partnership with SWG Toronto & SVL
  - Incorporation of the Starburst prototype into DB2
  - Component owners of the DB2 for LUW Query Compiler
  - Dealt with customer APARs, visits, & presentations
- Responsible for many DB2 innovations
  - Query Graph Model (internal query representation, key to extensibility)
  - Query Rewrite and optimizer technology
  - ARIES transaction methods
  - Object-relational features
  - Automatic Summary Tables (materialized views)
- World-class publications in leading database conferences
- Cognizant of industry trends
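An Automatic Summary Table stores a precomputed query result (typically an aggregate) so that matching queries can be answered without rescanning the base data. The Python sketch below only mimics that idea in memory; it is not how DB2 implements it (DB2 defines summary tables in SQL and maintains them itself). The `SummaryTable` class, its methods, and the `sales` rows are hypothetical. It contrasts a full recompute, loosely analogous to a deferred refresh, with folding each new base row into the stored result, loosely analogous to immediate maintenance.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


class SummaryTable:
    """Hypothetical in-memory analogue of a summary table defined as
    SELECT region, SUM(amount) FROM sales GROUP BY region."""

    def __init__(self) -> None:
        self.totals: Dict[str, float] = defaultdict(float)

    def refresh_full(self, sales: List[Tuple[str, float]]) -> None:
        """Recompute the whole aggregate from all base rows."""
        self.totals = defaultdict(float)
        for region, amount in sales:
            self.totals[region] += amount

    def apply_insert(self, region: str, amount: float) -> None:
        """Incrementally fold one newly inserted base row into the stored aggregate."""
        self.totals[region] += amount

    def total_for(self, region: str) -> float:
        """Answer the aggregate query from the precomputed result, not the base rows."""
        return self.totals.get(region, 0.0)


# Hypothetical base table: (region, amount) rows.
sales = [("east", 100.0), ("west", 50.0), ("east", 25.0)]
summary = SummaryTable()
summary.refresh_full(sales)

# A new base row arrives: update the base table and maintain the summary incrementally.
sales.append(("west", 10.0))
summary.apply_insert("west", 10.0)

print(summary.total_for("east"), summary.total_for("west"))  # 125.0 60.0
```

Incremental maintenance keeps the summary current at the cost of extra work on every insert. A full refresh is simpler, but the summary is stale between refreshes.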
4. Leveraging Technology and People
[Diagram: IBM Research, IMS Development, DB2 Development, IDS / U2 Development, Customer Requirements, IBM Products]
5. IBM Information Management Teams
Over 6000 employees worldwide
- SVL: DB2 UDB for z/OS & OS/390, IMS, Business Intelligence, Content Management, DB2 Everyplace, Red Brick, Icing, Traditional AD Languages
- Boeblingen: DB2 Text Extenders, SAP/R3 Enablement, Intelligent Miner for Data, Intelligent Miner for Text
- Somers, Hawthorne: Advanced Technology
- Almaden: Advanced Technology
- Austin: GBIS
- Portland: XPS & DB2
- Lenexa: IDS
- Boulder & Denver: Content Management, U2, Datablades
- Boca Raton & Miami: EMMS
- LA: Informix Support
- Rochester: DB2 UDB for AS/400
- Toronto: DB2 UDB for UNIX, Windows, & OS/2
- Beijing: Information Integration, DB2 for z/OS, Content Management, DB2 and IMS tools
- Las Vegas: Entity Analytics
- Menlo Park & Oakland: IDS, XPS, JDBC, Visionary, Cloudscape, Datablades, Object Connect & Translator, Content Management
- India: DB2 UDB Service, Business Intelligence, IDS
- Yamato: High Speed Inverted Index Search, Business Intelligence, Content Management
- Hursley: Enterprise Master Data Solutions
- Broad range of skills across all SWG brands
- Lab-based services teams: DB2, CM, BI
- http://www.research.ibm.com/irl/projects//
- Education Center for IBM Software
6. A Spectrum of Data Serving Requirements
- Platform: Mobile, Desktop, Small Servers, Large Servers
- Data Size: Micro, Compact, Large, Extremely Large
- Workload: Batch, Online Transactions, Real-time Analysis, Data Mining
- Structure: Hierarchical, Relational, Multi-Value, XML
- OS: Symbian, PalmOS, Windows, Linux, Unix(s), i5/OS, z/OS
- Scope: Embedded, Intra-application, Single application, Multi-application
- Support: None, Web/E-mail, Business hours, 24x7
7. Products to Match the Spectrum of Data Serving Needs
- DB2 Everyplace: OLTP; Relational; Mobile / Embedded; Linux, PalmOS, Symbian
- Cloudscape: OLTP; Relational; Intra-App / Single-App; Java
- IDS: OLTP; Relational; Intra-App / Single-App; AIX, etc., Linux, Windows
- DB2: OLTP & Analysis; Relational & XML; Single / Multi-App; z/OS, i5/OS, AIX, etc., Linux, Windows
- IMS: OLTP; Hierarchical; Single / Multi-App; z/OS
- U2: OLTP; Multi-Value; Intra-App / Single-App; AIX, etc., Linux, Windows
Superior capabilities across the spectrum of requirements
8. DB2 for z/OS
- The power and function of an open, industry-standard data server with zSeries industry-leading availability, performance, and security
- What it takes to be the industry's most extreme data server
  - Continuous application availability measured in years
  - Ability to process over 1B SQL transactions per hour
  - Uninterrupted growth from 1 byte to over a petabyte
  - Serving 100s of applications for 100,000s of users
  - U.S. government's highest security classification (zSeries)
- Support for industry standards: XML, Web services, Java, C, COBOL
- Support for complex business applications: SAP, PeopleSoft, Siebel
Extreme qualities of service: XML and relational data server
9. Technology Evolution with Mainframe Specialty Engines
- Building on a strong track record of technology innovation with specialty engines, IBM intends to introduce the System z9 Integrated Information Processor
- Internal Coupling Facility (ICF), 1997: centralized data sharing across mainframes
- Integrated Facility for Linux (IFL), 2001: support for new workloads and open standards
- System z9 Application Assist Processor (zAAP), 2004: incorporation of Java into existing mainframe solutions
- IBM System z9 Integrated Information Processor (IBM zIIP), planned for 2006: designed to help improve resource optimization for eligible data workloads within the enterprise
10. Data Challenges
- Variety, Velocity, and Volume
- New composite applications need data from multiple sources
  - Consumers expect holistic, personalized, and value-added content
  - Relational, XML, packaged applications, content repositories, and file systems all contain critical business information
- Increasing emphasis on current data
  - Business activity monitoring
- Petabytes will be the measure of available online data
  - All client interactions are important (e.g., instant messages, audio recordings, web traffic)
  - Internet and intranet content
The world produces 250MB of information every year for every man, woman, and child on earth.
[Chart: Common database sizes in 1999 vs. 2004 for transactions, warehouses, marts, mobile, and pervasive databases, with growth factors ranging from 10X to 10,000X]
- Disk growth, 1996-2007: 37% CGR
- 70,000 TB of TV and radio content in 2002 alone; 30% growth per year
11. Addressing the Changing Characteristics of Data
[Diagram: transactions, text and web, satellite & surveillance images and video, gene sequences, and protein folding, arranged along the dimensions of actionability, heterogeneity, and scale]
Increasing need to manage and analyze new data types
12. Key Customer Pain Points
- Can't find information: Discovery
- Can't combine information: Integration
- Can't extract value from information: Insight
- Can't consume information: Dissemination
13. Research in Information and Interaction
Drive our leadership technologies for search, structured and unstructured information processing and analytics, natural language processing, and conversational and multimodal interaction, across multiple tiers of business activities in SWG products and solutions. Foster the exploitation of components with these leading research technologies in IGS services offerings.
[Diagram: research areas: CM, Information Retrieval, NLP, Analytics, Video Analysis, Conversational and Multimodal Interactions, Unstructured Information Management, Information Management, Database, Synthesis, Information Integration, Metadata, Speech Recognition]
14. Worlds of Structured & Unstructured Data Come Together
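[Diagram: stages of increasing analytical complexity for structured and unstructured data, brought together by Information Integration (II) and solutions]
- Collect: ETL (structured); Crawl (unstructured)
- Store: Warehouse (structured); ECM (unstructured)
- Retrieve: SQL (structured); Search (unstructured)
- Drill: OLAP (structured); Navigate (unstructured)
- Mine: Cluster, Classify, ... (both)
15. Need for Business Intelligence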
- Homeland Security
- Supply Chain Efficiencies
- Accountability and Compliance
- Customer Knowledge
- Business Performance
HIPAA, Basel II, Patriot Act, Sarbanes-Oxley
[Images from The Economist print edition: "Capitalism and Its Troubles: A Survey of International Finance" (May 24, 2002); "Preparing for terror: How scared should you be?" (Nov 28, 2002)]
16. Industry Solutions Deliver Insight On Demand
- Crime Information Warehouse
- Basel II and Banking Data Warehouse
- Aligned Clinical Environment
- Quality Insight Early Warning
17. OmniFind Key Technologies
- Content Crawling
- Parsing/Tokenizing
- Search Collections
- Categorization
- Annotation
- Indexing
- Searching
- Security
18. Content Management Portfolio Strategy
- Capture, store, and manage all forms of content
- Complete and scalable content management functionality
  - Digital rights management
  - Email/messaging archiving and management
- Enterprise-scale business process management
- Cross-portfolio, out-of-the-box integration
- Rich, common client platform
19. IBM Content Management Platform Roadmap
[Roadmap diagram spanning 4Q2004, 1Q2005, 2005, and 2006 and beyond]
- WebSphere Portal V5.1 embeds DB2 Content Manager Runtime Edition (JCR)
- Records Manager V4.1.1: a dynamic RM infrastructure
- Workplace Web Content Management V2.0: leveraging DB2 Content Manager and WebSphere Portal framework
- DB2 Content Manager V8.3: enhance doc routing, enable BPM, extend integration capabilities, seamless RM
- DB2 Document Manager V8.3: compliance/RM, extending native language support
- DB2 CommonStore V8.3: full-text search, seamless RM
- First-step ECM unified client; new portlets; J2EE web components
- Extend to DPM; extend document management
- Email/messaging archiving and management enhancements
- Physical records management; virtual records management
- WCM leveraging Workplace and DB2 Content Manager Runtime (JCR)
- Common content repository; Workplace unified end-user experience (client); event framework
- Integrated / interoperable DPM/BPM; extended ECM capabilities as add-on features
- Enterprise JCR; IBM CM SDK; enterprise content integration; JSR170
- DB2 Content Manager Runtime in ISV applications; LDDM* fully supports JSR170
- Autonomic capabilities, content preservation, content intelligence, pervasive enablement, and more
* LDDM: Lotus Domino Document Manager
20. Query Optimization
- Industry-Leading Optimization
  - Extensible: from SQL to XQuery
  - Powerful for complex OLAP & BI queries
- Industry-Strength Engineering
  - Databases of 1 GB to > 300 TB
  - Continuing "technology pump" of improvements from Research
21. Unstructured Information Management Architecture
- Common Research infrastructure for advancing text analysis and NLP capability
  - Promotes re-use of best-of-breed components
  - Promotes the combination hypothesis through ease of integration
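The component model above treats analysis engines as independent, re-usable parts that are easy to combine. The Python sketch below illustrates that idea under stated assumptions. It is not the UIMA API: the real framework passes a shared Common Analysis Structure (CAS) between annotators. The `Annotation` and `Document` classes, the `Annotator` type, and the two toy annotators are hypothetical names chosen for this example. Each annotator reads the shared document, adds typed spans, and can build on spans produced by earlier annotators.

```python
import re
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Annotation:
    kind: str   # hypothetical annotation type name, e.g. "Token"
    begin: int  # start offset in the document text
    end: int    # end offset (exclusive)


@dataclass
class Document:
    text: str
    annotations: List[Annotation] = field(default_factory=list)

    def covered_text(self, a: Annotation) -> str:
        return self.text[a.begin:a.end]


# In this sketch, an "analysis engine" is any function that adds annotations to a document.
Annotator = Callable[[Document], None]


def tokenizer(doc: Document) -> None:
    """Toy annotator: marks each run of word characters as a Token."""
    for m in re.finditer(r"\w+", doc.text):
        doc.annotations.append(Annotation("Token", m.start(), m.end()))


def capitalized_words(doc: Document) -> None:
    """Toy annotator that reuses earlier results: marks Tokens starting with an uppercase letter."""
    tokens = [a for a in doc.annotations if a.kind == "Token"]
    for t in tokens:
        if doc.covered_text(t)[0].isupper():
            doc.annotations.append(Annotation("Capitalized", t.begin, t.end))


def run_pipeline(doc: Document, engines: List[Annotator]) -> Document:
    """Aggregate engine: runs the component annotators in order over one shared document."""
    for engine in engines:
        engine(doc)
    return doc


doc = run_pipeline(
    Document("the Starburst prototype was incorporated into DB2"),
    [tokenizer, capitalized_words],
)
caps = [doc.covered_text(a) for a in doc.annotations if a.kind == "Capitalized"]
print(caps)  # ['Starburst', 'DB2']
```

Because the annotators communicate only through the shared annotation list, one can be replaced by a better implementation without changing the others, as long as the annotation types they exchange stay the same.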
[Architecture diagram: UIMA components]
- Unstructured information
- Collection Processing Manager
- (Text) Analysis Engines (TAEs): combination of analysis engines employing a variety of analytical techniques and strategies
- TAE Directory Service: dynamic query & delivery of TAEs
- Structured Knowledge Access: Knowledge Source Adapters (KSAs) deliver content from many structured knowledge sources according to central ontologies
- KSA Directory Service: dynamic query & delivery of KSAs
- Document & Metadata Store: stores documents with metadata based on key-value pairs; enables view & collection management
- Semantic Search Engine: token and concept indexing; a query of keywords, concepts, spans, and ranges returns a ranked hit list
- UIMA Standard Application Libraries and Specialized Application Libraries: provide basic functions common to a broad class of application libraries & applications (e.g., glossary extraction, taxonomy generation, classification, translation)
- Applications: question answering, e-commerce
- UIM Solutions: National & Intelligence, Business, Bioinformatics, Technical Support
- Relevant application knowledge; structured data
22. Analytics Bridge the Unstructured & Structured Worlds
[Diagram: UIMA bridges unstructured and structured information]
- Unstructured information (text, chat, email, audio, video): high-value, most current, fastest growing, BUT buried in huge volumes, lots of noise, implicit semantics, inefficient search
- Structured information (indices, DBs, KBs): explicit structure, explicit semantics, efficient search, focused content
- Identify semantic entities, induce structure
  - Chats, phone calls, transfers
  - People, places, organizations, events
  - Times, topics, opinions, relationships
UIMA: The Big Picture
23. Evolution of Metadata
[Timeline diagram, 1970 to 2010; vertical axis: increased business value of metadata]
- Hierarchical data model: rigid metadata, single application
- Relational data model: rigid metadata, integration within the enterprise
- Extensible data model (XML): flexible metadata, integration within an industry
- Domain-specific ontologies: flexible metadata, cross-industry integration
- Syntactic annotation of data: what this data represents
- Semantic annotation of data: what this data means
24. Information Management Trends
- Information-Intensive Applications
  - Shift from transaction-centric to information-intensive applications
  - Delivering insight over increasingly diverse sources of information
- New Business & Delivery Models
  - Information as a Service, outsourcing, new licensing models
- Democratization of Information
  - Changing user expectations & the Parent Test
- Massive Collaboration & Societal Intelligence
  - Collaboration over shared information to create business insight
25. STEM
- STEM is a tool to help scientists and public health officials create and test models for emerging infectious diseases.
  - Understand disease dynamics
  - Test outcomes of preventative actions
  - GIS data for every county: borders, populations, shared borders, highways, airports
  - Susceptible/Infectious/Recovered (SIR) models
  - Susceptible/Exposed/Infectious/Recovered (SEIR) models
  - Multi-serotype disease models
  - Public health policy events
  - User-specified disease vectors
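The SIR model listed above divides a population of size N into susceptible (S), infectious (I), and recovered (R) compartments, governed by dS/dt = -beta*S*I/N, dI/dt = beta*S*I/N - gamma*I, and dR/dt = gamma*I. The Python sketch below integrates these equations with simple forward-Euler steps for a single well-mixed population. It is not STEM's implementation, which adds geography, travel between regions, and other model variants. The function name and all parameter values (population, beta, gamma, dt) are arbitrary assumptions for illustration.

```python
from typing import List, Tuple


def simulate_sir(population: float, initial_infected: float,
                 beta: float, gamma: float,
                 days: int, dt: float = 0.1) -> List[Tuple[float, float, float]]:
    """Forward-Euler integration of the classic SIR model.

    beta: transmission rate per day; gamma: recovery rate per day
    (mean infectious period is 1/gamma days). dt is assumed to divide
    one day evenly. Returns one (S, I, R) tuple per day, starting at day 0.
    """
    s = population - initial_infected
    i = initial_infected
    r = 0.0
    steps_per_day = round(1 / dt)
    history = [(s, i, r)]
    for _ in range(days):
        for _ in range(steps_per_day):
            new_infections = beta * s * i / population * dt
            new_recoveries = gamma * i * dt
            s -= new_infections
            i += new_infections - new_recoveries
            r += new_recoveries
        history.append((s, i, r))
    return history


# Hypothetical parameters: R0 = beta / gamma = 3, mean infectious period of 5 days.
run = simulate_sir(population=1_000_000, initial_infected=10,
                   beta=0.6, gamma=0.2, days=120)
peak_day, (_, peak_infected, _) = max(enumerate(run), key=lambda x: x[1][1])
print(f"peak on day {peak_day}: about {peak_infected:,.0f} infectious")
print(f"recovered after 120 days: {run[-1][2]:,.0f}")
```

An SEIR variant adds an exposed compartment between S and I with its own rate. A preventive action can be approximated by lowering beta from a chosen day onward.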
Spatiotemporal Epidemiological Modeler
http://www.alphaworks.ibm.com/tech/stem
26. Metadata-driven Design for Integration
[Diagram: build these (new web services, new business processes, new integrated views, new data flows) using these (legacy and packaged apps, relational databases, XML documents), via WBI, II, and ETL]
40% of IT budgets may be spent on integration; 30% of people's time is spent searching for relevant information; 30% of development time is spent on copy management.
- Remember relationships and dependencies
- Find and visualize related information
- Generate the integration glue
27. Metadata Will Be Used to Facilitate Information and
Application Integration
- Today: manual, custom hard-wired integration
- Tomorrow: semi-automated integration using tools and connectors
- Future: automated integration through metadata standards and tools