STI Summit 2011 - Digital Worlds

Nov 10, 2014

Transcript
  • 1. Digital Worlds (applications)
      - VEC (Enterprise Scale): 1,300 source databases; 10+ million views (via data integration)
      - US Healthcare (National Scale)
          Scale
          o Health care and social assistance offices: 784,626, including doctors' offices (220,131), dentists (127,057), hospitals (6,505), and clinics (~5,000)
          o Each ≈ an SME with, say, 100 databases
          o Patients: 100-300+ million
          o Databases: ~32 million
          Scope
          o Comprehensive medical events, methods, analysis, …; e.g., Alice (62) in the Emergency Room with liver failure
          o Insurance, payments, …
          o New metric: healthcare quality
          Examples
          o SHRINE (2009): 3 hospitals; uses 2,381,883 distinct concepts (ontologies)
          o HHS CIO (Todd Park): Open Health Data Initiative
          o US (PCAST, White House) vision
  • 2. Observations
      - Data sources
          Massive
          o Number
          o Heterogeneity
          o Distribution (data at source)
          o Constant change: data, model, ontology, business rules, …
          Constrained
          o Governance: privacy, confidentiality, legal, …
          o Quality, correctness, precision, …
          o Competition
      - Critical requirement: meaningful
          o Human lives
          o Health of individuals, communities, nation
          o Economic impact: $ trillions / year
          o Political: meaningless debates
  • 3. Trends
      - Digital Universe
      - Holistic views
          o Information ecosystems: data
          o Ecosystems: processes over services
      - Big Data: massive
          o Number
          o Distribution
          o Heterogeneity: semantics; structure (relational databases, X databases, web, deep web); technology (databases, data warehouses, files, …)
      - New models: problem solving, data, …
          o Data-driven
          o Social computing: data as social artifacts
          o Science: Wolfram Alpha
      - Pragmatics: driven by healthcare quality improvement
  • 4. Databases and AI: The Twain Just Met
      - Database world
          o Engineering (RDBMSs) @ scale
          o Reasoning: relational model (first-order logic, FoL)
      - AI world
          o Reasoning: more powerful & expressive
          o Engineering: in the small
      - Digital Universe, e.g., Web
          o Reasoning: beyond the RDM & AI?
          o Engineering: way beyond RDBMS
      - Information ecosystems
          o Databases: join
          o Web: link
      - Power Law of Data: the value of a data element is proportional to the number of its meaningful uses.
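The contrast on slide 4 between integrating by join (databases) and by link (Web) can be illustrated with a small sketch. It is not from the talk: the records (borrowing Alice from slide 1), the `example.org` URIs, and the helper names `equi_join` and `follow` are all hypothetical. The join matches rows on a shared key column; the link lets one resource name another by identifier, so no shared key column is needed.

```python
"""Join vs. link: a minimal sketch with hypothetical records."""

# Database style: two relations share a key column (patient_id).
patients = [
    {"patient_id": 1, "name": "Alice", "age": 62},
    {"patient_id": 2, "name": "Bob", "age": 45},
]
admissions = [
    {"patient_id": 1, "unit": "Emergency Room", "diagnosis": "liver failure"},
]


def equi_join(left, right, key):
    """Nested-loop equi-join: merge each pair of rows whose `key` values are equal."""
    return [{**lrow, **rrow} for lrow in left for rrow in right if lrow[key] == rrow[key]]


# Web style: a resource refers to another resource by identifier (URI);
# a client integrates by following the reference, with no shared key column.
resources = {
    "http://example.org/patient/alice": {
        "name": "Alice",
        "admittedTo": "http://example.org/unit/er",
    },
    "http://example.org/unit/er": {"label": "Emergency Room"},
}


def follow(uri, prop):
    """Dereference the URI stored under `prop` in the resource identified by `uri`."""
    return resources[resources[uri][prop]]


if __name__ == "__main__":
    print(equi_join(patients, admissions, "patient_id"))
    print(follow("http://example.org/patient/alice", "admittedTo"))
```

Here the join returns one merged row for Alice, while `follow` reaches the same Emergency Room fact by dereferencing an identifier instead of comparing key values.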
  • 5. What Underlies the Digital Universe
        Modelling           Execution
        Data Models         DBMS Engines
        Languages           Algorithms
        Semantics           Semantics
        Problem Solving     Computation
  • 6. What Underlies the Data Universe
        Relational Data Model  <- Data Independence ->  RDBMS
        Semantics                                       Semantics
        Problem Solving                                 Computation
  • 7. Relational Database Improvements
      - Pre-relational
          o Hierarchical
          o Network
      - Relational
          o Row store
          o OLAP / data warehouse
      - Post-relational
          o RDF store
          o Column store
          o Bare-bones relational
          o Stream / complex event processing
      - Push down
          o Database / data warehouse appliances (20+ on the market)
          o In-database analytics (10+ on the market)
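The row-store and column-store entries on slide 7, read together with the data-independence theme of slides 6 and 8, can be sketched as two physical layouts behind one logical interface. The classes, the `scan` method, and the hospital relation below are hypothetical illustrations, not any particular product's design: the logical query `total_beds` is written only against `scan()`, so it runs unchanged on either layout, while the column store additionally offers a layout-specific single-column aggregate.

```python
"""Row store vs. column store behind one logical interface (hypothetical sketch)."""
from typing import Dict, Iterator, List, Tuple

# Hypothetical relation: (facility, state, beds).
SCHEMA = ("facility", "state", "beds")
Row = Tuple[str, str, int]


class RowStore:
    """Keeps each tuple together, as a row-oriented RDBMS lays out records."""

    def __init__(self, rows: List[Row]) -> None:
        self.rows = list(rows)

    def scan(self) -> Iterator[Row]:
        yield from self.rows


class ColumnStore:
    """Keeps each attribute in its own array, as a column store does."""

    def __init__(self, rows: List[Row]) -> None:
        self.columns: Dict[str, list] = {
            name: [row[i] for row in rows] for i, name in enumerate(SCHEMA)
        }

    def scan(self) -> Iterator[Row]:
        # Reassemble tuples on demand so callers see the same logical relation.
        yield from zip(*(self.columns[name] for name in SCHEMA))

    def sum_column(self, name: str) -> int:
        # Layout-specific fast path: aggregate one column without touching the others.
        return sum(self.columns[name])


def total_beds(store) -> int:
    """Logical query written only against scan(); it runs unchanged on either layout."""
    return sum(beds for _facility, _state, beds in store.scan())


rows = [("H1", "MA", 120), ("H2", "NY", 300), ("H3", "MA", 80)]

if __name__ == "__main__":
    print(total_beds(RowStore(rows)), total_beds(ColumnStore(rows)))
```

The design point is that callers depend on the logical relation, not on how tuples are physically stored; swapping `RowStore` for `ColumnStore` changes performance characteristics but not query answers.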
  • 8. Data Models for New Domains Must Honor Data Independence
      - Array (matrix) store (SciDB): linear algebra
      - XML databases: structured content, information exchange
      - Content management: e.g., SharePoint
      - Graph/network store: social networking (Facebook), link analysis
      - Protein store: protein folding, drug discovery, …
      - Geospatial / map store: location-based applications
      - Time series: signal processing, statistical and financial analysis
      - Cloud / mesh data (NoSQL) stores: web-scale applications
      - … and they just keep coming
  • 9. Data Universe (diagram: nested regions labelled Data Universe, Database Universe, and Relational Data Universe)
  • 10. Data Universe (diagram: the relational data model (RDM) and the database universe (DBU) among other data models: graph/network, time series, scientific, geospatial, document, and digital media data models, etc.)
  • 11. Data Universe (same diagram as slide 10)
  • 12. Data Integration Solution Space: Data Independence Required
                                      Computation                              Problem Solving
      Databases
        Relational                    Optimal for homogeneous relational data  Optimal for pure relational data
        Domain-specific               Emerging                                 Emerging
      Semantic Technologies (AI)
        Knowledge Representation      Minimal                                  Powerful
        Ontologies                    Minimal                                  Powerful
        Semantic Web                  Modest / emerging                        Modest / emerging
        Semantic Data Management      Emerging                                 Emerging
      Architectural
        Information-As-A-Service      Emerging                                 Emerging
        Cloud                         Emerging                                 N/A
  • 13. Databases vs. Semantic Web
        Databases                     Semantic Web
        Discrete Worlds               Heterogeneous Worlds
        Single Versions of Truth      Multiple Truths
        Data Models                   LOD Models?
        Mathematical Logic            What Logic?
        1,000s of databases           Probabilistic / Eventual Reasoning? Common Sense Reasoning?
        DI: Relational Join           DI: Evidence Gathering
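Slide 13 contrasts a single version of truth (data integration by relational join) with multiple truths (data integration by evidence gathering). The sketch below is one hypothetical way to make that concrete; the claims, source names, and the support-fraction ranking are assumptions chosen for illustration, not a method proposed in the talk. A keyed table keeps exactly one value per key, so conflicting claims are silently lost; evidence gathering keeps every candidate value together with the sources that support it.

```python
"""Single version of truth vs. evidence gathering (hypothetical sketch)."""
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Claim = Tuple[str, str, str, str]  # (source, entity, attribute, value)

# Hypothetical claims about one attribute from three independent sources.
claims: List[Claim] = [
    ("source_a", "alice", "blood_type", "A+"),
    ("source_b", "alice", "blood_type", "A+"),
    ("source_c", "alice", "blood_type", "O+"),
]


def keyed_table(claims: List[Claim]) -> Dict[Tuple[str, str], str]:
    """Database style: one value per (entity, attribute) key; later claims overwrite earlier ones."""
    table: Dict[Tuple[str, str], str] = {}
    for _source, entity, attribute, value in claims:
        table[(entity, attribute)] = value
    return table


def gather_evidence(claims: List[Claim], entity: str, attribute: str) -> Dict[str, Set[str]]:
    """Web style: keep every candidate value with the set of sources supporting it."""
    evidence: Dict[str, Set[str]] = defaultdict(set)
    for source, ent, attr, value in claims:
        if (ent, attr) == (entity, attribute):
            evidence[value].add(source)
    return dict(evidence)


def rank_by_support(evidence: Dict[str, Set[str]]) -> List[Tuple[str, float]]:
    """One simple, assumed policy: score each value by its share of supporting sources."""
    total = sum(len(sources) for sources in evidence.values())
    scored = [(value, len(sources) / total) for value, sources in evidence.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    print(keyed_table(claims))
    print(rank_by_support(gather_evidence(claims, "alice", "blood_type")))
```

The keyed table ends up holding only the last value written, whereas the evidence structure preserves both candidates and lets a downstream policy (here, a plain support fraction) decide how much to trust each.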
  • 14. Databases vs. Web (diagram along a Scale axis)
        Web: exploration; multiple versions of truth; semantically heterogeneous views
        Data Warehouses: analysis / BI; evidence gathering
        Databases: data management; single versions of truth; semantically homogeneous
  • 15. Data Integration
      - Query: define the result [entity, computation]
      - Find candidate data sets: search [hard]
      - Extract, Transform, and Load (ETL): engineering
      - Data integration
          o Entity resolution [harder]
          o Integration computation
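The ETL and entity-resolution steps on slide 15 can be sketched end to end on a toy scale. Everything here is hypothetical: the two source schemas, the records, and the helpers `transform_clinic`, `transform_hospital`, `resolve`, and `integrate`. Entity resolution is done by exact match on a normalized (first name, last name, date of birth) key, which is deliberately naive; practical systems typically use fuzzy similarity and blocking, which is part of why the slide marks this step "harder".

```python
"""ETL followed by naive entity resolution (hypothetical sketch)."""
from typing import Dict, List, Tuple

Record = Dict[str, str]
Key = Tuple[str, str, str]

# Extract: two hypothetical sources describing patients with different schemas.
clinic_rows = [
    {"patient": "Alice  Smith", "dob": "1950-03-02"},
    {"patient": "Bob Jones", "dob": "1979-11-20"},
]
hospital_rows = [
    {"last": "SMITH", "first": "alice", "birth_date": "02/03/1950"},  # DD/MM/YYYY
]


def transform_clinic(row: Dict[str, str]) -> Record:
    """Transform: split the full name and lower-case it; dates are already YYYY-MM-DD."""
    first, last = row["patient"].split()  # split() with no argument collapses repeated spaces
    return {"first": first.lower(), "last": last.lower(), "dob": row["dob"], "source": "clinic"}


def transform_hospital(row: Dict[str, str]) -> Record:
    """Transform: lower-case names and convert DD/MM/YYYY to YYYY-MM-DD."""
    day, month, year = row["birth_date"].split("/")
    return {
        "first": row["first"].lower(),
        "last": row["last"].lower(),
        "dob": f"{year}-{month}-{day}",
        "source": "hospital",
    }


def resolve(records: List[Record]) -> Dict[Key, List[Record]]:
    """Entity resolution by exact match on (first, last, dob) after normalization."""
    entities: Dict[Key, List[Record]] = {}
    for rec in records:
        entities.setdefault((rec["first"], rec["last"], rec["dob"]), []).append(rec)
    return entities


def integrate(entities: Dict[Key, List[Record]]) -> List[Dict[str, object]]:
    """Integration computation: one output row per resolved entity, listing its sources."""
    return [
        {"first": first, "last": last, "dob": dob, "sources": sorted(r["source"] for r in recs)}
        for (first, last, dob), recs in entities.items()
    ]


# Load: a single staging list in the common schema, then resolve and integrate.
staged = [transform_clinic(r) for r in clinic_rows] + [transform_hospital(r) for r in hospital_rows]
entities = resolve(staged)

if __name__ == "__main__":
    for row in integrate(entities):
        print(row)
```

After normalization, the clinic's "Alice  Smith" and the hospital's "SMITH, alice" share a key, so they resolve to one entity with two sources, while Bob remains a separate entity with a single source.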
  • 16. Managing Data @ Scale I
      - Introduction (Michael L. Brodie)
      - Global Data Integration and Global Data Mining (Chris Bizer)
      - DB vs RDF: structure vs correlation (Peter Boncz)