Pluggable Model Cubes Analytical Workspace Redesign data brewery February 2014 Stefan Urbanek – @Stiivi
Dec 18, 2014
Original Cubes: Cubes before 1.0
Model
■ single JSON or a model bundle
■ contains all model objects
■ full description required
[Diagram: Cubes components: backends, model, browser, server (http), workspace, formatters, modules]
one file or one directory bundle
one per serving:
[workspace]
backend=sql
url=postgresql://localhost/database
Browser
Aggregation Browser
■ SQL Snowflake Browser
■ SQL Denormalized Browser
■ MongoDB Browser
■ Some HTTP Data Service Browser
■ ?
multiple backends available
Backend
■ implemented as python module with an entry point create_workspace()
■ provides Workspace and Browser: the workspace represents data storage
■ only one Workspace per serving: only one kind of storage per serving
Requirements
Model
■ composed of multiple parts
■ external model definition: provided from an external source, such as an analytical service
■ shared dimension descriptions: only one dimension description is necessary per composed model
Backend
■ heterogeneous storage: multiple data stores, different types of data stores
■ different schemas in same store
■ multiple environments: dev, test, production, ...
Redesign
Backend
■ a “backend” consists of multiple objects: Browser, Store, Provider
■ better plug-in system instead of Python module
■ more flexible composition
Backend Objects
■ Browser – performs aggregated browsing
■ Store – maintains database connection
■ Model Provider – provides model
Note: not every kind has to be implemented
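The split into three kinds of backend objects, where a backend may supply only some of them, can be sketched as a small plug-in registry. This is an illustrative sketch only: `REGISTRY`, `register` and `kinds_provided` are hypothetical names, not the Cubes plug-in API, and the classes are empty placeholders.

```python
# Hypothetical registry: one namespace per kind of backend object.
REGISTRY = {"browser": {}, "store": {}, "provider": {}}


def register(kind, name):
    """Class decorator that files a class under a kind and a backend name."""
    def decorator(cls):
        REGISTRY[kind][name] = cls
        return cls
    return decorator


@register("browser", "sql")
class SnowflakeBrowser:
    pass


@register("store", "sql")
class SQLStore:
    pass


@register("browser", "ga")
class GABrowser:
    pass


@register("store", "ga")
class GAStore:
    pass


@register("provider", "ga")
class GAModelProvider:
    pass


def kinds_provided(name):
    """Return the kinds of objects a backend actually implements."""
    return sorted(kind for kind, plugins in REGISTRY.items() if name in plugins)
```

Here the "sql" backend registers a browser and a store but no model provider, while "ga" registers all three.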
[Diagram: backend objects between the logical and the physical layer. The Browser aggregates (∑) and the Store connects to the physical data store (database or API); the Provider creates the model (cubes, dimensions).]
Browser
■ depends on the logical model
■ implements aggregation aggregate(), values(), …
■ gets data from associated store
[Diagram: the Browser works with the model on the logical side and aggregates (∑) data obtained through the Store from the physical data store (database or API).]
Browser Methods
■ features()
■ aggregate()
■ members()
■ facts()
■ fact()
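To make the five methods concrete, here is a toy browser over an in-memory list of facts. It is a sketch under assumptions: the class name `InMemoryBrowser`, the use of a plain dict as the cell, the `measure` and `key` parameters, and the shape of the return values are all invented for illustration and are not the Cubes signatures.

```python
class InMemoryBrowser:
    """Toy browser over a list of dict facts (not the Cubes implementation)."""

    def __init__(self, facts, key="id"):
        self._facts = list(facts)
        self._key = key

    def features(self):
        # Advertise which browsing actions this browser supports.
        return {"actions": ["aggregate", "members", "facts", "fact"]}

    def _select(self, cell):
        # A cell is assumed to be a dict of dimension -> required value.
        cell = cell or {}
        return [f for f in self._facts
                if all(f.get(dim) == value for dim, value in cell.items())]

    def aggregate(self, cell=None, measure="amount"):
        selected = self._select(cell)
        return {"record_count": len(selected),
                measure + "_sum": sum(f[measure] for f in selected)}

    def members(self, cell, dimension):
        # Distinct values of one dimension within the cell.
        return sorted({f[dimension] for f in self._select(cell)})

    def facts(self, cell=None):
        return self._select(cell)

    def fact(self, key):
        # Single fact by its key, or None when it does not exist.
        for f in self._facts:
            if f[self._key] == key:
                return f
        return None


SAMPLE_FACTS = [
    {"id": 1, "year": 2013, "amount": 10},
    {"id": 2, "year": 2014, "amount": 20},
    {"id": 3, "year": 2014, "amount": 5},
]
browser = InMemoryBrowser(SAMPLE_FACTS)
```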
Store
■ provides database or API connection
■ might provide a model
■ slicer tool actions: physical mapping validation, model from schema generation, schema from model generation, schema conversions and optimization, ...
*former backend’s “Workspace” object
[Diagram: the Store connects the logical side to the physical data store (database or API).]
Store Methods
Store is not required to implement any methods at this time.
Future:
■ validate(cube) – does logical map to physical?
■ create(object) – create physical structure
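One way the two future store methods could behave is sketched below with the standard-library `sqlite3` module. Everything here is an assumption made for illustration: the class name `SQLiteStore`, the cube being a plain dict with `name` and `attributes`, and `validate()` returning a boolean. Identifiers are interpolated into SQL directly, which is acceptable only for trusted input in a toy example.

```python
import sqlite3


class SQLiteStore:
    """Toy store: keeps a connection and checks logical-to-physical mapping."""

    def __init__(self, path=":memory:"):
        self.connection = sqlite3.connect(path)

    def create(self, cube):
        # Create the physical table for a cube described as a dict.
        columns = ", ".join(cube["attributes"])
        self.connection.execute(f"CREATE TABLE {cube['name']} ({columns})")

    def validate(self, cube):
        # PRAGMA table_info yields one row per column; index 1 is its name.
        rows = self.connection.execute(
            f"PRAGMA table_info({cube['name']})").fetchall()
        physical = {row[1] for row in rows}
        return all(attr in physical for attr in cube["attributes"])


store = SQLiteStore()
store.create({"name": "sales", "attributes": ["year", "amount"]})
```

A cube whose attributes all exist as columns validates; a cube that names a missing column or a missing table does not.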
Model Provider
■ creates model from external source
■ might suggest store to be used
[Diagram: the Model Provider creates the model (cubes, dimensions) on the logical side.]
Provider Methods
■ dimension_metadata(name,temps,locale)
■ cube_metadata(name,locale)
or
■ dimension(name,temps,locale)
■ cube(name,locale)
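The metadata variant of the provider methods can be sketched as follows. The external source is simulated by a JSON string, and the class name `JSONModelProvider`, the payload layout and the locale fallback are assumptions made for illustration. The `temps` argument is kept as written on the slide and passed through unused, because the slide does not explain it.

```python
import json

# Simulated response of an external analytical service (hypothetical layout).
EXTERNAL_PAYLOAD = json.dumps({
    "cubes": {
        "sales": {"dimensions": ["date"],
                  "label": {"en": "Sales", "sk": "Predaj"}},
    },
    "dimensions": {
        "date": {"levels": ["year", "month"],
                 "label": {"en": "Date", "sk": "Datum"}},
    },
})


class JSONModelProvider:
    """Toy provider that builds metadata dicts from an external payload."""

    def __init__(self, payload, default_locale="en"):
        self._model = json.loads(payload)
        self._default_locale = default_locale

    def _localized(self, name, raw, locale):
        # Copy the raw entry and pick one label, falling back to the default.
        labels = raw.get("label", {})
        label = labels.get(locale, labels.get(self._default_locale, name))
        metadata = {k: v for k, v in raw.items() if k != "label"}
        metadata.update({"name": name, "label": label})
        return metadata

    def cube_metadata(self, name, locale=None):
        return self._localized(name, self._model["cubes"][name], locale)

    def dimension_metadata(self, name, temps=None, locale=None):
        # `temps` is accepted to mirror the slide; it is not used here.
        return self._localized(name, self._model["dimensions"][name], locale)


provider = JSONModelProvider(EXTERNAL_PAYLOAD)
```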
Example Backends
■ SQL Backend: Snowflake Browser, SQL Store
■ Mongo Backend: Mongo Browser, Mongo Store
■ Google Analytics Backend: GA Browser, GA Store, GA Model Provider
from cubes import AggregationBrowser, Store


class SQLStore(Store):
    default_browser_name = "sql_snowflake"

    def __init__(self, **options):
        # initialize the store here; the options come from slicer.ini
        pass

    def validate_cube(self, cube):
        return True  # if valid


class SQLSnowflakeBrowser(AggregationBrowser):
    def __init__(self, model, locale):
        # initialize the browser
        pass

    def features(self):
        # return list of browser features
        pass

    def aggregate(self, cell, **options):
        # return aggregation of the cell
        # (the remaining arguments are elided on the slide)
        pass
New Workspace
■ global object at library level
■ provides appropriate browser
■ contains run-time configuration
■ might have state persistence
*former backend Workspace is now Store
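How a library-level workspace could hand out the appropriate browser for a cube is sketched below. The class and method names (`Workspace`, `register_store`, `register_browser`, `register_cube`, `browser`) are hypothetical and chosen for illustration; stores are reduced to plain option dicts and the browsers do nothing beyond remembering their arguments.

```python
class SnowflakeBrowser:
    def __init__(self, cube, store):
        self.cube, self.store = cube, store


class MongoBrowser:
    def __init__(self, cube, store):
        self.cube, self.store = cube, store


class Workspace:
    """Toy workspace: run-time configuration plus browser lookup."""

    def __init__(self):
        self.stores = {}          # store name -> store options
        self.browser_types = {}   # store type -> browser class
        self.cube_stores = {}     # cube name -> store name

    def register_browser(self, store_type, browser_class):
        self.browser_types[store_type] = browser_class

    def register_store(self, name, store_type, **options):
        self.stores[name] = {"type": store_type, **options}

    def register_cube(self, cube, store_name):
        self.cube_stores[cube] = store_name

    def browser(self, cube):
        # Pick the browser class that matches the type of the cube's store.
        store = self.stores[self.cube_stores[cube]]
        return self.browser_types[store["type"]](cube, store)


workspace = Workspace()
workspace.register_browser("sql", SnowflakeBrowser)
workspace.register_browser("mongo", MongoBrowser)
workspace.register_store("bidata", "sql", url="postgresql://localhost/crm")
workspace.register_store("bidata2", "mongo", host="localhost")
workspace.register_cube("sales", "bidata")
workspace.register_cube("events", "bidata2")
```

With this setup, cubes backed by different kinds of stores are served from the same workspace, each through its own browser class.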
Future Workspace
■ caching
■ cube composition
■ … ?
Workspace Example
heterogeneous environment
[Diagram 1: Workspace]
Cubes: sales, churn, events, activations
Model Providers: Static Model Provider, API Model Provider
Stores: BI Data (Postgres), BI Data 2 (Mongo), Events (API)
[Diagram 2: Workspace]
Cubes: sales, churn, events, activations
Model Providers: Static Model Provider
Stores: BI Data (Postgres), BI Data 2 (Mongo)
Models: crm, sales, events
[workspace]
models_path: /var/lib/cubes/models

[models]
crm: crm.cubesmodel
sales: sales.cubesmodel
events: events.cubesmodel

[datastore_bidata]
type: sql
url: postgresql://localhost/crm

[datastore_bidata2]
type: mongo
host: localhost
collection: events
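A configuration of this shape can be read with Python's standard `configparser`, which accepts `key: value` pairs. The sketch below shows one possible way to turn it into a mapping of models and stores; the function name `read_workspace_config` is hypothetical, and joining `models_path` with each model file name is an assumption about how the paths are meant to be resolved.

```python
from configparser import ConfigParser

SLICER_INI = """
[workspace]
models_path: /var/lib/cubes/models

[models]
crm: crm.cubesmodel
sales: sales.cubesmodel
events: events.cubesmodel

[datastore_bidata]
type: sql
url: postgresql://localhost/crm

[datastore_bidata2]
type: mongo
host: localhost
collection: events
"""


def read_workspace_config(text):
    """Return (models, stores) parsed from slicer.ini-style text."""
    parser = ConfigParser()
    parser.read_string(text)

    models_path = parser.get("workspace", "models_path")
    # Assumption: model files live directly under models_path.
    models = {name: f"{models_path}/{filename}"
              for name, filename in parser.items("models")}

    # Every [datastore_<name>] section describes one store.
    prefix = "datastore_"
    stores = {section[len(prefix):]: dict(parser.items(section))
              for section in parser.sections()
              if section.startswith(prefix)}
    return models, stores


models, stores = read_workspace_config(SLICER_INI)
```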
Conclusion
■ heterogeneous pluggable environment
■ externally provided models
■ easier backend implementation
Cubes Home: cubes.databrewery.org
github: github.com/Stiivi/cubes
Development Documentation: cubes.databrewery.org/dev/doc/ (for github master HEAD)