IT Summit Data Mgmt, 2016.06.02 (final)


2 JUNE 2016

BUILDING A CLOUD BASED DATA WAREHOUSE

GILDAS BAH, BRENT BENSON, & RYAN FRAZIER

2

Presenters

Ryan Frazier – Director, Systems Engineering and Operations

Brent Benson – Enterprise Architect

Gildas Bah – Data Analyst Engineer

Sections: About HBX | HBX Data Management Initiative | Architecture & Implementation | Challenges | Data Driven Culture | Impacts | What’s Next?

3

4

What is HBX?

• Harvard Business School’s newest division, tasked with reimagining business education for the digital age

• Launched in June 2014

• Located in Allston, five minutes from the HBS campus

• Moving from start-up to enterprise mode

• The teaching model sets HBX apart from many online learning options and reflects the HBS in-person classroom approach

5

HBX Platforms

HBX Online Platform

• Mainly asynchronous online business education

• Engagement through student interaction in cohorts of ~400

• Case-based learning with highly interactive teaching elements and peer help

HBX Live

• WGBH studio-based virtual classroom

• Synchronous audio/video with chat, polls, and boards

• Up to 60 global students on the studio wall, with hundreds or more observers

6

Building a Data Management Practice

7

Why Build a Data-Driven Culture?

Enhance Outcomes

• Proactively support struggling students

• Identify challenging content

• Evaluate and improve interactive content, social engagement, and retention

Improve Effectiveness

• Scale data-intensive activities like marketing, admissions, and grading

• Use data to test ideas and improve the quality of decisions

Refine Pedagogy

• Evaluate new pedagogical approaches

• Optimize evaluation approaches

• Support pedagogical research activities and innovation

Students | Staff | Faculty

Foster Innovation & Continuous Improvement

• Identify and evaluate innovation opportunities

• Drive continuous improvement

8

Data Management Program Objectives

• Integrate Data Sources into a Comprehensive Data Warehouse

• Build Reports and Dashboards

• Enable Self-Service

• Ensure Data Quality and Integrity

9

Tool and Vendor Selection

Data Warehouse

• Considered: standard relational DB, Redshift

• Chose Redshift because of scalability and performance

• Aligns with AWS platform focus

ETL

• Considered: Informatica, Talend

• Chose Informatica because of the university relationship and its myriad pluggable connectors

Reporting/Analytics

• Considered: MicroStrategy, Qlik, Tableau

• Chose Tableau because of its feature set and industry adoption

10

HBX Data Ecosystem

[Diagram] Source systems: Course Platform Ver. A (MongoDB and MySQL, with reporting copies), Course Platform Ver. B (MongoDB, MySQL), Historical Data (MongoDB, MySQL), Admin System (MySQL), and Salesforce. Integration and analytics components: Informatica Secure Agent, Progress ODBC for MongoDB (marked "NEW!"), Redshift, and Tableau Server. One link is labeled "sync".

11

HBX Data Management by the Numbers

Source Systems

• 35 databases

• 887 tables

• 5,844 fields

• 109,751,902 rows

Data Warehouse

• 4 Redshift clusters

• 8 databases

• 404 tables

• 5,674 fields

• 400,794,679 rows

Daily ETL Process

• 300 jobs

• 6,515,599 rows

* Updated 6/1/2016

12

HBX Data Models

{ "_id": ObjectId("556f25ab662a9b059ea8df8b"), "tei_id": "554a607b241b5a3f0e09eefe", "course_instance_id": "556dcf55b7431f414d87f06f", "user_id": "8701", "comments": [ { "id": "ce59a25a-ce69-47-c534611f7ebf", "text": "This is a great response…", "author_id": "6411", "date_created": "2015-09-10", …

MySQL (Relational)

• Course offerings

• Student demographics

• Applications & registration

• Limited course content

MongoDB (Semi-Structured)

• Course structure

• Course content

• Student course state

• Metric (timing) data
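The comment document above nests an array of comments inside each per-user record. A single relational table has no direct counterpart for that. The usual translation is a one-to-many split: one parent row per document, plus one child row per embedded comment, linked by the document id.

Below is a minimal Python sketch of that split. It assumes the field names shown on the slide. The two-table layout and the function name `flatten_response` are hypothetical. The `_id` is a plain string so no MongoDB driver is needed.

```python
from typing import Any


def flatten_response(doc: dict[str, Any]) -> tuple[dict[str, Any], list[dict[str, Any]]]:
    """Split one response document into a parent row and one child row per comment.

    Hypothetical target layout: a parent table keyed by response_id and a
    child comment table that references it.
    """
    parent = {
        "response_id": str(doc["_id"]),
        "tei_id": doc["tei_id"],
        "course_instance_id": doc["course_instance_id"],
        "user_id": doc["user_id"],
    }
    children = [
        {
            "response_id": parent["response_id"],  # foreign key back to the parent row
            "comment_id": comment["id"],
            "text": comment.get("text"),
            "author_id": comment.get("author_id"),
            "date_created": comment.get("date_created"),
        }
        # A document without a "comments" array simply produces no child rows.
        for comment in doc.get("comments", [])
    ]
    return parent, children


# Values copied from the slide; _id is a plain string here instead of an ObjectId.
sample = {
    "_id": "556f25ab662a9b059ea8df8b",
    "tei_id": "554a607b241b5a3f0e09eefe",
    "course_instance_id": "556dcf55b7431f414d87f06f",
    "user_id": "8701",
    "comments": [
        {"id": "ce59a25a-ce69-47-c534611f7ebf", "text": "This is a great response…",
         "author_id": "6411", "date_created": "2015-09-10"},
    ],
}
parent_row, comment_rows = flatten_response(sample)
print(parent_row["user_id"], len(comment_rows))  # 8701 1
```

This split works well when every document has the same shape. The following slides are about the cases where it does not.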

13

Challenges

• Immature data connection support

• Large object storage limitations in Redshift

• Difficulty flattening complex/polymorphic data structures
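Redshift caps a VARCHAR column at 65,535 bytes, so long free-text fields such as student reflections can exceed what a single column holds. The slides do not say how HBX handled this limitation. One simple but lossy workaround is to truncate on a UTF-8 character boundary before loading, keeping the full text outside the warehouse if it is needed.

Below is a minimal Python sketch of that truncation. The function and constant names are hypothetical.

```python
# Redshift measures VARCHAR length in bytes; 65535 is the largest VARCHAR it allows.
REDSHIFT_MAX_VARCHAR_BYTES = 65535


def truncate_utf8(text: str, max_bytes: int = REDSHIFT_MAX_VARCHAR_BYTES) -> str:
    """Trim text so its UTF-8 encoding fits in max_bytes without splitting a character."""
    encoded = text.encode("utf-8")
    if len(encoded) <= max_bytes:
        return text
    # Cut at the byte limit; errors="ignore" drops a multi-byte character
    # whose bytes were split by the cut, so the result is still valid UTF-8.
    return encoded[:max_bytes].decode("utf-8", errors="ignore")


long_answer = "é" * 40000  # 80,000 bytes in UTF-8, too large for one VARCHAR column
stored = truncate_utf8(long_answer)
print(len(stored), len(stored.encode("utf-8")))  # 32767 65534
```

The limit is counted in bytes, not characters. That is why the sketch encodes first: a string of 40,000 two-byte characters overflows the column even though its character count is well under 65,535.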

14

Document-Structured Data Challenges

{"_id": ObjectId("2804c514e4c20e6d"), "course_instance_id": "2804c51563c9c772", "tei_id": "241b5a14b75fac83", "user_id": "3312", "date_created": datetime.datetime(2015, 10, 14, 10, 56, 59, 137000), "category": "timespan", "metric": {"interaction_time": 180, "is_interaction_time": True}}

{"_id": ObjectId("2804c514f92f1f64"), "course_instance_id": "2804c51563c9c772", "tei_id": "241b5a14b75fab70", "user_id": "3312", "date_created": datetime.datetime(2015, 10, 14, 17, 11, 56, 967000), "category": "view_user_response", "metric": {"viewed_user_id": "9212"}}
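In the metric documents above, every record shares the same top-level fields; only the small `metric` sub-document changes with `category`. One common way to expose such documents relationally is to flatten nested keys into dotted column names. The column set is then the union across all documents, with NULL wherever a document lacks a key. This is roughly what a driver that presents MongoDB collections as tables has to do.

Below is a minimal Python sketch under that assumption. It is not the Progress ODBC driver's actual implementation. The function names are hypothetical, and the documents are trimmed to a few fields.

```python
from datetime import datetime
from typing import Any


def flatten(doc: dict[str, Any], prefix: str = "") -> dict[str, Any]:
    """Flatten nested sub-documents into dotted column names, e.g. metric.interaction_time."""
    row: dict[str, Any] = {}
    for key, value in doc.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            row.update(flatten(value, prefix=f"{name}."))
        else:
            row[name] = value
    return row


def to_wide_rows(docs: list[dict[str, Any]]) -> tuple[list[str], list[tuple[Any, ...]]]:
    """Use the union of all flattened keys as columns; a key a document lacks becomes None (NULL)."""
    flat = [flatten(doc) for doc in docs]
    columns = sorted({col for row in flat for col in row})
    rows = [tuple(row.get(col) for col in columns) for row in flat]
    return columns, rows


# The two metric documents from the slide, trimmed to a few fields.
metrics = [
    {"_id": "2804c514e4c20e6d", "user_id": "3312",
     "date_created": datetime(2015, 10, 14, 10, 56, 59, 137000),
     "category": "timespan",
     "metric": {"interaction_time": 180, "is_interaction_time": True}},
    {"_id": "2804c514f92f1f64", "user_id": "3312",
     "date_created": datetime(2015, 10, 14, 17, 11, 56, 967000),
     "category": "view_user_response",
     "metric": {"viewed_user_id": "9212"}},
]
columns, rows = to_wide_rows(metrics)
print(columns)
```

This approach stays manageable because each variant adds only a handful of scalar columns. It breaks down when the varying part contains arrays of sub-documents, as in the user state examples that follow.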

15

Document-Structured Data Challenges


{"_id": ObjectId("562666868c58dab88be84345"), "course_instance_id": "561829c02804c51563c9c772", "tei_id": "55c21b88241b5a14b75fab8d", "user_id": "3312", "state": {"answer": "The case really drove home..."}}

{"_id": ObjectId("56414b498c58dab88bf10873"), "course_instance_id": "561829c02804c51563c9c772", "tei_id": "5639ef402804c509af1d2721", "user_id": "3312", "state": {"summary": [{"content": "Incorrect: Being quick to market...", "correct": False, "id": "5bc32d05-1173-452c-801b-34c2368ea4b6"}, {"content": "Correct: In the early stages...", "correct": True, "id": "88893c55-3e30-4f50-8495-a6fe1f1cef94"}, {"content": "Incorrect: Customization becomes...", "correct": False, "id": "4afbf29f-d23d-43a3-8266-e41df3defa69"}]}}

User state documents for reflection and multiple choice
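The `state` sub-document takes a different form for each kind of interactive element: a single string for a reflection, an array of choice objects for multiple choice. A loader therefore cannot map all user-state documents to one table. One way around this is to sort documents into like-typed groups first, then load each group with its own mapping.

Below is a minimal Python sketch of that grouping. The classification rule is an assumption inferred from these two examples only. The function names are hypothetical.

```python
from collections import defaultdict
from typing import Any


def state_shape(doc: dict[str, Any]) -> str:
    """Classify a user-state document by the keys of its 'state' sub-document.

    Hypothetical rule derived from the two examples on the slide: an 'answer'
    string marks a reflection, a 'summary' list marks a multiple-choice item.
    """
    state = doc.get("state", {})
    if "answer" in state:
        return "reflection"
    if "summary" in state:
        return "multiple_choice"
    return "unknown"  # set aside for review rather than guessing a structure


def partition_by_shape(docs: list[dict[str, Any]]) -> dict[str, list[dict[str, Any]]]:
    """Group documents so that every group has one predictable structure."""
    groups: dict[str, list[dict[str, Any]]] = defaultdict(list)
    for doc in docs:
        groups[state_shape(doc)].append(doc)
    return dict(groups)


docs = [
    {"_id": "562666868c58dab88be84345", "user_id": "3312",
     "state": {"answer": "The case really drove home..."}},
    {"_id": "56414b498c58dab88bf10873", "user_id": "3312",
     "state": {"summary": [{"content": "Correct: In the early stages...", "correct": True}]}},
]
groups = partition_by_shape(docs)
print({shape: len(items) for shape, items in groups.items()})  # {'reflection': 1, 'multiple_choice': 1}
```

The weak point is the classification rule. Every new kind of interactive element needs a new branch and a new target mapping.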

16

Document-Structured Data Challenges


• Documents with a simple and consistent structure are easy to translate into relational form

• Documents with a simple but polymorphic structure are handled by modern MongoDB drivers (metric example)

• Documents with a complicated and polymorphic structure (user state example) push the boundaries of current drivers and declarative tools

• Current solution: copy like-typed documents into separate collections

• Preferred solution: copy all documents into the warehouse as-is, then run post-copy transforms that produce summary and detailed information in relational form
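The preferred solution defers flattening. Documents land in the warehouse unchanged, and relational summary and detail tables are derived from them afterward. In Redshift that second step would normally be SQL over a staging table; the Python sketch below stands in for it.

The sketch makes several assumptions:
• A hypothetical staging table with one raw JSON text column
• Hypothetical output columns and function name
• Input taken from the multiple-choice user-state document shown earlier

The source does not explain what the `correct` flag means, so the summary only counts how many choices carry it.

```python
import json
from typing import Any


def transform_multiple_choice(raw_rows: list[str]) -> tuple[list[dict[str, Any]], list[dict[str, Any]]]:
    """Derive summary and detail rows from raw JSON already copied into the warehouse."""
    summary: list[dict[str, Any]] = []
    detail: list[dict[str, Any]] = []
    for raw in raw_rows:
        doc = json.loads(raw)
        choices = doc.get("state", {}).get("summary")
        if not isinstance(choices, list):
            continue  # not a multiple-choice state; a separate transform would handle it
        summary.append({
            "doc_id": doc["_id"],
            "user_id": doc["user_id"],
            "choice_count": len(choices),
            "flagged_correct": sum(1 for c in choices if c.get("correct") is True),
        })
        for position, choice in enumerate(choices):
            detail.append({
                "doc_id": doc["_id"],
                "position": position,  # order of the choice within the original array
                "choice_id": choice.get("id"),
                "content": choice.get("content"),
                "correct": choice.get("correct"),
            })
    return summary, detail


# Simulated staging table: every document copied as-is into one raw JSON text column.
staged = [json.dumps(d) for d in [
    {"_id": "562666868c58dab88be84345", "user_id": "3312",
     "state": {"answer": "The case really drove home..."}},
    {"_id": "56414b498c58dab88bf10873", "user_id": "3312", "state": {"summary": [
        {"content": "Incorrect: Being quick to market...", "correct": False,
         "id": "5bc32d05-1173-452c-801b-34c2368ea4b6"},
        {"content": "Correct: In the early stages...", "correct": True,
         "id": "88893c55-3e30-4f50-8495-a6fe1f1cef94"},
        {"content": "Incorrect: Customization becomes...", "correct": False,
         "id": "4afbf29f-d23d-43a3-8266-e41df3defa69"},
    ]}},
]]
summary_rows, detail_rows = transform_multiple_choice(staged)
print(summary_rows[0]["choice_count"], summary_rows[0]["flagged_correct"], len(detail_rows))  # 3 1 3
```

Because the raw documents stay in the warehouse, a transform like this can be rewritten and rerun later without re-extracting anything from MongoDB.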

17

Creating a Data-Driven Culture

18

Creating a Data Driven Culture

[Diagram] Groupings: People, Technical, Partners, Process. Elements: Leadership, Staff, Technology, Self-service, Eliminate Complexity, Experimentation, Process Governance, Data Governance, Education.

19

Enablers for Building Data Driven Culture at HBX

Strong Partners

• Use offshore partner Mindtree to accelerate

• Active engagement of vendors on technology challenges

Education

• Short presentations to staff

• Data analysis exercise at all-staff team meeting

Program Governance

• Active interest & involvement from business areas

• Alignment to organizational priorities

Experimentation

• HBX willingness to try new things

• Helps drive engagement with vendors

20

Organizational Impacts

Enablement of real-time data-driven decision making

• Dashboards for Registration Pipeline and Demographics

• Application Forecasting Dashboard

A move from spreadsheets to dashboards and configurable business processes

• Development of grading automation data pipeline

• Reporting for B2B Participants

A move from individually handled data requests to dashboards and self-service reporting

• Self-service marketing data extract

21

What’s Next?

Streaming Data?

Native JSON Data Warehouse?

Analytics?

Additional Data Sources?

www.hbx.hbs.edu

Questions?
