2 JUNE 2016 BUILDING A CLOUD BASED DATA WAREHOUSE GILDAS BAH, BRENT BENSON, & RYAN FRAZIER
It summit data mgmt-2016.06.02-final

Apr 12, 2017

Page 1: It summit data mgmt-2016.06.02-final

2 JUNE 2016

BUILDING A CLOUD BASED DATA WAREHOUSE

GILDAS BAH, BRENT BENSON, & RYAN FRAZIER

Page 2: It summit data mgmt-2016.06.02-final

Presenters

Ryan Frazier – Director, Systems Engineering and Operations
Brent Benson – Enterprise Architect
Gildas Bah – Data Analyst Engineer

About HBX • HBX Data Management Initiative • Architecture & Implementation • Challenges • Data Driven Culture • Impacts • What’s Next?

Page 3: It summit data mgmt-2016.06.02-final

Page 4: It summit data mgmt-2016.06.02-final

What is HBX?

• Harvard Business School’s newest division, tasked with reimagining business education for the digital age
• Launched in June 2014
• Located in Allston, five minutes from HBS campus
• Moving from start-up to enterprise mode
• The teaching model sets HBX apart from many online learning options and is reflective of the HBS in-person classroom approach

Page 5: It summit data mgmt-2016.06.02-final

HBX Platforms

HBX Online Platform
• Mainly asynchronous online business education
• Engagement through student interaction in cohorts of ~400
• Case-based learning with highly interactive teaching elements and peer help

HBX Live
• WGBH studio-based virtual classroom
• Synchronous audio/video with chat, polls, boards
• Up to 60 global students on studio wall, hundreds or more observers

Page 6: It summit data mgmt-2016.06.02-final

Building a Data Management Practice

Page 7: It summit data mgmt-2016.06.02-final

Why Build a Data-Driven Culture?

Enhance Outcomes (STUDENTS)
• Proactively support struggling students
• Identify challenging content
• Evaluate and improve interactive content, social engagement, and retention

Improve Effectiveness (STAFF)
• Scale data intensive activities like marketing, admissions, & grading
• Use data to test ideas and improve quality of decisions

Refine Pedagogy (FACULTY)
• Evaluate new pedagogical approaches
• Optimize evaluation approaches
• Support pedagogical research activities and innovation

Foster Innovation & Continuous Improvement
• Identify and evaluate innovation opportunities
• Drive continuous improvement

Page 8: It summit data mgmt-2016.06.02-final

Data Management Program Objectives

• Integrate Data Sources into Comprehensive Data Warehouse
• Build Reports and Dashboards
• Enable Self Service
• Ensure Data Quality and Integrity

Page 9: It summit data mgmt-2016.06.02-final

Tool and Vendor Selection

Data Warehouse
• Standard relational DB, Redshift
• Chose Redshift because of scalability, performance
• Aligns with AWS platform focus

ETL
• Informatica, Talend
• Chose Informatica because of university relationship and myriad of pluggable connectors

Reporting/Analytics
• MicroStrategy, Qlik, Tableau
• Chose Tableau because of feature set and industry adoption

Page 10: It summit data mgmt-2016.06.02-final

HBX Data Ecosystem

[Diagram. Components shown:]
• Course Platform Ver. A: MongoDB, MySQL, Reporting Copy
• Course Platform Ver. B: MongoDB, MySQL
• Historical Data: MongoDB, MySQL
• Admin System: MySQL
• Salesforce
• Redshift
• Informatica Secure Agent
• Tableau Server
• Progress ODBC for MongoDB
[Other labels in the diagram: a second "Reporting Copy", "NEW!", "sync"]

Page 11: It summit data mgmt-2016.06.02-final

HBX Data Management by the Numbers

Source Systems
• 35 databases
• 887 tables
• 5,844 fields
• 109,751,902 rows

Data Warehouse
• 4 Redshift clusters
• 8 databases
• 404 tables
• 5,674 fields
• 400,794,679 rows

Daily ETL Process
• 300 jobs
• 6,515,599 rows

* Updated 6/1/2016
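For a rough sense of scale, the figures above can be turned into a few ratios. The sketch below only restates the slide's numbers; the variable names and the choice of ratios are illustrative assumptions and do not come from the presentation.

```python
# Figures copied from the slide (updated 6/1/2016).
source_rows = 109_751_902
source_tables = 887
warehouse_rows = 400_794_679
warehouse_tables = 404
daily_etl_rows = 6_515_599
daily_etl_jobs = 300

# Illustrative ratios (assumed to be of interest; not stated on the slide).
avg_rows_per_source_table = source_rows / source_tables
avg_rows_per_warehouse_table = warehouse_rows / warehouse_tables
avg_rows_per_etl_job = daily_etl_rows / daily_etl_jobs
daily_share_of_warehouse = daily_etl_rows / warehouse_rows

print(f"Average rows per source table:    {avg_rows_per_source_table:,.0f}")
print(f"Average rows per warehouse table: {avg_rows_per_warehouse_table:,.0f}")
print(f"Average rows per daily ETL job:   {avg_rows_per_etl_job:,.0f}")
print(f"Daily ETL rows vs. warehouse rows: {daily_share_of_warehouse:.2%}")
```

These are averages only; actual table and job sizes will vary widely around them.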

Page 12: It summit data mgmt-2016.06.02-final

HBX Data Models

MySQL-Relational
• Course offerings
• Student demographics
• Applications & registration
• Limited course content

MongoDB-Semi-Structured
• Course structure
• Course content
• Student course state
• Metric (timing) data

{ "_id": ObjectId("556f25ab662a9b059ea8df8b"),
  "tei_id": "554a607b241b5a3f0e09eefe",
  "course_instance_id": "556dcf55b7431f414d87f06f",
  "user_id": "8701",
  "comments": [
    { "id": "ce59a25a-ce69-47-c534611f7ebf",
      "text": "This is a great response…",
      "author_id": "6411",
      "date_created": "2015-09-10",

Page 13: It summit data mgmt-2016.06.02-final

Challenges

• Immature data connection support
• Large object storage limitations in Redshift
• Difficulty flattening complex/polymorphic data structures
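The slides do not say how the large object limitation was handled. As one possible workaround, long text values can be truncated before loading so that they fit a Redshift VARCHAR column, whose maximum size is 65,535 bytes. The function name and the truncation policy below are assumptions for illustration only.

```python
REDSHIFT_VARCHAR_MAX_BYTES = 65535  # maximum size of a Redshift VARCHAR, in bytes


def truncate_for_varchar(text: str, max_bytes: int = REDSHIFT_VARCHAR_MAX_BYTES) -> str:
    """Return text cut to at most max_bytes of UTF-8 without splitting a character."""
    encoded = text.encode("utf-8")
    if len(encoded) <= max_bytes:
        return text
    # Slicing bytes can cut a multi-byte character in half; errors="ignore"
    # drops the incomplete trailing bytes when decoding.
    return encoded[:max_bytes].decode("utf-8", errors="ignore")


long_answer = "é" * 40000  # 80,000 bytes in UTF-8, too large for one VARCHAR
stored = truncate_for_varchar(long_answer)
print(len(stored.encode("utf-8")))
```

Truncation loses data, so a real pipeline would more likely keep the full text elsewhere and store only a reference or a prefix in the warehouse.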

Page 14: It summit data mgmt-2016.06.02-final

Document-Structured Data Challenges

{"_id": ObjectId("2804c514e4c20e6d"), "course_instance_id": "2804c51563c9c772", "tei_id": "241b5a14b75fac83", "user_id": "3312", "date_created": datetime.datetime(2015, 10, 14, 10, 56, 59, 137000), "category": "timespan", "metric": {"interaction_time": 180, "is_interaction_time": True}}

{"_id": ObjectId("2804c514f92f1f64"), "course_instance_id": "2804c51563c9c772", "tei_id": "241b5a14b75fab70", "user_id": "3312", "date_created": datetime.datetime(2015, 10, 14, 17, 11, 56, 967000), "category": "view_user_response", "metric": {"viewed_user_id": "9212"}}
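The two metric documents on this slide share their top-level fields but carry a different "metric" sub-document per category. A minimal sketch of flattening them into one wide relational row shape follows. It assumes "_id" has already been converted to a plain string; the "metric_" column prefix and the function names are illustrative choices, not something the slides prescribe.

```python
import datetime


def flatten_metric(doc):
    """Copy top-level fields and lift each metric key into a prefixed column."""
    row = {key: value for key, value in doc.items() if key != "metric"}
    for key, value in doc.get("metric", {}).items():
        row[f"metric_{key}"] = value
    return row


def to_uniform_rows(docs):
    """Flatten all documents and pad missing columns with None (SQL NULL)."""
    rows = [flatten_metric(doc) for doc in docs]
    columns = sorted({column for row in rows for column in row})
    return columns, [{column: row.get(column) for column in columns} for row in rows]


docs = [
    {"_id": "2804c514e4c20e6d", "course_instance_id": "2804c51563c9c772",
     "tei_id": "241b5a14b75fac83", "user_id": "3312",
     "date_created": datetime.datetime(2015, 10, 14, 10, 56, 59, 137000),
     "category": "timespan",
     "metric": {"interaction_time": 180, "is_interaction_time": True}},
    {"_id": "2804c514f92f1f64", "course_instance_id": "2804c51563c9c772",
     "tei_id": "241b5a14b75fab70", "user_id": "3312",
     "date_created": datetime.datetime(2015, 10, 14, 17, 11, 56, 967000),
     "category": "view_user_response",
     "metric": {"viewed_user_id": "9212"}},
]

columns, rows = to_uniform_rows(docs)
for row in rows:
    print(row["category"], row["metric_interaction_time"], row["metric_viewed_user_id"])
```

Every row ends up with the same columns, so the result maps directly onto a single table with nullable metric columns.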

Page 15: It summit data mgmt-2016.06.02-final

Document-Structured Data Challenges

{"_id": ObjectId("562666868c58dab88be84345"), "course_instance_id": "561829c02804c51563c9c772", "tei_id": "55c21b88241b5a14b75fab8d", "user_id": "3312", "state": {"answer": "The case really drove home..."}}

{"_id": ObjectId("56414b498c58dab88bf10873"), "course_instance_id": "561829c02804c51563c9c772", "tei_id": "5639ef402804c509af1d2721", "user_id": "3312", "state": {"summary": [{"content": "Incorrect: Being quick to market...", "correct": False, "id": "5bc32d05-1173-452c-801b-34c2368ea4b6"}, {"content": "Correct: In the early stages...", "correct": True, "id": "88893c55-3e30-4f50-8495-a6fe1f1cef94"}, {"content": "Incorrect: Customization becomes...", "correct": False, "id": "4afbf29f-d23d-43a3-8266-e41df3defa69"}]}}

User state documents for reflection and multiple choice
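The two user state documents above differ in shape: the reflection stores a single answer string, while the multiple choice stores a list of summary entries. One way to put both into relational form is a parent row per document plus child rows for the list entries. The sketch below assumes string ids and uses hypothetical table shapes (state_rows, summary_rows); it is not the transform the presenters used.

```python
def split_user_state(doc):
    """Return (parent_row, child_rows) for one user state document."""
    state = doc.get("state", {})
    parent = {
        "state_id": doc["_id"],
        "course_instance_id": doc["course_instance_id"],
        "tei_id": doc["tei_id"],
        "user_id": doc["user_id"],
        # Only reflection documents carry an answer; others get None (NULL).
        "answer": state.get("answer"),
    }
    children = [
        {
            "state_id": doc["_id"],
            "position": position,
            "summary_id": entry["id"],
            "content": entry["content"],
            "correct": entry["correct"],
        }
        for position, entry in enumerate(state.get("summary", []))
    ]
    return parent, children


reflection = {
    "_id": "562666868c58dab88be84345",
    "course_instance_id": "561829c02804c51563c9c772",
    "tei_id": "55c21b88241b5a14b75fab8d",
    "user_id": "3312",
    "state": {"answer": "The case really drove home..."},
}
multiple_choice = {
    "_id": "56414b498c58dab88bf10873",
    "course_instance_id": "561829c02804c51563c9c772",
    "tei_id": "5639ef402804c509af1d2721",
    "user_id": "3312",
    "state": {"summary": [
        {"content": "Incorrect: Being quick to market...", "correct": False,
         "id": "5bc32d05-1173-452c-801b-34c2368ea4b6"},
        {"content": "Correct: In the early stages...", "correct": True,
         "id": "88893c55-3e30-4f50-8495-a6fe1f1cef94"},
        {"content": "Incorrect: Customization becomes...", "correct": False,
         "id": "4afbf29f-d23d-43a3-8266-e41df3defa69"},
    ]},
}

state_rows, summary_rows = [], []
for doc in (reflection, multiple_choice):
    parent, children = split_user_state(doc)
    state_rows.append(parent)
    summary_rows.extend(children)

print(len(state_rows), len(summary_rows))
```

The child rows keep the parent's id, so the summary entries can be joined back to their user state document.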

Page 16: It summit data mgmt-2016.06.02-final

Document-Structured Data Challenges

• Documents with simple and consistent structure are easy to translate into relational form

• Documents with simple, but polymorphic structure are handled by modern MongoDB drivers (metric example)

• Documents with complicated and polymorphic structure (user state example) push the boundaries of current drivers and declarative tools

• Current solution: copy like-typed documents into separate collections

• Preferred solution: copy all documents into warehouse and do post-copy transforms for summary and detailed information in relational form
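The current solution above, copying like-typed documents into separate collections, can be sketched as a grouping step. This example keys each document by the sorted field names of its "state" sub-document. That shape-based key and the function names are assumptions, since the slides do not say how document types are identified.

```python
from collections import defaultdict


def shape_key(doc, field="state"):
    """Describe a document's type by the field names of one sub-document."""
    sub = doc.get(field, {})
    return "+".join(sorted(sub)) or "empty"


def group_like_typed(docs, field="state"):
    """Group documents so each group holds only one structural shape."""
    groups = defaultdict(list)
    for doc in docs:
        groups[shape_key(doc, field)].append(doc)
    return dict(groups)


docs = [
    {"_id": "a1", "state": {"answer": "The case really drove home..."}},
    {"_id": "b2", "state": {"summary": [{"content": "Correct: ...", "correct": True}]}},
    {"_id": "c3", "state": {"answer": "Another reflection..."}},
]

groups = group_like_typed(docs)
for name, members in sorted(groups.items()):
    print(name, [doc["_id"] for doc in members])
```

Each group has a consistent structure, so it could then be written to its own collection and mapped to a relational table with a fixed set of columns.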

Page 17: It summit data mgmt-2016.06.02-final

Creating a Data-Driven Culture

Page 18: It summit data mgmt-2016.06.02-final

Creating a Data Driven Culture

People
• Technical Partners
• Leadership
• Staff

Technology
• Self-service
• Eliminate Complexity
• Experimentation

Process
• Process Governance
• Data Governance
• Education

Page 19: It summit data mgmt-2016.06.02-final

Enablers for Building Data Driven Culture at HBX

Strong Partners
• Use off-shore partner Mindtree to accelerate
• Active engagement of vendors on technology challenges

Education
• Short Presentations to staff
• Data Analysis Exercise at all-staff team meeting

Program Governance
• Active interest & involvement from Business Areas
• Alignment to organizational priorities

Experimentation
• HBX willingness to try new things
• Helps drive engagement with vendors

Page 20: It summit data mgmt-2016.06.02-final

Organizational Impacts

Enablement of real-time data-driven decision making

• Dashboards for Registration Pipeline and Demographics

• Application Forecasting Dashboard

A move from spreadsheets to dashboards and configurable business processes

• Development of grading automation data pipeline

• Reporting for B2B Participants

A move from individually handled data requests to dashboards and self-service reporting

• Self-service marketing data extract

Page 21: It summit data mgmt-2016.06.02-final

What’s Next?

Streaming Data?

Native JSON Data Warehouse?

Analytics?

Additional Data Sources?

Page 22: It summit data mgmt-2016.06.02-final

www.hbx.hbs.edu

Questions?