Microsoft Garage: Modernizing Data Processing at the Museum of Science Nicholas Bradford | Tim Petri | Himanshu Sahay A Major Qualifying Project submitted to Worcester Polytechnic Institute. Presented 14 December 2016.
Microsoft Garage: Modernizing Data Processing at the …web.cs.wpi.edu/~claypool/mqp/msoft/mos-16/slides.pdf ·  · 2017-03-01Objectives Make the complete data set available in Azure

May 15, 2018

Microsoft Garage: Modernizing Data Processing at the Museum of Science

Nicholas Bradford | Tim Petri | Himanshu Sahay

A Major Qualifying Project submitted to Worcester Polytechnic Institute.

Presented 14 December 2016.

Hall of Human Life

● Opened in late 2013
● Fifteen interactive kiosks (link stations) in 5 categories
● Wristband with unique barcode enables a cross-kiosk experience
● Additional exploration from the web browser at home

(1)

Existing System

Objectives

● Make the complete data set available in Azure
● Provide insights into visitor usage patterns and exhibit health
● Introduce the idea of anomalous data and monitoring for hardware malfunction

(2,3,4)

Moving Data to the Cloud

● Set up a SQL database in Azure, similar to the on-premises solution
○ Allows performance to be scaled on the fly (by adding resources)
○ Created with future integration in mind
○ Ready-made integrations with tools such as Power BI and Azure Machine Learning

● Moved the full historical data set into Azure
○ 600,000+ visitors and almost 10,000,000 visitor answers

● Created custom views to support the dashboards and machine learning models
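
A custom view of the kind mentioned above might look like the following sketch. It uses Python's built-in sqlite3 module as a local stand-in for the Azure SQL database. The table names (`visitors`, `answers`), their columns, and the view name `answers_with_demographics` are hypothetical and are not taken from the project's schema.

```python
import sqlite3

# In-memory SQLite stands in for the Azure SQL database (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical schema: one row per visitor, one row per answer.
    CREATE TABLE visitors (
        visitor_id INTEGER PRIMARY KEY,
        age INTEGER,
        gender TEXT
    );
    CREATE TABLE answers (
        answer_id INTEGER PRIMARY KEY,
        visitor_id INTEGER REFERENCES visitors(visitor_id),
        kiosk TEXT,
        question TEXT,
        value REAL
    );
    -- View joining answers to demographics so a report can filter on them.
    CREATE VIEW answers_with_demographics AS
        SELECT a.kiosk, a.question, a.value, v.age, v.gender
        FROM answers AS a
        JOIN visitors AS v ON v.visitor_id = a.visitor_id;
""")
conn.executemany(
    "INSERT INTO visitors VALUES (?, ?, ?)",
    [(1, 34, "F"), (2, 12, "M")],
)
conn.executemany(
    "INSERT INTO answers VALUES (?, ?, ?, ?, ?)",
    [
        (1, 1, "kiosk_a", "height_cm", 168.0),
        (2, 2, "kiosk_a", "height_cm", 150.0),
        (3, 2, "kiosk_b", "pulse_bpm", 88.0),
    ],
)

# Query the view the way a report might: count and average per kiosk/question.
rows = conn.execute("""
    SELECT kiosk, question, COUNT(*), AVG(value)
    FROM answers_with_demographics
    GROUP BY kiosk, question
    ORDER BY kiosk, question
""").fetchall()
```

Keeping the join inside a view means a reporting tool only needs to read one flat relation, which is one plausible reason to prepare views before connecting a dashboard.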

(2)

Rule-Based Outlier Detection

● Found several incorrect data points
● Adopted a rule-based approach to flag incorrect (“outlier”) data
● Tested kiosks in person to force outliers and generate acceptable bounds for each question*
● Recorded in database
● Ran all data through rules to retroactively flag as inlier or outlier

* questions accepting numeric answers
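
The rule-based flagging described above can be sketched as a lookup of per-question bounds. This is a minimal illustration, not the project's code: the question names, the bound values, and the `"unflagged"` label for questions without a rule are all assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Bounds:
    """Acceptable range for one numeric question (inclusive on both ends)."""
    low: float
    high: float


# Hypothetical bounds; the real ones were derived by testing kiosks in person.
RULES = {
    "height_cm": Bounds(50.0, 250.0),
    "pulse_bpm": Bounds(30.0, 220.0),
}


def flag(question: str, value: float) -> str:
    """Return 'inlier', 'outlier', or 'unflagged' when no rule exists."""
    bounds = RULES.get(question)
    if bounds is None:
        return "unflagged"
    return "inlier" if bounds.low <= value <= bounds.high else "outlier"


# Retroactive pass over stored answers (hypothetical sample rows).
answers = [
    ("height_cm", 168.0),
    ("height_cm", 1680.0),
    ("pulse_bpm", 0.0),
    ("favorite_color", 3.0),
]
flags = [flag(question, value) for question, value in answers]
```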

Dashboards

● Set of visualizations and demographic filters
○ Age
○ Gender
○ Time of visit
○ Date of visit

● Live connection between Azure SQL database and Power BI, near real time
● Data processing
○ Relationships between views
○ Conditional columns

● 2 dashboards: exhibit overview and detail view
● Completed 2 rounds of reviews with primary users

Hardware Failure Detection: Motivation

Rule-based approach in action. Rules fail if relationships or distribution change.

Automatically flag potential hardware failures even when data falls within the outlier bounds.

Anomaly Model: Multivariate Gaussian

Contamination = 0% (trains on 100% of inlier data)

Contamination = 5% (trains on best 95% of inlier data)

Detect more subtle “anomalies” by fitting a normal distribution and considering covariance.
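
One possible reading of this model, for two features, is sketched below in plain Python. It fits a mean and covariance, drops the `contamination` fraction of points with the largest Mahalanobis distance, refits on the rest, and uses the largest remaining distance as the cutoff. The function names, the synthetic data, and this particular way of applying contamination are assumptions, not the project's implementation.

```python
import random


def fit_gaussian(points):
    """Return the mean vector and 2x2 sample covariance of 2-D points."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points) / (n - 1)
    syy = sum((p[1] - my) ** 2 for p in points) / (n - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / (n - 1)
    return (mx, my), ((sxx, sxy), (sxy, syy))


def mahalanobis_sq(point, mean, cov):
    """Squared Mahalanobis distance, using the closed-form 2x2 inverse."""
    (a, b), (_, d) = cov
    det = a * d - b * b
    dx, dy = point[0] - mean[0], point[1] - mean[1]
    return (d * dx * dx - 2 * b * dx * dy + a * dy * dy) / det


def train(points, contamination):
    """Fit, drop the worst `contamination` fraction, refit, derive a cutoff."""
    mean, cov = fit_gaussian(points)
    ranked = sorted(points, key=lambda p: mahalanobis_sq(p, mean, cov))
    n_drop = int(len(ranked) * contamination)
    kept = ranked[: len(ranked) - n_drop]
    mean, cov = fit_gaussian(kept)
    cutoff = max(mahalanobis_sq(p, mean, cov) for p in kept)
    return mean, cov, cutoff


def is_anomaly(point, model):
    """A point is anomalous when it lies farther out than the cutoff."""
    mean, cov, cutoff = model
    return mahalanobis_sq(point, mean, cov) > cutoff


# Synthetic, strongly correlated data (hypothetical, for illustration only).
random.seed(0)
points = []
for _ in range(200):
    x = random.gauss(170.0, 10.0)
    points.append((x, 0.5 * x + random.gauss(0.0, 2.0)))

model_0 = train(points, contamination=0.0)
model_5 = train(points, contamination=0.05)

on_trend = (170.0, 85.0)   # follows the relationship between the features
off_trend = (180.0, 75.0)  # each value is plausible alone, the pair is not

flagged_0 = sum(is_anomaly(p, model_0) for p in points)
flagged_5 = sum(is_anomaly(p, model_5) for p in points)
```

Because the covariance is part of the fit, a point such as `off_trend` can be flagged even though neither coordinate would trip a per-question bound, which is the case the rule-based approach cannot catch.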

Historical Model: Univariate Gaussian

Typical distribution. A reasonable cutoff appears.

Set a threshold for acceptable anomaly rate for each kiosk (2 standard deviations above mean).

100% anomalies: probably bad.
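
The per-kiosk threshold can be sketched as follows, assuming the historical input is a list of daily anomaly rates for one kiosk. The sample values are made up, and using the sample (rather than population) standard deviation is an assumption, since the slides do not say which was used.

```python
from statistics import mean, stdev


def anomaly_rate_threshold(daily_rates, k=2.0):
    """Mean plus k sample standard deviations of historical anomaly rates."""
    return mean(daily_rates) + k * stdev(daily_rates)


def exceeds_threshold(todays_rate, daily_rates, k=2.0):
    """True when today's anomaly rate is above the kiosk's threshold."""
    return todays_rate > anomaly_rate_threshold(daily_rates, k)


# Hypothetical history of daily anomaly rates for a single kiosk.
history = [0.04, 0.05, 0.06, 0.05, 0.03, 0.07, 0.05, 0.04, 0.06, 0.05]
threshold = anomaly_rate_threshold(history)

typical_day = exceeds_threshold(0.05, history)  # an ordinary rate
broken_day = exceeds_threshold(1.0, history)    # 100% anomalies
```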

Hardware Failure Detection: Azure ML

Training data (past year) / Test data (past day) → Extraction (per kiosk) → Anomaly Model (find anomalies) → Historical Model (judge anomaly rate) → Log results (in DB & email)

↑ contam. = ↑ strict; ↑ threshold = ↓ alerts
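
The daily check for this pipeline could be organized roughly as below. This is a sketch under assumptions: `run_daily_check`, the row layout, and the kiosk names are hypothetical, the anomaly test passed in is a simple stand-in for a trained anomaly model, and writing the results to a database or email is left out.

```python
from collections import defaultdict


def run_daily_check(test_rows, is_anomaly, thresholds):
    """Group the past day's rows by kiosk and compare each anomaly rate
    with that kiosk's threshold."""
    by_kiosk = defaultdict(list)
    for kiosk, value in test_rows:
        by_kiosk[kiosk].append(value)

    results = []
    for kiosk, values in sorted(by_kiosk.items()):
        n_anomalies = sum(1 for v in values if is_anomaly(kiosk, v))
        rate = n_anomalies / len(values)
        results.append({
            "kiosk": kiosk,
            "anomaly_rate": rate,
            "alert": rate > thresholds[kiosk],
        })
    return results


# Hypothetical readings from the past day.
test_rows = [
    ("kiosk_a", 70.0), ("kiosk_a", 72.0), ("kiosk_a", 300.0), ("kiosk_a", 71.0),
    ("kiosk_b", 0.0), ("kiosk_b", 0.0), ("kiosk_b", 0.0),
]


def stand_in_model(kiosk, value):
    """Stand-in for a trained per-kiosk anomaly model (assumption)."""
    return not (40.0 <= value <= 200.0)


# Hypothetical thresholds, e.g. produced by the historical model.
thresholds = {"kiosk_a": 0.3, "kiosk_b": 0.3}

results = run_daily_check(test_rows, stand_in_model, thresholds)
```

Raising a kiosk's threshold makes `alert` fire less often, which matches the trade-off noted on the slide.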

Putting it All Together: Architecture

Future Work

● Integration with existing Hall of Human Life system
● Testing hardware failure detection system

Dashboard Demo

Thank you!

References

(1) Museum of Science: Image from Hall of Human Life. http://exhibits.mos.org/
(2) Cloud database icon: https://www.caspio.com/wp-content/uploads/2015/05/caspio-features-illustr_cloud-data_3_2x.png
(3) Dashboard icon: http://www.freeiconspng.com/uploads/dashboard-icon-19.png
(4) Kernel Machine icon: http://upload.wikimedia.org/wikipedia/commons/thumb/1/1b/Kernel_Machine.png/440px-Kernel_Machine.png

Hall of Human Life Overview

Hall of Human Life Overview - Filtered

Detail View

Detail View - Filtered

Sharing Reports