
Data and Business Team Collaboration

Jan 22, 2018

Transcript
Page 1: Data and Business Team Collaboration

Bruno Wu

Data Scientist @Move

Prompt:

We have two kinds of interfaces: (1) technical (building models, turning them into APIs, embedding them into products) and (2) business (translating business problems into data problems). How can we repeatedly take outputs from models and translate them into value for the business through interventions, experiments, and new product features? What tools do we need to create to do this again and again? (How do we not let things get “Lost in Translation”?)

Page 2: Data and Business Team Collaboration
Page 3: Data and Business Team Collaboration

I have to confess

Split Personality

Page 4: Data and Business Team Collaboration

Goals/values of business and technical interfaces

Page 5: Data and Business Team Collaboration

Towards a Common Framework

Problem -> Data -> Model v1 -> Testing -> Release v1 -> Data/Feedback -> Model v2 -> Testing -> Release v2… IMPACT

Goal: Increase Velocity of the Vortex

Page 6: Data and Business Team Collaboration

Identify Stage-Transition Tasks (STTs)

Problem -> Data -> Model v1 -> Testing -> Release v1 -> Data/Feedback -> Model v2 -> Testing -> Release v2… IMPACT

- Problem definitions / scoping

- Find / acquire data and labels

- Feature engineering / selection

- Algorithm training / selection

- Data engineering

- Testing / Optimization

Page 7: Data and Business Team Collaboration

Dissecting Stage-Transition Tasks (STTs)

1. Problem definition/scoping

2. Find/acquire data and labels

3. Feature engineering/selection

4. Algorithm training/selection

5. Testing and Optimization

Speed: Automate, Standardize, Collaborate

Page 8: Data and Business Team Collaboration

Define Problem and Scope (Problem -> Data)

- Scoping document and data product roadmap wiki

Page 9: Data and Business Team Collaboration

Well-Defined Problems (Problem -> Data)

- Well-defined problems. Example: “Lookyloos”

- Ratio = Revenue Lost Per Lead Submitter (from shrinking lead form) ÷ Revenue Gained Per Lead Submitter (from ad impressions)
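The ratio above is a one-line calculation once both per-submitter figures are known. A minimal sketch follows; the function name and the dollar amounts are hypothetical and are not taken from the slides.

```python
def lookyloo_ratio(revenue_lost_per_submitter: float,
                   revenue_gained_per_submitter: float) -> float:
    """Revenue lost per lead submitter divided by revenue gained per lead submitter."""
    if revenue_gained_per_submitter <= 0:
        raise ValueError("revenue gained per submitter must be positive")
    return revenue_lost_per_submitter / revenue_gained_per_submitter


# Hypothetical figures, for illustration only.
ratio = lookyloo_ratio(revenue_lost_per_submitter=0.50,
                       revenue_gained_per_submitter=2.00)
print(f"ratio = {ratio:.2f}")
```

By construction, a ratio below 1 means the per-submitter gain from ad impressions is larger than the per-submitter loss from the smaller lead form.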

Page 10: Data and Business Team Collaboration

Not So Well-Defined Problems (Problem -> Data)

- Unfortunately, many business problems that are of value are also difficult to define:

- Who are Potential Sellers or Millennials on Realtor.com?

- What constitutes similar neighborhoods?

- Increase collaboration with domain experts and expand options for data collection.

Page 11: Data and Business Team Collaboration

Acquiring Labels (Problem -> Data)

Collecting labels is often the most critical yet most difficult step.

- Implicit vs. explicit labels: users’ actions vs. surveys or registration

- Tools:

1. Guidelines and budget to create more APIs/products that automate label generation (e.g. contents, widgets, games)

2. “Human-in-the-Loop” services, e.g. CrowdFlower, Amazon Mechanical Turk (e.g. to evaluate relevancy for neighborhoods, recommendations, and image tagging)
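When several human raters judge the same item, their answers have to be merged into one label. The sketch below uses simple majority voting with an agreement threshold. It is not tied to any service's API; the item IDs, label names, and the 0.6 threshold are assumptions chosen for illustration.

```python
from collections import Counter


def aggregate_labels(judgments, min_agreement=0.6):
    """Merge per-item rater judgments into one label by majority vote.

    judgments: dict mapping item_id -> list of labels from different raters.
    Items whose top label falls below min_agreement are left as None so they
    can be sent back for more judgments.
    """
    resolved = {}
    for item_id, labels in judgments.items():
        if not labels:
            resolved[item_id] = None
            continue
        label, votes = Counter(labels).most_common(1)[0]
        resolved[item_id] = label if votes / len(labels) >= min_agreement else None
    return resolved


# Hypothetical judgments on whether two neighborhoods are similar.
judgments = {
    "pair-001": ["similar", "similar", "not_similar"],
    "pair-002": ["similar", "not_similar"],
}
labels = aggregate_labels(judgments)
print(labels)
```

Here "pair-001" resolves to "similar" (two of three raters agree), while "pair-002" stays unresolved because the raters split evenly.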


Page 12: Data and Business Team Collaboration

Acquiring Features (Problem -> Data)

Data Enrichment products or services

- Develop a data acquisition strategy / guidelines so that data scientists can subscribe to sources that expand the feature space

- Tools:

1. Census data (e.g. PolicyMap), which seems to add predictive power and increases the feature space.

2. Cross-device tracking services to link users across platforms to increase feature space.


Page 13: Data and Business Team Collaboration

Feature Engineering / Selection (Data -> Model)

- Increasing standardization and rigor of the feature engineering and selection process will help to speed things up:

- Tools: Google BigQuery/Python/R

Categorization and cataloguing of features

Create derivative features

Systematic tests to measure feature importance

Systematic procedures for feature selection
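One way to make "systematic tests to measure feature importance" concrete is permutation importance: shuffle one feature column at a time and record how much a fixed model's accuracy drops. The slides do not name a specific method, so this technique is an assumption, and the toy model and data below are hypothetical. The sketch uses only the standard library.

```python
import random


def accuracy(predict, rows, labels):
    """Fraction of rows for which the model's prediction matches the label."""
    return sum(predict(r) == y for r, y in zip(rows, labels)) / len(labels)


def permutation_importance(predict, rows, labels, n_features, n_repeats=20, seed=0):
    """Mean accuracy drop per feature when that feature's column is shuffled."""
    rng = random.Random(seed)
    baseline = accuracy(predict, rows, labels)
    importances = []
    for j in range(n_features):
        drops = []
        for _ in range(n_repeats):
            column = [r[j] for r in rows]
            rng.shuffle(column)
            # Rebuild each row (a tuple) with the shuffled value in position j.
            shuffled = [r[:j] + (v,) + r[j + 1:] for r, v in zip(rows, column)]
            drops.append(baseline - accuracy(predict, shuffled, labels))
        importances.append(sum(drops) / n_repeats)
    return importances


# Toy data: the label depends only on feature 0; feature 1 is pure noise.
data_rng = random.Random(42)
rows = [(data_rng.random(), data_rng.random()) for _ in range(200)]
labels = [int(r[0] > 0.5) for r in rows]


def predict(row):
    """Stand-in for a trained model."""
    return int(row[0] > 0.5)


importances = permutation_importance(predict, rows, labels, n_features=2)
print(importances)
```

Shuffling feature 0 hurts accuracy substantially, while shuffling feature 1 changes nothing because the toy model ignores it. Running the same procedure on every candidate feature gives a repeatable ranking that can feed a feature selection step.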

Page 14: Data and Business Team Collaboration

Feature Engineering / Selection (Data -> Model)

- Needs the most time and input from domain experts.

- Collaboration is crucial for this task; otherwise data scientists are making educated guesses.

- Extensive collaboration from the business team on feature engineering is still missing at the moment.

Categorization and cataloguing of features

Create derivative features

Systematic tests to measure feature importance

Systematic procedures for feature selection

Page 15: Data and Business Team Collaboration

Algorithm Training / Selection (Data -> Model)

- Lost in Translation (Part 1)

- On the one hand: Models are well understood by data scientists but are a black box to other stakeholders.

Page 16: Data and Business Team Collaboration

Algorithm Training / Selection (Data -> Model)

- Lost in Translation (Part 2)

- On the other hand: Need more open-mindedness from stakeholders

- A lot of times, effective models are not simple heuristics based on a strong signal but a mix of weak signals.

- “Life is messy” and “wisdom of the crowd” analogies.

- Embrace nuance and be comfortable with it, rather than viewing simplification as paramount.
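The “wisdom of the crowd” point can be illustrated with a small calculation: if several weak signals are each right only slightly more often than not, a majority vote over them can be right noticeably more often than any one alone. The sketch below assumes the signals are independent and equally accurate, which real features rarely are, so treat it as an intuition aid rather than a model of any production system. The 60% accuracy figure is hypothetical.

```python
from math import comb


def majority_vote_accuracy(p: float, n: int) -> float:
    """Probability that a majority of n independent signals, each correct
    with probability p, gives the right answer (n must be odd to avoid ties)."""
    if n % 2 == 0:
        raise ValueError("use an odd number of signals to avoid ties")
    k_min = n // 2 + 1  # smallest number of correct signals that forms a majority
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(k_min, n + 1))


# Hypothetical weak signals, each correct 60% of the time.
for n in (1, 3, 11, 51):
    print(n, round(majority_vote_accuracy(0.6, n), 3))
```

Under these assumptions the combined accuracy rises as more weak signals are added, which is one reason a model built from many small signals can outperform a single simple heuristic.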

Page 17: Data and Business Team Collaboration

Testing and Optimization (Model -> Testing)

- Embrace quantity: Airbnb has ~100 tests running at any point in time. How?

- If tools and guidelines are sufficiently in place, we should aim to remove barriers for testing as much as possible.
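One common building block for running many tests at once is deterministic, per-experiment assignment: hash the experiment name together with the user ID so that each user always sees the same variant and different experiments split users independently. The slides do not describe how assignment is done, so the sketch below is an assumption; the function name, experiment names, and user IDs are hypothetical.

```python
import hashlib


def assign_variant(experiment: str, user_id: str,
                   variants=("control", "treatment")) -> str:
    """Deterministically map a user to a variant for one experiment."""
    # Salting the hash with the experiment name keeps assignments in
    # different experiments from lining up with each other.
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % len(variants)
    return variants[bucket]


# Hypothetical experiment names and user ID.
print(assign_variant("shorter-lead-form", "user-123"))
print(assign_variant("new-recommendations", "user-123"))
```

Because no assignment table has to be stored or coordinated, adding another experiment only requires choosing a new experiment name.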

Page 18: Data and Business Team Collaboration

Testing and Optimization (Model -> Testing)

- This is beginning to happen @Move.

- Possible guidelines and tools for improving collaboration, standardization, and automation for testing:

- Not looking under the hood

- Tracking tools for internal APIs

- Require users to clarify hypothesis / create query to measure the right metrics

- Add experiment process to onboarding

- Make experiment documentation more discoverable on the wiki
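The guideline about requiring a hypothesis and a metric query can be enforced in code rather than by convention. The sketch below is one possible shape, not a description of the tooling at Move: the class, its field names, the example hypothesis, and the table name in the query are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExperimentSpec:
    """Minimal experiment record that cannot be created with blank fields."""
    name: str
    hypothesis: str
    primary_metric: str
    metric_query: str

    def __post_init__(self):
        # Reject the spec up front if any required field is empty or whitespace.
        for field_name in ("name", "hypothesis", "primary_metric", "metric_query"):
            if not getattr(self, field_name).strip():
                raise ValueError(f"{field_name} must be filled in before launch")


# Hypothetical experiment; the table name in the query is made up.
spec = ExperimentSpec(
    name="shorter-lead-form",
    hypothesis="A shorter lead form increases leads per visitor.",
    primary_metric="leads_per_visitor",
    metric_query="SELECT variant, COUNT(*) FROM hypothetical_leads GROUP BY variant",
)
print(spec.name, "->", spec.primary_metric)
```

Records like this can also be exported to the wiki, which would help with the discoverability point above.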

Page 19: Data and Business Team Collaboration

Testing and Optimization (Model -> Testing)

- Currently, we use both third-party tools and a proprietary API for testing.

- Tools: Optimizely, proprietary REST API

- Using the same tools / languages across data science, testing, and production helps speed up experimentation and production: implement optimized versions only when needed. It also reduces the chance of things getting “lost in translation” between experimentation, testing, and production.

Page 20: Data and Business Team Collaboration

Three Things To Remember

1. Embrace nuances – eliminates biases (e.g. confirmation bias, selection bias)

2. Set up systems/guidelines in order to remove barriers for frequent testing

3. Collaborate at critical points where domain experts can add the most value