
Basic WhizzML Workflows

Apr 07, 2017



BigML, Inc
Page 1: Basic WhizzML Workflows

Basic WhizzML Workflows

The BigML Team

May 2016

The BigML Team Basic WhizzML Workflows May 2016 1 / 24


Outline

1 What is WhizzML?

2 WhizzML Server-side Resources

3 WhizzML Language Basics

4 Standard Library Overview

5 Tutorial Walkthrough: Model or Ensemble?

1 What is WhizzML?

WhizzML in a Nutshell

• Domain-specific language for ML workflow automation
  - High-level problem and solution specification

• Framework for scalable, remote execution of ML workflows
  - Sophisticated server-side optimization
  - Out-of-the-box scalability
  - Client-server brittleness removed
  - Infrastructure for creating and sharing ML scripts and libraries

2 WhizzML Server-side Resources

WhizzML REST Resources

Library: A reusable building block: a collection of WhizzML definitions that can be imported by other libraries or scripts.

Script: Executable code that describes an actual workflow.

• Imports: List of libraries with code used by the script.

• Inputs: List of input values that parameterize the workflow.

• Outputs: List of values computed by the script and returned to the user.

Execution: Given a script and a complete set of inputs, the workflow can be executed and its outputs generated.

3 WhizzML Language Basics

Basic Syntax

Atomic constants

"a string value"

23, -10, -1.23E11, 1.42342

true, false

Fully parenthesized prefix notation

(list-sources) ;; Function call without arguments

(log-info "Hello World!")

(* 2 (+ 2 3)) ;; Evaluates to 2 * (2 + 3)

(atan (tan 3)) ;; Nested function calls
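A few more prefix expressions, written as top-level definitions so their values can be inspected. The names two-plus-three, twice-five and nested-difference are illustrative; the expected values follow ordinary arithmetic.

```whizzml
;; Prefix notation: the operator comes first, followed by its arguments.
;; Inner forms are evaluated before the outer ones.
(define two-plus-three (+ 2 3))                 ;; => 5
(define twice-five (* 2 (+ 2 3)))               ;; => 10, i.e. 2 * (2 + 3)
(define nested-difference (- (* 4 5) (+ 1 2)))  ;; => 17, i.e. (4 * 5) - (1 + 2)
```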


Variables

Names

dataset_id

date-of-birth

sources*

positive?

x, y

Definition

(define name "Arthur Samuel")

(define birth-year 1901)

(define age (- 2016 birth-year))
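Definitions can refer to names defined earlier. This sketch repeats the slide's definitions so it stands alone and adds two illustrative ones: greeting uses str (one of the standard string functions) and born-before-1950? follows the same naming style as positive?.

```whizzml
(define name "Arthur Samuel")
(define birth-year 1901)
(define age (- 2016 birth-year))                ;; => 115
;; str concatenates its arguments into a single string
(define greeting (str "Hello, " name "!"))      ;; => "Hello, Arthur Samuel!"
;; names ending in ? conventionally hold (or return) booleans
(define born-before-1950? (< birth-year 1950))  ;; => true
```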


Composite Values: Lists

Literals

[1.2 2.3 3.4]

["red" "blue" "orange" "yellow"]

[[1 2] "this" 3]

[] ;; the empty list

Constructors and accessors

(list 1 (+ 1 1) (* 3 2)) ;; => [1 2 6]

(append [1 2 3] 4) ;; => [1 2 3 4]

(head [1 2 3]) ;; => 1

(tail [1 2 3]) ;; => [2 3]

(nth ["a" "b" [1 2]] 1) ;; => "b"
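A short sketch combining the list functions above; the variable names are illustrative. Lists are not modified in place: append returns a new list and leaves colors unchanged.

```whizzml
(define colors ["red" "blue"])
(define more-colors (append colors "green"))      ;; => ["red" "blue" "green"]
(define second-color (head (tail more-colors)))   ;; => "blue"
(define last-color (nth more-colors 2))           ;; => "green" (indices start at 0)
;; colors itself is still ["red" "blue"]
```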


Composite Values: Maps

Literals

{"name" "John"
 "married" true
 "date-of-birth" 1901}

{"source" "source/122323445445565665"
 "input_fields" ["000000" "000001" "000003"]
 "sample" {"rate" 0.3}}

Constructors and accessors

(assoc {"a" 3} "b" 4 "c" 5) ;; => {"a" 3 "b" 4 "c" 5}

(dissoc {"a" 3 "b" "c"} "b") ;; => {"a" 3}

(get {"a" 1 "b" 2} "a") ;; => 1

(get {"a" 1 "b" 2} "non-existent-key") ;; => false

(get {"a" 1 "b" 2} "non-existent-key" 42) ;; => 42

(get-in {"a" {"b" 2 "c" {"d" 42}}} ["a" "c" "d"]) ;; => 42
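Maps are also how arguments are passed to resource-creation calls such as create-ensemble. This sketch builds such an argument map step by step with the accessors above; the dataset id is the one used in the let example, and the "seed" lookup only illustrates the default-value form of get.

```whizzml
(define base-args {"dataset" "dataset/570861ecb85eee0472000016"})
;; add a key: returns a new map, base-args is unchanged
(define ensemble-args (assoc base-args "number_of_models" 10))
(define model-count (get ensemble-args "number_of_models"))  ;; => 10
;; missing key with an explicit default
(define seed (get ensemble-args "seed" "no-seed"))            ;; => "no-seed"
;; removing the added key gives back the original map
(define plain-args (dissoc ensemble-args "number_of_models"))
```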


Functions

Defining a function

(define (function-name arg1 arg2 ...)
  body)

Examples

(define (add-numbers x y)
  (+ x y))

;; Returns the ensemble id: a function returns the value of its last expression.
(define (create-model-and-ensemble dataset-id)
  (create-model {"dataset" dataset-id})
  (create-ensemble {"dataset" dataset-id
                    "number_of_models" 10}))

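Two small pure functions (square and sum-of-squares are illustrative names) show that user-defined functions compose like built-ins. log-and-square shows the rule create-model-and-ensemble relies on: a body with several expressions evaluates them in order and returns the value of the last one.

```whizzml
(define (square x)
  (* x x))

(define (sum-of-squares a b)
  (+ (square a) (square b)))

;; Two body expressions: the log call runs first, the square is returned.
(define (log-and-square x)
  (log-info "Squaring " x)
  (square x))
```

For example, (sum-of-squares 3 4) evaluates to 25.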

Local variables

Let bindings

(let (name-1 val-1
      name-2 val-2 ...)
  body)

Example:

(define no-of-models 10)

(let (msg "I am creating "
      id "dataset/570861ecb85eee0472000016")
  ;; here msg, id and no-of-models are bound
  (log-info msg no-of-models)
  (create-ensemble {"dataset" id
                    "number_of_models" no-of-models}))

;; here msg and id are *not* bound
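Bindings in a let are made in order, so a later binding can use an earlier one; the model-or-ensemble workflow at the end of this deck depends on this. rectangle-stats is a hypothetical example.

```whizzml
(define (rectangle-stats width height)
  (let (area (* width height)
        perimeter (* 2 (+ width height))
        ;; uses the two bindings above
        bigger (if (> area perimeter) "area wins" "perimeter wins"))
    (list area perimeter bigger)))
```

For example, (rectangle-stats 3 4) evaluates to [12 14 "perimeter wins"].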


Conditionals

if

(if (> x 0)              ;; condition
    "x is positive"      ;; consequent
    "x is not positive") ;; alternative

when

(when (positive? n)
  (log-info "Creating a few models...")
  (create-lots-of-models n))
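The if example wrapped in a function so both branches can be exercised, plus a when with several body expressions. describe-sign and checked-count are illustrative names; when has no alternative branch, so use if whenever a value is needed for the false case.

```whizzml
(define (describe-sign x)
  (if (> x 0)
      "x is positive"
      "x is not positive"))

;; All body expressions run only when the condition holds;
;; the value of the last one is returned.
(define (checked-count n)
  (when (> n 0)
    (log-info "Creating " n " models...")
    n))
```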


Conditionals

cond

;; Nested conditionals
(if (> x 3)
    "big"
    (if (< x 1)
        "small"
        "standard"))

;; are better with cond:
(cond (> x 3) "big"
      (< x 1) "small"
      "standard")

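The cond example as a function (size-label is an illustrative name). Conditions are tried in order and the trailing expression acts as the default, so the boundary values 1 and 3 fall through to "standard".

```whizzml
(define (size-label x)
  (cond (> x 3) "big"
        (< x 1) "small"
        "standard"))   ;; default when no condition holds
```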

Error handling

Signaling errors

(raise {"message" "Division by zero" "code" -10})

Catching errors

(try (/ 42 x)
     (catch e
       (log-warn "I've got an error with message: "
                 (get e "message")
                 " and code "
                 (get e "code"))))
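A sketch combining raise and try, assuming (as the example above suggests) that catch binds e to the raised map. checked-ratio is a hypothetical name: it raises the slide's error map for a zero divisor, and the catch clause turns the error into its code instead of failing the whole execution.

```whizzml
(define (checked-ratio a b)
  (try (if (= b 0)
           (raise {"message" "Division by zero" "code" -10})
           (/ a b))
       ;; e is the map passed to raise
       (catch e
         (get e "code"))))
```

With this definition, (checked-ratio 1 0) evaluates to -10.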


Demo: a simple script

Create a dataset and return its number of rows

;; source-id and source-name are supplied as script inputs
(define (make-dataset id name)
  (let (ds-id (create-and-wait-dataset {"source" id
                                        "name" name}))
    (fetch ds-id)))

(define dataset (make-dataset source-id source-name))
(define dataset-id (get dataset "resource"))
(define rows (get dataset "rows"))

https://gist.github.com/whizzmler/917a05cf6c173381116e3cc02da70e42
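Keeping resource calls apart from pure data handling makes the latter easy to check without a server. dataset-summary is a hypothetical helper that reads the same "resource" and "rows" keys as the demo from a fetched dataset map; the usage lines pass a hand-written map, with a made-up row count, in place of the result of fetch.

```whizzml
;; Extract the fields the demo script cares about from a fetched dataset.
(define (dataset-summary dataset)
  {"id" (get dataset "resource")
   "rows" (get dataset "rows")})

;; Hand-written stand-in for (fetch ds-id); the values are made up.
(define sample-summary
  (dataset-summary {"resource" "dataset/570861ecb85eee0472000016"
                    "rows" 150}))
;; => {"id" "dataset/570861ecb85eee0472000016" "rows" 150}
```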

4 Standard Library Overview

Standard functions

• Numeric and relational operators (+, *, <, =, ...)

• Mathematical functions (cos, sinh, floor ...)

• Strings and regular expressions (str, matches?, replace, ...)

• Flatline generation

• Collections: list traversal, sorting, map manipulation

• BigML resources manipulation
  - Creation: create-source, create-and-wait-dataset, etc.
  - Retrieval: fetch, list-anomalies, etc.
  - Update: update
  - Deletion: delete

• Machine Learning Algorithms (SMACDown, Boosting, etc.)
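A small example mixing the string and relational functions. progress-message is an illustrative name, and the sketch assumes str renders numbers in their usual printed form when concatenating.

```whizzml
(define (progress-message done total)
  (if (< done total)
      (str "Created " done " of " total " models")
      "All models created"))
```

For example, (progress-message 3 10) should give "Created 3 of 10 models".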

5 Tutorial Walkthrough: Model or Ensemble?

Model or Ensemble?

• Split a dataset into training and test parts

• Create a model and an ensemble with the training dataset

• Evaluate both with the test dataset

• Choose the one with the better evaluation (higher average F-measure)

https://github.com/whizzml/examples/tree/master/model-or-ensemble


Model or Ensemble?

;; Functions for creating the two dataset parts
;; and the model and ensemble from the training set.
(define (sample-dataset ds-id rate oob)
  (create-and-wait-dataset {"sample_rate" rate
                            "origin_dataset" ds-id
                            "out_of_bag" oob
                            "seed" "whizzml-example"}))

;; With the same rate and seed, "out_of_bag" true selects exactly the rows
;; left out when it is false, so the two parts are complementary.
(define (split-dataset ds-id rate)
  (list (sample-dataset ds-id rate false)   ;; training part
        (sample-dataset ds-id rate true)))  ;; test part

(define (make-model ds-id)
  (create-and-wait-model {"dataset" ds-id}))

(define (make-ensemble ds-id size)
  (create-and-wait-ensemble {"dataset" ds-id
                             "number_of_models" size}))


Model or Ensemble?

;; Functions for evaluating the model and the ensemble on the test set,
;; and for extracting the average F-measure from the evaluation results
(define (evaluate-model model-id ds-id)
  (create-and-wait-evaluation {"model" model-id
                               "dataset" ds-id}))

(define (evaluate-ensemble ensemble-id ds-id)
  (create-and-wait-evaluation {"ensemble" ensemble-id
                               "dataset" ds-id}))

(define (f-measure ev-id)
  (get-in (fetch ev-id) ["result" "model" "average_f_measure"]))
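f-measure fetches the evaluation and then reads a nested value. Separating the lookup from the fetch gives a pure helper (f-measure-of, a hypothetical name) that reads the same path; the usage lines pass a hand-written map with a made-up score in place of a fetched evaluation.

```whizzml
;; Same path as f-measure, but applied to an already-fetched evaluation map.
(define (f-measure-of evaluation)
  (get-in evaluation ["result" "model" "average_f_measure"]))

;; Hand-written stand-in for (fetch ev-id); 0.87 is a made-up value.
(define sample-f
  (f-measure-of {"result" {"model" {"average_f_measure" 0.87}}}))
;; => 0.87
```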


Model or Ensemble?

;; Function encapsulating the full workflow
(define (model-or-ensemble src-id)
  (let (ds-id (create-and-wait-dataset {"source" src-id})
        ;; ^ full dataset
        ids (split-dataset ds-id 0.8)                  ;; split it 80/20
        train-id (nth ids 0)                           ;; the 80% for training
        test-id (nth ids 1)                            ;; and 20% for evaluations
        m-id (make-model train-id)                     ;; create a model
        e-id (make-ensemble train-id 15)               ;; and an ensemble
        m-f (f-measure (evaluate-model m-id test-id))  ;; evaluate
        e-f (f-measure (evaluate-ensemble e-id test-id)))
    (log-info "model f " m-f " / ensemble f " e-f)
    (if (> m-f e-f) m-id e-id)))

;; Compute the result of the script execution

;; - Inputs: [{"name": "input-source-id", "type": "source-id"}]

;; - Outputs: [{"name": "result", "type": "resource-id"}]

(define result (model-or-ensemble input-source-id))
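The final choice in model-or-ensemble can likewise be isolated in a pure function. pick-better is a hypothetical name and the ids in the usage lines are placeholders. Because > is strict, a tie in F-measure selects the ensemble.

```whizzml
;; Return the id of the resource with the higher F-measure (ensemble on ties).
(define (pick-better m-id m-f e-id e-f)
  (if (> m-f e-f) m-id e-id))

(define winner (pick-better "model/a" 0.91 "ensemble/b" 0.88))   ;; => "model/a"
(define tie-winner (pick-better "model/a" 0.8 "ensemble/b" 0.8)) ;; => "ensemble/b"
```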
