Basic WhizzML Workflows
The BigML Team
May 2016
Outline
1 What is WhizzML?
2 WhizzML Server-side Resources
3 WhizzML Language Basics
4 Standard Library Overview
5 Tutorial Walkthrough: Model or Ensemble?
1 What is WhizzML?
WhizzML in a Nutshell
• Domain-specific language for ML workflow automation
  - High-level problem and solution specification
• Framework for scalable, remote execution of ML workflows
  - Sophisticated server-side optimization
  - Out-of-the-box scalability
  - Client-server brittleness removed
  - Infrastructure for creating and sharing ML scripts and libraries
2 WhizzML Server-side Resources
WhizzML REST Resources
Library    Reusable building block: a collection of WhizzML definitions that can be imported by other libraries or scripts.
Script     Executable code that describes an actual workflow.
  • Imports  List of libraries with code used by the script.
  • Inputs   List of input values that parameterize the workflow.
  • Outputs  List of values computed by the script and returned to the user.
Execution  Given a script and a complete set of inputs, the workflow can be executed and its outputs generated.
3 WhizzML Language Basics
Basic Syntax
Atomic constants
  "a string value"
  23, -10, -1.23E11, 1.42342
  true, false
Fully parenthesized prefix notation
  (list-sources)              ;; Function call without arguments
  (log-info "Hello World!")
  (* 2 (+ 2 3))               ;; Evaluates to 2 * (2 + 3)
  (atan (tan 3))              ;; Nested function calls
Variables
Names
  dataset_id
  date-of-birth
  sources*
  positive?
  x, y
Definition
  (define name "Arthur Samuel")
  (define birth-year 1901)
  (define age (- 2016 birth-year))
Composite Values: Lists
Literals
  [1.2 2.3 3.4]
  ["red" "blue" "orange" "yellow"]
  [[1 2] "this" 3]
  []                            ;; the empty list
Constructors and accessors
  (list 1 (+ 1 1) (* 3 2))      ;; => [1 2 6]
  (append [1 2 3] 4)            ;; => [1 2 3 4]
  (head [1 2 3])                ;; => 1
  (tail [1 2 3])                ;; => [2 3]
  (nth ["a" "b" [1 2]] 1)       ;; => "b"
Composite Values: Maps
Literals
  {"name" "John"
   "married" true
   "date-of-birth" 1901}

  {"source" "source/122323445445565665"
   "input_fields" ["000000" "000001" "000003"]
   "sample" {"rate" 0.3}}
Constructors and accessors
  (assoc {"a" 3} "b" 4 "c" 5)                        ;; => {"a" 3 "b" 4 "c" 5}
  (dissoc {"a" 3 "b" "c"} "b")                       ;; => {"a" 3}
  (get {"a" 1 "b" 2} "a")                            ;; => 1
  (get {"a" 1 "b" 2} "non-existent-key")             ;; => false
  (get {"a" 1 "b" 2} "non-existent-key" 42)          ;; => 42
  (get-in {"a" {"b" 2 "c" {"d" 42}}} ["a" "c" "d"])  ;; => 42
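As a small illustration of how the map functions above combine, the following sketch builds the arguments for a resource-creation call step by step. It uses only assoc, get and get-in as shown on this slide. The names base-args and ensemble-args are made up for the example, and the dataset ID is only a placeholder.

```whizzml
;; Hypothetical names; the dataset ID is only a placeholder.
(define base-args {"dataset" "dataset/570861ecb85eee0472000016"})

;; assoc returns a new map; base-args itself is left unchanged
(define ensemble-args (assoc base-args
                             "number_of_models" 10
                             "sample" {"rate" 0.3}))

(get ensemble-args "number_of_models")      ;; => 10
(get base-args "number_of_models" 1)        ;; => 1, the default
(get-in ensemble-args ["sample" "rate"])    ;; => 0.3
```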
Functions
Defining a function
  (define (function-name arg1 arg2 ...)
    body)
Examples
  (define (add-numbers x y)
    (+ x y))

  (define (create-model-and-ensemble dataset-id)
    (create-model {"dataset" dataset-id})
    (create-ensemble {"dataset" dataset-id
                      "number_of_models" 10}))
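Note that create-model-and-ensemble evaluates two expressions in its body. Assuming WhizzML follows the usual Lisp convention that a function returns the value of its last expression, a caller would only see the ensemble. Below is a minimal sketch of a variant that hands back both results in a list. The name create-model-and-ensemble-pair is hypothetical and the dataset ID is a placeholder.

```whizzml
;; Hypothetical variant: return both created resources in a list
(define (create-model-and-ensemble-pair dataset-id)
  (list (create-model {"dataset" dataset-id})
        (create-ensemble {"dataset" dataset-id
                          "number_of_models" 10})))

;; Pick each result by position (placeholder dataset ID)
(define pair (create-model-and-ensemble-pair "dataset/570861ecb85eee0472000016"))
(define model-id (nth pair 0))
(define ensemble-id (nth pair 1))
```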
Local variables
Let bindings
  (let (name-1 val-1
        name-2 val-2
        ...)
    body)
Example
  (define no-of-models 10)
  (let (msg "I am creating "
        id "dataset/570861ecb85eee0472000016")
    ;; here msg, id and no-of-models are bound
    (log-info msg no-of-models)
    (create-ensemble {"dataset" id
                      "number_of_models" no-of-models}))
  ;; here msg and id are *not* bound
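Bindings in a let are established in order, so a later value can refer to an earlier name. The tutorial workflow at the end of these slides relies on this when it derives train-id from ids. A minimal sketch, reusing the numbers from the Variables slide:

```whizzml
(let (birth-year 1901
      age (- 2016 birth-year))   ;; age can already use birth-year
  (log-info "Age in 2016: " age))
```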
Conditionals
if
  (if (> x 0)               ;; condition
      "x is positive"       ;; consequent
      "x is not positive")  ;; alternative
when
  (when (positive? n)
    (log-info "Creating a few models...")
    (create-lots-of-models n))
Conditionals
cond
  ;; Nested conditionals
  (if (> x 3)
      "big"
      (if (< x 1)
          "small"
          "standard"))
  ;; are better with cond:
  (cond (> x 3) "big"
        (< x 1) "small"
        "standard")
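To see the cond form in use, it can be wrapped in a function and called with a few values. This is only a sketch, and size-label is a hypothetical helper name.

```whizzml
;; size-label is a hypothetical helper name
(define (size-label x)
  (cond (> x 3) "big"
        (< x 1) "small"
        "standard"))

(size-label 5)    ;; => "big"
(size-label 0)    ;; => "small"
(size-label 2)    ;; => "standard" (no test matched, so the default applies)
```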
Error handling
Signaling errors
  (raise {"message" "Division by zero" "code" -10})
Catching errors
  (try (/ 42 x)
       (catch e
         (log-warn "I've got an error with message: "
                   (get e "message")
                   " and code "
                   (get e "code"))))
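The two halves of this slide can be combined into one small sketch: one function signals the error explicitly with raise, and a wrapper catches it and returns a fallback value instead. The helper names checked-div and safe-div are hypothetical.

```whizzml
;; checked-div and safe-div are hypothetical helper names
(define (checked-div n d)
  (if (= d 0)
      (raise {"message" "Division by zero" "code" -10})
      (/ n d)))

(define (safe-div n d fallback)
  (try (checked-div n d)
       (catch e fallback)))

(safe-div 42 0 false)   ;; => false, the fallback
(safe-div 42 2 false)   ;; no error is raised; returns the quotient
```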
Demo: a simple script
Create a dataset and return its number of rows
  (define (make-dataset id name)
    (let (ds-id (create-and-wait-dataset {"source" id
                                          "name" name}))
      (fetch ds-id)))

  ;; source-id and source-name are not defined here;
  ;; they are expected to come from the script's inputs
  (define dataset (make-dataset source-id source-name))
  (define dataset-id (get dataset "resource"))
  (define rows (get dataset "rows"))
https://gist.github.com/whizzmler/917a05cf6c173381116e3cc02da70e42
4 Standard Library Overview
Standard functions
• Numeric and relational operators (+, *, <, =, ...)
• Mathematical functions (cos, sinh, floor ...)
• Strings and regular expressions (str, matches?, replace, ...)
• Flatline generation
• Collections: list traversal, sorting, map manipulation
• BigML resources manipulation
  - Creation: create-source, create-and-wait-dataset, etc.
  - Retrieval: fetch, list-anomalies, etc.
  - Update: update
  - Deletion: delete
• Machine Learning Algorithms (SMACDown, Boosting, etc.)
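To show how the resource functions listed above fit together, here is a minimal sketch of a create / retrieve / update / delete cycle. It assumes a script input named source-id, as in the demo script. It also assumes that update takes a resource ID plus a map of changes and that delete takes a resource ID; these call shapes are assumptions to check against the WhizzML reference.

```whizzml
;; Creation: build a dataset from the (assumed) script input source-id
(define ds-id (create-and-wait-dataset {"source" source-id}))

;; Retrieval: fetch returns the resource as a map
(define ds (fetch ds-id))
(log-info "rows: " (get ds "rows"))

;; Update: assumed call shape (update <id> <map of changes>)
(update ds-id {"name" "my renamed dataset"})

;; Deletion: assumed call shape (delete <id>)
(delete ds-id)
```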
5 Tutorial Walkthrough: Model or Ensemble?
Model or Ensemble?
• Split a dataset into training and test parts
• Create a model and an ensemble with the training dataset
• Evaluate both with the test dataset
• Choose the one with the better evaluation (f-measure)
https://github.com/whizzml/examples/tree/master/model-or-ensemble
Model or Ensemble?
;; Functions for creating the two dataset parts
;; and the model and ensemble from the training set.
(define (sample-dataset ds-id rate oob)
  (create-and-wait-dataset {"sample_rate" rate
                            "origin_dataset" ds-id
                            "out_of_bag" oob
                            "seed" "whizzml-example"}))

(define (split-dataset ds-id rate)
  (list (sample-dataset ds-id rate false)
        (sample-dataset ds-id rate true)))

(define (make-model ds-id)
  (create-and-wait-model {"dataset" ds-id}))

(define (make-ensemble ds-id size)
  (create-and-wait-ensemble {"dataset" ds-id
                             "number_of_models" size}))
Model or Ensemble?
;; Functions for evaluating the model and the ensemble
;; using the test set, and for extracting the f-measure
;; from the evaluation results
(define (evaluate-model model-id ds-id)
  (create-and-wait-evaluation {"model" model-id
                               "dataset" ds-id}))

(define (evaluate-ensemble ensemble-id ds-id)
  (create-and-wait-evaluation {"ensemble" ensemble-id
                               "dataset" ds-id}))

(define (f-measure ev-id)
  (get-in (fetch ev-id) ["result" "model" "average_f_measure"]))
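The f-measure function depends on the nested layout of the fetched evaluation. The sketch below walks the same key path over a hand-written stand-in map, so the lookup can be followed without creating any resources. The name fake-evaluation and the value 0.87 are invented for illustration and are not real results.

```whizzml
;; A hand-written stand-in for (fetch ev-id); real evaluations
;; contain many more fields. The value 0.87 is made up.
(define fake-evaluation
  {"result" {"model" {"average_f_measure" 0.87}}})

(get-in fake-evaluation ["result" "model" "average_f_measure"])  ;; => 0.87
```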
Model or Ensemble?
;; Function encapsulating the full workflow
(define (model-or-ensemble src-id)
  (let (ds-id (create-and-wait-dataset {"source" src-id}) ;; full dataset
        ids (split-dataset ds-id 0.8)                     ;; split it 80/20
        train-id (nth ids 0)                              ;; the 80% for training
        test-id (nth ids 1)                               ;; and 20% for evaluations
        m-id (make-model train-id)                        ;; create a model
        e-id (make-ensemble train-id 15)                  ;; and an ensemble
        m-f (f-measure (evaluate-model m-id test-id))     ;; evaluate
        e-f (f-measure (evaluate-ensemble e-id test-id)))
    (log-info "model f " m-f " / ensemble f " e-f)
    (if (> m-f e-f) m-id e-id)))

;; Compute the result of the script execution
;; - Inputs: [{"name": "input-source-id", "type": "source-id"}]
;; - Outputs: [{"name": "result", "type": "resource-id"}]
(define result (model-or-ensemble input-source-id))