Statistical Programming with JavaScript
Post on 13-Apr-2017
STATISTICAL PROGRAMMING IN JAVASCRIPT
David Simons @SwamWithTurtles
demos: swamwithturtles.github.io/js-statistics
code: github.com/SwamWithTurtles/js-statistics
WHO AM I?
Freelance Software Developer
@SwamWithTurtles
Java and JavaScript
Afraid of goats?
WHO AM I?
DATA NERD
CONTENTS
THEORY CASE STUDIES JAVASCRIPT APPLICATION
WHAT IS DATA?
GAINING INSIGHTS RANDOMNESS SIMULATION
LEARNING THROUGH
Reward: What shape is the internet?
Data
BEHIND THE HOOD
API
DB
ADMIN INTERFACE
SCHEDULED TASKS
3RD PARTY APIS
WHAT DATA WAS THERE?
SO…
WHAT DATA WAS THERE?
• Counts of lists (e.g. brands, products etc.)
• Stock levels and prices of products
• Days an item has been out of stock
WHAT DATA WAS THERE?
• Non-functional data
• Numbers of users
• Performance for users
• Performance of third party APIs
• Robustness of system (Uptime, status codes, frequency of errors)
THERE IS DATA EVERYWHERE
THE LESSON?
What is data?
What is good data?
WHAT DATA SHOULD I CARE ABOUT?
• Data you get repeatedly
• Data you can extract ‘information’ from
• Normally this means numerical data, though NLP is getting big!
• Data that answers valuable questions
Gaining Insights
A data set:
Identification WIND CEILING TEMP DEWPT RHX
USAF NCDC Date HrMn I Type QCP Dir Q I Spd Q Hgt Q I I Temp Q Dewpt Q RHx
865300,99999,19860401,0000,4,FM-12, ,110,1,N, 7.2,1,22000,1,C,N, 21.6,1, 19.2,1, 86,
865300,99999,19860401,0300,4,FM-12, ,110,1,N, 5.1,1,22000,1,C,N, 19.4,1, 18.5,1, 95,
865300,99999,19860401,0600,4,FM-12, ,070,1,N, 7.2,1,03600,1,C,N, 19.2,1, 999.9,9,999,
865300,99999,19860401,0900,4,FM-12, ,070,1,N, 6.2,1,00120,1,C,N, 19.2,1, 18.9,1, 98,
865300,99999,19860401,1200,4,FM-12, ,070,1,N, 7.7,1,03600,1,C,N, 21.6,1, 18.3,1, 82,
865300,99999,19860401,1500,4,FM-12, ,040,1,N, 9.8,1,03600,1,C,N, 23.0,1, 18.8,1, 77,
865300,99999,19860401,1800,4,FM-12, ,030,1,N, 6.2,1,03600,1,C,N, 19.6,1, 19.0,1, 96,
865300,99999,19860401,2100,4,FM-12, ,050,1,N, 6.7,1,03600,1,C,N, 19.0,1, 18.7,1, 98,
865300,99999,19860402,0000,4,FM-12, ,340,1,N, 7.2,1,03600,1,C,N, 20.0,1, 19.4,1, 96,
865300,99999,19860402,0300,4,FM-12, ,360,1,N, 4.1,1,03600,1,C,N, 19.4,1, 19.1,1, 98,
865300,99999,19860402,0600,4,FM-12, ,999,1,C, 0.0,1,03600,1,C,N, 19.2,1, 18.9,1, 98,
865300,99999,19860402,0900,4,FM-12, ,999,1,C, 0.0,1,00210,1,C,N, 19.0,1, 18.7,1, 98,
865300,99999,19860402,1200,4,FM-12, ,200,1,N, 2.6,1,00210,1,C,N, 20.4,1, 20.1,1, 98,
865300,99999,19860402,1500,4,FM-12, ,210,1,N, 5.1,1,00750,1,C,N, 23.2,1, 19.3,1, 79,
865300,99999,19860402,1800,4,FM-12, ,200,1,N, 3.1,1,00750,1,C,N, 26.4,1, 18.4,1, 62,
865300,99999,19860402,2100,4,FM-12, ,999,1,C, 0.0,1,22000,1,C,N, 26.2,1, 17.1,1, 57,
865300,99999,19860403,0000,4,FM-12, ,140,1,N, 4.1,1,22000,1,C,N, 19.2,1, 17.0,1, 87,
865300,99999,19860403,0300,4,FM-12, ,999,1,C, 0.0,1,22000,1,C,N, 15.8,1, 15.2,1, 96,
865300,99999,19860403,0600,4,FM-12, ,999,1,C, 0.0,1,22000,1,C,N, 15.4,1, 14.0,1, 91,
865300,99999,19860403,1200,4,FM-12, ,060,1,N, 5.1,1,22000,1,C,N, 21.0,1, 19.8,1, 93,
865300,99999,19860403,1500,4,FM-12, ,060,1,N, 4.1,1,00900,1,C,N, 24.8,1, 21.3,1, 81,
865300,99999,19860403,1800,4,FM-12, ,050,1,N, 7.7,1,09000,1,C,N, 28.0,1, 21.4,1, 67,
865300,99999,19860403,2100,4,FM-12, ,040,1,N, 5.1,1,09000,1,C,N, 25.4,1, 21.4,1, 79,
865300,99999,19860404,0000,4,FM-12, ,060,1,N, 6.2,1,03600,1,C,N, 22.2,1, 21.3,1, 95,
865300,99999,19860404,0300,4,FM-12, ,050,1,N, 5.1,1,09000,1,C,N, 21.0,1, 20.7,1, 98,
865300,99999,19860404,0600,4,FM-12, ,060,1,N, 6.2,1,22000,1,C,N, 20.2,1, 19.9,1, 98,
865300,99999,19860404,1200,4,FM-12, ,040,1,N, 5.1,1,00120,1,C,N, 20.4,1, 19.5,1, 95,
865300,99999,19860404,1500,4,FM-12, ,020,1,N, 7.7,1,00420,1,C,N, 24.2,1, 20.4,1, 79,
865300,99999,19860404,1800,4,FM-12, ,250,1,N, 4.1,1,00750,1,C,N, 25.6,1, 20.7,1, 74,
865300,99999,19860404,2100,4,FM-12, ,250,1,N, 5.1,1,00750,1,C,N, 23.6,1, 20.4,1, 82,
865300,99999,19860405,0000,4,FM-12, ,180,1,N, 6.2,1,00420,1,C,N, 20.2,1, 19.6,1, 96,
865300,99999,19860405,0300,4,FM-12, ,160,1,N, 5.1,1,00120,1,C,N, 18.6,1, 18.0,1, 96,
SUMMARY STATISTICS
• A statistic is a function of the data we have input
• It aims to capture information about values to make it more understandable
THE FAMOUS ONE:
• Mean (‘average’)
• Sum all of the data and divide by the number of items
• Gives a sense of ‘size’
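A minimal sketch of the mean as described in the bullets above. The function name `mean` is an illustrative choice, and the sample values are the first eight TEMP readings from the weather data set shown earlier.

```javascript
// Mean: sum every value, then divide by how many values there are.
function mean(values) {
  if (values.length === 0) {
    throw new Error("the mean of an empty data set is undefined");
  }
  const total = values.reduce((sum, x) => sum + x, 0);
  return total / values.length;
}

// First eight TEMP readings from the weather data set.
const temps = [21.6, 19.4, 19.2, 19.2, 21.6, 23.0, 19.6, 19.0];
console.log(mean(temps));
```

The empty-input check is a design choice made for this sketch: dividing by zero would otherwise quietly return NaN.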
Group 1:
Group 2:
OTHER STATISTICS
• “Location”
• Mean, Mode, Median
• “Spread”
• Standard Deviation
• “Shape”
• Skew, Kurtosis
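The statistics listed above can be sketched in plain JavaScript as follows. This is an illustration rather than the code used in the demos. It assumes the population (divide-by-n) definitions, moment-based skewness, and non-excess kurtosis (a normal distribution scores 3), and all function names are made up for this example.

```javascript
function mean(xs) {
  return xs.reduce((sum, x) => sum + x, 0) / xs.length;
}

// Location: the middle value of the sorted data.
function median(xs) {
  const sorted = [...xs].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 === 1
    ? sorted[mid]
    : (sorted[mid - 1] + sorted[mid]) / 2;
}

// Location: the most frequent value (the first one seen wins ties).
function mode(xs) {
  const counts = new Map();
  let best = xs[0];
  for (const x of xs) {
    counts.set(x, (counts.get(x) || 0) + 1);
    if (counts.get(x) > counts.get(best)) best = x;
  }
  return best;
}

// k-th central moment: the average of (x - mean)^k.
function centralMoment(xs, k) {
  const m = mean(xs);
  return mean(xs.map((x) => Math.pow(x - m, k)));
}

// Spread: population standard deviation.
function standardDeviation(xs) {
  return Math.sqrt(centralMoment(xs, 2));
}

// Shape: asymmetry (0 for perfectly symmetric data).
function skewness(xs) {
  return centralMoment(xs, 3) / Math.pow(centralMoment(xs, 2), 1.5);
}

// Shape: weight of the tails (non-excess kurtosis).
function kurtosis(xs) {
  return centralMoment(xs, 4) / Math.pow(centralMoment(xs, 2), 2);
}

const sample = [2, 4, 4, 4, 5, 5, 7, 9];
console.log(median(sample), mode(sample), standardDeviation(sample));
```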
DEMO
Distributions
What is a random variable?
Discrete Variables
Can be any of a list of values, each with its own probability
HEADS: 0.5
TAILS: 0.5
2: 1/36
3: 2/36
4: 3/36
5: 4/36
6: 5/36
7: 6/36
8: 5/36
9: 4/36
10: 3/36
11: 2/36
12: 1/36
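The second table matches the totals of two fair six-sided dice, although the slide does not say so. Under that assumption, the probabilities can be reproduced by counting outcomes. The function name below is illustrative.

```javascript
// Count how many of the 36 equally likely (die1, die2) pairs give each total.
function twoDiceDistribution() {
  const counts = {};
  for (let d1 = 1; d1 <= 6; d1++) {
    for (let d2 = 1; d2 <= 6; d2++) {
      const total = d1 + d2;
      counts[total] = (counts[total] || 0) + 1;
    }
  }
  return counts; // counts[7] is the number of ways to roll a total of 7
}

const dist = twoDiceDistribution();
for (const total of Object.keys(dist)) {
  console.log(`${total}: ${dist[total]}/36`);
}
```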
This makes sense:
X = Result of a coin flip
HEADS: 0.5
TAILS: 0.5
But:
X won’t always have the same value
RANDOM VARIABLES
X = Result of a coin flip
HEADS: 0.5
TAILS: 0.5
X is a Random Variable
This is its distribution
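To make the idea concrete, here is a small sketch of a random variable as "a distribution you can draw from". The object shape (`value`, `probability`) and the `sample` helper are assumptions made for this example, not code from the talk.

```javascript
// A distribution: a list of outcomes whose probabilities sum to 1.
const coin = [
  { value: "HEADS", probability: 0.5 },
  { value: "TAILS", probability: 0.5 },
];

// Draw one realisation: walk the cumulative probabilities until they
// exceed a uniform random number in [0, 1).
function sample(distribution, random = Math.random) {
  const u = random();
  let cumulative = 0;
  for (const outcome of distribution) {
    cumulative += outcome.probability;
    if (u < cumulative) return outcome.value;
  }
  // Guard against floating-point rounding leaving a tiny gap below 1.
  return distribution[distribution.length - 1].value;
}

// X does not have one fixed value: each draw may differ.
const flips = Array.from({ length: 10 }, () => sample(coin));
console.log(flips.join(" "));
```

Passing the random source in as a parameter keeps the function testable with a fixed sequence of numbers.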
DEMO…
Continuous
A numerical variable that can be any number (sometimes within a range)
height
weight
Math.random()
HOW DO WE DEFINE THE DISTRIBUTION?
Math.random() height
DEMO
SO WHAT? ERRR…
• When we do data analysis, we’re really looking at the range of values a random variable can be…
• … and asking questions about its distribution.
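One simple way to "ask about a distribution" is to bin observed values into a histogram and look at its shape. The sketch below is illustrative: the `histogram` helper and the bin settings are assumptions, not the demo code.

```javascript
// Count how many samples fall into each of `binCount` equal-width bins
// covering the range [min, max).
function histogram(samples, min, max, binCount) {
  const bins = new Array(binCount).fill(0);
  const width = (max - min) / binCount;
  for (const x of samples) {
    if (x < min || x >= max) continue; // ignore out-of-range values
    // Clamp in case rounding pushes a value just below max into bin binCount.
    const index = Math.min(binCount - 1, Math.floor((x - min) / width));
    bins[index] += 1;
  }
  return bins;
}

// Math.random() should fill all ten bins roughly evenly.
const draws = Array.from({ length: 10000 }, () => Math.random());
console.log(histogram(draws, 0, 1, 10));
```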
YOU’RE AN AUDITOR
IMAGINE…
AUDITING A LEDGER
• Make a list of all incoming and outgoing transactions
• These are random variables.
• What is their distribution? Does it deviate from what we expect?
BENFORD’S LAW
http://www.journalofaccountancy.com/Issues/1999/May/nigrini
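Benford's law says that in many naturally occurring data sets the leading digit d appears with probability log10(1 + 1/d). A rough sketch of comparing a ledger against it follows. The helper names and the ledger amounts are invented for illustration, and a real audit would apply a proper statistical test rather than eyeballing the two columns.

```javascript
// Benford's law: expected share of values whose first significant digit is d.
function benfordExpected(d) {
  return Math.log10(1 + 1 / d);
}

// First significant digit of a non-zero number, e.g. 0.042 -> 4, -1930 -> 1.
function leadingDigit(x) {
  const digits = Math.abs(x).toExponential(); // "4.2e-2", "1.93e+3"
  return Number(digits[0]);
}

// Observed share of each leading digit (index 1..9) in a list of amounts.
function leadingDigitShares(amounts) {
  const counts = new Array(10).fill(0);
  const nonZero = amounts.filter((x) => x !== 0);
  for (const x of nonZero) counts[leadingDigit(x)] += 1;
  return counts.map((c) => c / nonZero.length);
}

// Hypothetical ledger amounts, made up for this example.
const ledger = [120.5, -1930, 18.2, 2400, 0.042, 310, 15.99, 1100, 99, 172];
const observed = leadingDigitShares(ledger);
for (let d = 1; d <= 9; d++) {
  console.log(d, benfordExpected(d).toFixed(3), observed[d].toFixed(3));
}
```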
INTUITIVE USER INPUTS
DESIGNING
OUR TASK…
• Designing a system that tries to understand what happens under financial system “shocks”
• So: a user would input a shock, its impacts would propagate and we would see our bottom line.
OUR FIRST ATTEMPT
• Shock ‘sliders’ that scaled linearly
0%
25% BOOM
90% BUST
DISTRIBUTION OF FINANCIAL CHANGES
SO…
• Shock ‘sliders’ that scaled linearly
0%
8% BOOM
105% BUST
Change that happens with 75% chance
Change that happens with 10% chance
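One possible reading of the redesigned sliders is that each position corresponds to a probability rather than a raw percentage, so the slider walks along the quantiles of the distribution of changes. The sketch below illustrates that idea only. The function names and the historical figures are hypothetical, and the slides do not confirm that the real system worked this way.

```javascript
// Value below which a fraction p of the sorted observations fall,
// using linear interpolation between neighbouring observations.
function quantile(sortedValues, p) {
  const position = p * (sortedValues.length - 1);
  const lower = Math.floor(position);
  const upper = Math.ceil(position);
  const weight = position - lower;
  return sortedValues[lower] * (1 - weight) + sortedValues[upper] * weight;
}

// Map a slider position in [0, 1] to a shock size, so that equal slider
// steps are equal steps in probability, not in percentage change.
function sliderToShock(position, historicalChanges) {
  const sorted = [...historicalChanges].sort((a, b) => a - b);
  return quantile(sorted, position);
}

// Hypothetical yearly percentage changes, invented for this example.
const changes = [-40, -12, -5, -2, 0, 1, 3, 4, 8, 25];
console.log(sliderToShock(0.5, changes), sliderToShock(0.9, changes));
```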
Randomness
MAKING RANDOM VARIABLES
SOME WARNINGS
• Exactly what randomness means is a fuzzy question.
• These numbers are not ‘cryptographically’ random.
JAVASCRIPT’S ENTRY TO RANDOMNESS
• Different runtimes can implement it differently.
• V8 historically implemented Multiply-With-Carry (newer versions use xorshift128+):
• Take a sequence of ‘seed’ values
• Iteratively perform modular arithmetic-based operations
• Extend the initial seed values to a longer sequence.
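The steps above can be sketched with a small multiply-with-carry generator in the style of Marsaglia's classic two-stream version. This is an illustration of the technique, not any engine's actual source: the multipliers 36969 and 18000 are Marsaglia's, and the `createMwc` name and the seed values are choices made for this example.

```javascript
// Multiply-with-carry sketch: two 32-bit states, each updated by multiplying
// its low 16 bits and adding its high 16 bits (the "carry").
// Seeds must be non-zero, otherwise a state stays at zero forever.
function createMwc(seedZ, seedW) {
  let z = seedZ >>> 0;
  let w = seedW >>> 0;
  return function next() {
    z = (36969 * (z & 0xffff) + (z >>> 16)) >>> 0;
    w = (18000 * (w & 0xffff) + (w >>> 16)) >>> 0;
    // Combine both streams into one unsigned 32-bit integer...
    const combined = (((z << 16) >>> 0) + (w & 0xffff)) >>> 0;
    // ...and scale it into [0, 1), the same range as Math.random().
    return combined / 4294967296;
  };
}

// The same seeds always extend to the same longer sequence.
const random = createMwc(362436069, 521288629);
console.log(random(), random(), random());
```

Because the sequence is fully determined by the seeds, a generator like this is reproducible, which is useful for tests and is also why it is not suitable for cryptography.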
Math.random()
WHAT ABOUT OTHER DISTRIBUTIONS?
BUT…
THE SHORT ANSWER
Math.random() = f( )
THE SHORT ANSWER
= HEADS: 0.5
TAILS: 0.5
=
WHAT’S THE FUNCTION?
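For some distributions the function can be written by hand. The sketch below shows three standard constructions on top of `Math.random()`: a threshold for the coin, the inverse-CDF method for an exponential variable, and the Box-Muller transform for a normal variable. The function names and parameters are illustrative assumptions.

```javascript
// Coin flip: split [0, 1) at the probability of heads.
function flipCoin(random = Math.random) {
  return random() < 0.5 ? "HEADS" : "TAILS";
}

// Exponential via the inverse CDF: if U is uniform on [0, 1),
// then -ln(1 - U) / rate is exponentially distributed.
function sampleExponential(rate, random = Math.random) {
  return -Math.log(1 - random()) / rate;
}

// Normal via the Box-Muller transform, which turns two uniform draws
// into one normally distributed value.
function sampleNormal(mean = 0, stdDev = 1, random = Math.random) {
  const u1 = 1 - random(); // in (0, 1], so Math.log never sees 0
  const u2 = random();
  const z = Math.sqrt(-2 * Math.log(u1)) * Math.cos(2 * Math.PI * u2);
  return mean + stdDev * z;
}

console.log(flipCoin(), sampleExponential(2), sampleNormal(170, 10));
```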
jStat
• beta
• centralF
• cauchy
• chi-squared
• exponential
• gamma
• inverse gamma
• kumaraswamy
• lognormal
• normal
• pareto
• student t
• uniform
• weibull
• binomial
• negative binomial
• hypergeometric
• poisson
• triangular
OR
USING RANDOMNESS
WHY WOULD I WANT TO USE RANDOMNESS?
STUBBED TEST DATA
• Avoid coupling yourself to specific test implementations
• Spin-up life-like environments for load testing
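As an illustration of stubbed test data, the sketch below generates made-up product records. The field names, brand names and value ranges are all assumptions invented for this example and are not taken from any real system.

```javascript
// Pick one element of an array uniformly at random.
function pick(items, random = Math.random) {
  return items[Math.floor(random() * items.length)];
}

// Build one made-up product record (all fields are illustrative).
function randomProduct(id, random = Math.random) {
  return {
    id,
    brand: pick(["Acme", "Globex", "Initech"], random),
    price: Math.round((1 + random() * 99) * 100) / 100, // 1.00 to 100.00
    stockLevel: Math.floor(random() * 500), // 0 to 499
  };
}

// A life-like catalogue of 1000 products for a load test.
const products = Array.from({ length: 1000 }, (_, i) => randomProduct(i + 1));
console.log(products.slice(0, 3));
```

Tests written against data like this assert on properties (a price is positive, stock is a whole number) rather than on specific hard-coded values.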
NON-DETERMINISTIC ALGORITHMS
• Modelling underlying or random data
• Solving a problem that is expensive or impossible to solve perfectly
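A classic toy example of a non-deterministic algorithm is estimating π by random sampling. It stands in here for problems that are genuinely too expensive to solve exactly; the function name and the sample count are illustrative.

```javascript
// Monte Carlo estimate of pi: throw random points into the unit square and
// count the share that lands inside the quarter circle of radius 1.
// That share approaches pi / 4 as the number of points grows.
function estimatePi(sampleCount, random = Math.random) {
  let inside = 0;
  for (let i = 0; i < sampleCount; i++) {
    const x = random();
    const y = random();
    if (x * x + y * y < 1) inside += 1;
  }
  return (4 * inside) / sampleCount;
}

// The answer differs slightly on every run; more samples narrow the error.
console.log(estimatePi(200000));
```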
PITFALLS
CHOOSING THE DISTRIBUTION
• What if a ‘uniform’ distribution isn’t enough?
• What if we want random data that isn’t just numbers?
EXAMPLE: SOCIAL NETWORK
EXAMPLE: SOCIAL NETWORK
11 Traversals
DEMO
BARABASI-ALBERT RANDOM MODEL
• Start with two linked objects
• Add one new object at a time
• Link that object to one existing object, with already ‘popular’ objects more likely to be chosen.
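The three steps above can be sketched as follows. This is a minimal illustration rather than the demo's code: it assumes "popular" means "has more links", and the function and variable names are made up for this example.

```javascript
// Grow a network of `size` nodes (size >= 2) by preferential attachment.
// Returns the list of links as [node, linkedNode] pairs.
function barabasiAlbert(size, random = Math.random) {
  const links = [[0, 1]]; // start with two linked objects
  // Each link contributes both of its endpoints to this list, so a node
  // appears once per link it has. Picking uniformly from the list therefore
  // picks a node with probability proportional to its number of links.
  const endpoints = [0, 1];
  for (let node = 2; node < size; node++) {
    const target = endpoints[Math.floor(random() * endpoints.length)];
    links.push([node, target]);
    endpoints.push(node, target);
  }
  return links;
}

// Number of links attached to each node.
function degrees(links, size) {
  const result = new Array(size).fill(0);
  for (const [a, b] of links) {
    result[a] += 1;
    result[b] += 1;
  }
  return result;
}

const network = barabasiAlbert(1000);
const byDegree = degrees(network, 1000).sort((a, b) => b - a);
console.log("most linked nodes:", byDegree.slice(0, 5));
```

Running this typically shows a handful of heavily linked hubs and a long tail of nodes with a single link.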
THIS MODELS…
• Academic Citations
• Actor filmographies
• Spread of Infectious diseases
• Social Networks
CONTENTS
THEORY CASE STUDIES JAVASCRIPT APPLICATION
WHAT IS DATA?
GAINING INSIGHTS RANDOMNESS SIMULATION
LEARNING THROUGH
Reward: What shape is the internet?
We’re OUT of TIME
• Data is any information we collect. Not all data is valuable.
• Seeing trends in lots of numbers is hard. Summary statistics and charts help us unpick their meaning.
• Data can be treated as random ‘realisations’ from a backing distribution.
• Making random variables is easy, and can be done in different shapes for different purposes.
WHAT IS DATA?
GAINING INSIGHTS RANDOMNESS SIMULATION
LIBRARIES WE USED
GENERAL LIBRARIES: KNOCKOUT.JS, REQUIRE.JS, BOOTSTRAP
DATA MANIPULATION: LODASH, JSTAT
DATA IMPORT: PAPA PARSE
CHARTING: D3, CHART.JS
THANK YOU
David Simons @SwamWithTurtles