Puppet Camp London 2015 - Helping Data Teams with Puppet

Jul 26, 2015

Transcript
Page 1: Puppet Camp London 2015 - Helping Data Teams with Puppet

STYLIGHT.COM

Helping Data Teams with Puppet

Sergii Khomenko, Data Scientist, sergii.khomenko@stylight.com, @lc0d3r

Page 2: Puppet Camp London 2015 - Helping Data Teams with Puppet

Agenda

•  Who? What? Why?
•  Setting up your BI with Puppet
•  Small tips and tricks
•  Puppet your ranking

Page 3: Puppet Camp London 2015 - Helping Data Teams with Puppet

Data scientist at STYLIGHT, one of the biggest fashion communities. Data analysis and visualization hobbyist. Speaker at Berlin Buzzwords 2014 and ApacheCon Europe 2014. Founder of and speaker at Munich Golang UG and Munich Tableau UG. Speaker at Munich UseR Group, Munich Search UG, and Munich Quantified Self UG.

Sergii Khomenko

Milos Radovanovic

Passionate about DevOps stuff:
1. microservices
2. Docker
3. 12-factor apps
4. continuous integration/deployment

Page 4: Puppet Camp London 2015 - Helping Data Teams with Puppet
Page 5: Puppet Camp London 2015 - Helping Data Teams with Puppet
Page 6: Puppet Camp London 2015 - Helping Data Teams with Puppet

STYLIGHT – international community

Live in 12 countries

Page 7: Puppet Camp London 2015 - Helping Data Teams with Puppet

Setting up your BI with puppet.

Page 8: Puppet Camp London 2015 - Helping Data Teams with Puppet

Minimum Viable BI

•  Tableau - reporting and ad-hocs
•  Python/Talend ETL tools

Page 9: Puppet Camp London 2015 - Helping Data Teams with Puppet

Minimum Viable BI

Running Puppet in a standalone mode

We use Puppet for *nix servers and can't merge that setup with the Windows machines.

Standalone mode for Puppet:
– easier to start and develop
– Windows machines are separated from *nix ones

Page 10: Puppet Camp London 2015 - Helping Data Teams with Puppet

Minimum Viable BI

Running Puppet in a standalone mode

cd c:\folder\with\our-bi
git pull origin master
IF %ERRORLEVEL% NEQ 0 set context=GIT_FAILURE && goto error_handler
puppet apply --modulepath=puppet\modules puppet\win-node-name.net.pp
IF %ERRORLEVEL% NEQ 0 set context=PUPPET_FAILURE && goto error_handler
goto end

Page 11: Puppet Camp London 2015 - Helping Data Teams with Puppet

Minimum Viable BI

Running Puppet in a standalone mode

:error_handler
echo entering error_handler
EVENTCREATE /T ERROR /L APPLICATION /SO Puppet_Scheduler /ID 100 /D "EXECUTION FAILED REASON %context%"
goto end

:end
echo DONE

Page 12: Puppet Camp London 2015 - Helping Data Teams with Puppet

Minimum Viable BI

Running Puppet in a standalone mode

Standalone mode for Puppet:
– configuration is totally separated
– custom modules: --modulepath=puppet\modules
– GitHub-hosted configuration
– error handling via the Windows event log

Page 13: Puppet Camp London 2015 - Helping Data Teams with Puppet

Minimum Viable BI

Scheduling is important

node 'win-node-name.net' {
    scheduled_task { 'refresh-1':
        ensure    => present,
        enabled   => true,
        command   => 'C:\path\to\your\script.bat',
        arguments => 'some args',

Page 14: Puppet Camp London 2015 - Helping Data Teams with Puppet

Minimum Viable BI

Scheduling is important

        user      => 'your-user',
        password  => 'your-password',
        trigger   => {
            schedule   => daily,
            start_time => '06:00',
        }
    }
}

Page 15: Puppet Camp London 2015 - Helping Data Teams with Puppet

Minimum Viable BI

Sync my configuration every 15 min

# Can't use Puppet's scheduled_task here, as it does not support running a scheduled task every 5 minutes.
# https://github.com/sdliangzhihua/windows-puppet-example/blob/master/manifest.pp#L68

Page 16: Puppet Camp London 2015 - Helping Data Teams with Puppet

Minimum Viable BI

Sync my configuration every 15 min

$cmd      = 'C:\Windows\system32\cmd.exe'
$job_name = 'sync_code'

exec { 'CreateCodeSyncScheduledTask':
    command => "${cmd} /C schtasks /create /sc MINUTE /mo 15 /tn ${job_name} /tr C:\\your\\puppet.bat /ru administrator /f",
    onlyif  => ["${cmd} /C schtasks /query /tn ${job_name} & if errorlevel 1 (exit /b 0) else exit /b 1"],
}
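The `onlyif` guard in the exec resource inverts the exit code of `schtasks /query`, so the task is created only when the query fails, that is, when the task does not exist yet. A minimal Python sketch of the same check-then-create logic follows. The function names and the injected `run` callable are assumptions made for illustration (on Windows, `subprocess.call` could play that role); they are not part of the original manifest.

```python
def sync_task_commands(job_name="sync_code", script=r"C:\your\puppet.bat", minutes=15):
    """Build the schtasks query/create argument lists shown on the slide."""
    query = ["schtasks", "/query", "/tn", job_name]
    create = [
        "schtasks", "/create", "/sc", "MINUTE", "/mo", str(minutes),
        "/tn", job_name, "/tr", script, "/ru", "administrator", "/f",
    ]
    return query, create


def ensure_sync_task(run, **kwargs):
    """Create the task only if it is missing; return True when it was created.

    `run` is a hypothetical callable that executes a command list and
    returns its exit code (for example subprocess.call on Windows).
    """
    query, create = sync_task_commands(**kwargs)
    if run(query) == 0:   # exit code 0: the task already exists
        return False
    run(create)           # non-zero: the task is missing, so create it
    return True


# Dry run with a fake runner that pretends the task is missing.
calls = []

def fake_run(cmd):
    calls.append(cmd)
    return 1 if "/query" in cmd else 0

created = ensure_sync_task(fake_run)
```

Injecting the runner keeps the sketch testable on any platform; the real commands only make sense on a Windows host.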

Page 17: Puppet Camp London 2015 - Helping Data Teams with Puppet

Small tips and tricks

Do not repeat yourself and other tricks

Page 18: Puppet Camp London 2015 - Helping Data Teams with Puppet

Minimum Viable BI

Scheduling is important

node 'win-node-name.net' {
    scheduled_task { 'refresh-1':
        ensure    => present,
        enabled   => true,
        command   => 'C:\path\to\your\script.bat',
        arguments => 'some args',

Page 19: Puppet Camp London 2015 - Helping Data Teams with Puppet

Small tips and tricks

class job_scheduler (
    $ensure      = $job_scheduler::params::ensure,
    $enabled     = $job_scheduler::params::enabled,
    $user        = $job_scheduler::params::user,
    $password    = $job_scheduler::params::password,
    $working_dir = $job_scheduler::params::working_dir,
) inherits job_scheduler::params {
}

Page 20: Puppet Camp London 2015 - Helping Data Teams with Puppet

Small tips and tricks

define job_scheduler::job (
    $arguments     = 'tableau_adobe.py',
    $command       = 'c:\Py27-32\python.exe',
    $schedule_type = 'daily',
    $start_time    = '08:15',
    $day_of_week   = 'every',
) {

Page 21: Puppet Camp London 2015 - Helping Data Teams with Puppet

Small tips and tricks

define job_scheduler::tableau_job (
    $arguments     = 'default-tableau',
    $command       = 'c:\folder\tableau.bat',
    $schedule_type = 'daily',
    $start_time    = '21:00',
    $day_of_week   = 'every',
) {

Page 22: Puppet Camp London 2015 - Helping Data Teams with Puppet

Small tips and tricks

# Params with default values for the tableau job
# that might be changed in a job definition
#
# 1. $arguments     = 'default-argument',
# 2. $command       = 'c:\folder\script.bat',
# 3. $schedule_type = 'daily',
# 4. $start_time    = '21:00',
# 5. $day_of_week   = 'every',
####################

Page 23: Puppet Camp London 2015 - Helping Data Teams with Puppet

Small tips and tricks

job_scheduler::tableau_job {
    'some job':
        start_time => '01:00',
        arguments  => 'args';
    'default refresh-1':
        start_time => '06:00';
    'default refresh-2':
        start_time => '10:00';
    'weekly update':
        start_time    => '03:35',
        arguments     => 'weekly-update',
        schedule_type => weekly,
        day_of_week   => ['mon'];
}
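The point of the defined type is that each job states only what differs from the defaults. As a rough illustration of that resolution step (not of how Puppet implements it), the Python sketch below merges per-job overrides onto a defaults table. The default values are copied from the `job_scheduler::tableau_job` definition in the slides; the names `TABLEAU_JOB_DEFAULTS` and `resolve_jobs` are made up for this example.

```python
# Defaults as declared in the job_scheduler::tableau_job definition.
TABLEAU_JOB_DEFAULTS = {
    "arguments": "default-tableau",
    "command": r"c:\folder\tableau.bat",
    "schedule_type": "daily",
    "start_time": "21:00",
    "day_of_week": "every",
}


def resolve_jobs(jobs, defaults=TABLEAU_JOB_DEFAULTS):
    """Return the full parameter set per job: defaults overlaid with overrides."""
    return {title: {**defaults, **overrides} for title, overrides in jobs.items()}


# Two of the jobs from the slide, each listing only its overrides.
resolved = resolve_jobs({
    "default refresh-1": {"start_time": "06:00"},
    "weekly update": {
        "start_time": "03:35",
        "arguments": "weekly-update",
        "schedule_type": "weekly",
        "day_of_week": ["mon"],
    },
})
```

Here `resolved["default refresh-1"]` keeps the default command and arguments and changes only the start time, which is the behaviour the one-line job declarations rely on.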

Page 24: Puppet Camp London 2015 - Helping Data Teams with Puppet

Small tips and tricks

job_scheduler::redshift_job {
    'RS tagged products':
        start_time => '00:40',
        params     => '..\datasources\something.tds';
    'RS another job':
        start_time => '00:50',
        params     => '..\datasources\else.tds'

Page 25: Puppet Camp London 2015 - Helping Data Teams with Puppet

Puppet your ranking

Lean, flexible, powerful

Page 26: Puppet Camp London 2015 - Helping Data Teams with Puppet

A ranking is a relationship between a set of items such that, for any two items, the first is either 'ranked higher than', 'ranked lower than' or 'ranked equal to' the second.
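One common way to obtain such a relationship is to give every item a numeric score and compare scores. The short Python sketch below does exactly that; the item names and score values are invented for illustration and do not come from the talk.

```python
def compare_rank(score_a, score_b):
    """Return how the first item relates to the second, given their scores."""
    if score_a > score_b:
        return "ranked higher than"
    if score_a < score_b:
        return "ranked lower than"
    return "ranked equal to"


# Hypothetical items with made-up scores; two of them tie.
scores = {"sneaker": 0.9, "boot": 0.4, "sandal": 0.4}

# Sorting by score (highest first) yields one ordering consistent with the ranking.
ordered = sorted(scores, key=scores.get, reverse=True)
```

Ties are allowed, so the sorted list is only one of several orderings that respect the same ranking.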

Page 27: Puppet Camp London 2015 - Helping Data Teams with Puppet

Ranking specifics:

•  Seasonal influence
•  Trends
•  Cold start of new countries, shops
•  Multiple dimensions of ranking model

Page 28: Puppet Camp London 2015 - Helping Data Teams with Puppet

Lean approach to Ranking

Multiple points of evaluation

Requirements:
•  Decreasing the time to implement a new ranking model
•  Keeping the working infrastructure alive
•  A/B testing without changing the entire infrastructure
•  Performance level: “still fast” and “transparent”

Page 29: Puppet Camp London 2015 - Helping Data Teams with Puppet

Common search infrastructure

[Diagram: JBoss, Solr-loadbalancer, and three nginx + Solr nodes]

Page 30: Puppet Camp London 2015 - Helping Data Teams with Puppet

Updated infrastructure

[Diagram: front-end loadbalancer; JBoss and Solr-loadbalancer with three nginx + Solr nodes; a second JBoss and Solr-loadbalancer with one nginx + Solr node]

Page 31: Puppet Camp London 2015 - Helping Data Teams with Puppet

Lean approach to Ranking

q = +brand:adidas shop:monshowroom^3

q = +adidas monshowroom
defType = dismax
qf = brand shop^3
sort = user_ratings desc, score desc

qq = adidas
q = {!boost b=$b defType=dismax v=$qq}
b = prod(popularity, clicks)
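The last form wraps the user query (`qq`) in Solr's boost query parser, which multiplies the dismax relevance score by the function given in `b`. A small Python sketch of assembling those three parameters into a URL query string is shown below; the helper name `boost_query_params` is an assumption for this example, and no Solr host or handler path is implied.

```python
from urllib.parse import urlencode, parse_qs


def boost_query_params(user_query, boost_function="prod(popularity, clicks)"):
    """Return the q/qq/b parameters of the boosted dismax query from the slide."""
    return {
        "q": "{!boost b=$b defType=dismax v=$qq}",  # dereferences $b and $qq
        "qq": user_query,                            # the actual user query
        "b": boost_function,                         # multiplicative boost
    }


params = boost_query_params("adidas")
query_string = urlencode(params)      # percent-encodes "$", "{", "!" and spaces
decoded = parse_qs(query_string)      # round-trip to check nothing was lost
```

Keeping the user query in `qq` means the ranking model can be swapped by changing only `b`, without touching how the front end builds its search request.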

Page 32: Puppet Camp London 2015 - Helping Data Teams with Puppet

Lean approach to Ranking

solr0x.node.company.pp

include nginx

nginx::config { "solr_dev": }

nginx::solr-ranking { "delta2":
    urls => [
        "/some.thing?gender=women&brand=2271&tag=1161&tag=877&tag=468",
        "/some.thing?gender=men&brand=11235&tag=10203&tag=10299&tag=10326"
    ],

Page 33: Puppet Camp London 2015 - Helping Data Teams with Puppet

Lean approach to Ranking

nginx / templates / conf / solr-rewrites.conf.erb

<% urls.each do |url| -%>
if ($args ~* <% if url['gender'] > 0 -%>gender_id%3A<%= url['gender'] %>.*<% end -%><% url['tags'].each do |tag| -%>tag_id%3A<%= tag %>.*<% end -%><% if url['brand'] > 0 -%>brand_id%3A%28<%= url['brand'] %>%29<% end -%>) {
    set $orig $args;
    set $args "q={!boost+b=%24b+defType=dismax+v=%24qq}&qq=id:*";
    rewrite ^(.*)$ "$1?$orig" break;
}
<% end -%>
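To see what the template generates, the Python sketch below rebuilds the regular expression for one rule and checks it against an incoming query string, mimicking nginx's case-insensitive `~*` match. The rule dictionary, its numeric ids and the `fq=` parameter layout of the sample requests are assumptions for illustration; only the pattern fragments (`gender_id%3A`, `tag_id%3A`, `brand_id%3A%28...%29`) come from the template.

```python
import re


def rule_pattern(rule):
    """Build the same pattern the ERB template emits for one url rule."""
    parts = []
    if rule.get("gender", 0) > 0:
        parts.append("gender_id%3A{}.*".format(rule["gender"]))
    for tag in rule.get("tags", []):
        parts.append("tag_id%3A{}.*".format(tag))
    if rule.get("brand", 0) > 0:
        parts.append("brand_id%3A%28{}%29".format(rule["brand"]))
    return "".join(parts)


def matches(rule, args):
    """Unanchored, case-insensitive search, like `if ($args ~* ...)` in nginx."""
    return re.search(rule_pattern(rule), args, re.IGNORECASE) is not None


# Hypothetical rule and request arguments (already URL-encoded, as nginx sees them).
rule = {"gender": 2, "tags": [1161, 877], "brand": 2271}
hit = "fq=gender_id%3A2&fq=tag_id%3A1161&fq=tag_id%3A877&fq=brand_id%3A%282271%29"
miss = "fq=gender_id%3A2&fq=tag_id%3A1161&fq=tag_id%3A877&fq=brand_id%3A%289999%29"

hit_result = matches(rule, hit)
miss_result = matches(rule, miss)
```

Note that the generated pattern is order-sensitive: the filters must appear in the request in the sequence gender, tags, brand for the rewrite to trigger.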

Page 34: Puppet Camp London 2015 - Helping Data Teams with Puppet

Lean approach to Ranking

Multiple points of evaluation

Stages to evaluate a model:
•  R ranking model
•  Independent Solr node
    1.  For internal use cases
    2.  Testing on some of the pages
    3.  A/B rollout for a % of users
•  Production rollout
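The slides do not say how the percentage of users for the A/B stage is selected. One common approach, sketched here purely as an assumption, is to hash a stable user identifier into 100 buckets and send the lowest buckets to the new ranking. The function name, the salt and the user-id format are all hypothetical.

```python
import hashlib


def in_rollout(user_id, percent, salt="ranking-delta2"):
    """Deterministically decide whether a user falls into the rollout group.

    The same user always lands in the same bucket, so their experience
    stays stable while `percent` is raised step by step.
    """
    key = "{}:{}".format(salt, user_id).encode("utf-8")
    bucket = int(hashlib.sha256(key).hexdigest(), 16) % 100
    return bucket < percent


# Share of 10,000 made-up users that a 10% rollout would select.
share = sum(in_rollout("user-{}".format(i), 10) for i in range(10000)) / 10000
```

Changing the salt per experiment reshuffles the buckets, so the same users are not always the first to see every new model.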

Page 35: Puppet Camp London 2015 - Helping Data Teams with Puppet

Thanks for your attention!

Page 36: Puppet Camp London 2015 - Helping Data Teams with Puppet

Sergii Khomenko
Data Scientist

STYLIGHT GmbH
sergii.khomenko@stylight.com

@lc0d3r

Nymphenburger Straße 86 80636 Munich, Germany