Orchestrating the execution of workflows for media streaming service and even more
Shuen-Huei (Drake) Guan, sr. principal engineer, KKBOX; vice chairperson, PyCon APAC 2015


Jul 31, 2015


Shuen-Huei Guan
Page 1

Orchestrating the execution of workflows for media streaming service and even more

Shuen-Huei (Drake) Guan

sr. principal engineer, KKBOX
vice chairperson, PyCon APAC 2015

Page 2

Who am I?

• administrator, Ptt BBS

• technical director / R&D manager, Digimax

• team player, KKBOX

• contributor, PyCon Taiwan

Page 3

More of a story than a tech talk.

No KKBOX trade secrets are revealed.

Page 4

Only a few slides talk about Python.

And, it's not about music streaming.

Page 5
Page 6

350 team players serving 10M users across 6 countries

Page 7

20M songs

Page 8

Events

Page 9

KKTIX

the always lovely sponsor!

Page 10

"If we can make music streaming work, how about video streaming?"

- KKBOX CxO

Page 11

Let's work on a video-on-demand service

• Adaptive streaming.

• DRM protection.

• Video processing on cloud.

Page 12

We thought video streaming was similar to music streaming, but we were wrong.

Page 13

Issue 1. Workflow

"multiple distinct interconnected steps that need to be executed in a particular order in a distributed environment..."

- someone

Photo: flickr:siddhu2020 http://bit.ly/1FAukT2

Page 14

Sample encoding workflow for music

Page 15

    def run(source, secret_key, cipher):
        # verify if the source is ok.
        if not verify(source):
            return False

        # convert audio with different bitrates
        _ = [convert(source, i) for i in range(4)]

        # update id3 tag for all converted audios
        _ = update_id3_tag(_)

        # encrypt all audios
        _ = encrypt(_, secret_key, cipher)

        # deploy to backend DB
        deploy(_)

        return True

Page 16

Issue 2. Distribute tasks to the cloud, and use the cloud efficiently!

Page 17

Gearman

Page 18

Sample encoding workflow for music

Page 19

Sample client code to submit a workflow [1]

    $workflow = new Gearman_Workflow('KKBOX_Convert_Audio', array(
        'source' => $source,
        'args' => $args,
    ));

    $workflow->attachCallback(function () {});

    $client->run($workflow);

[1] Warning: it's PHP.

Page 20

Sample worker (server) code to do things [1]

    class KKBOX_Convert_Audio extends Gearman_Worker {
        public function run($arg) {
            // check the source
            if (!verify()) return;

            // convert audio with different bitrates
            for ($i = 0; $i < 4; $i++) {
                convert($i);
            }

            // update id3 tag for all audios
            update_id3_tag();

            // encrypt audios
            encrypt();

            // sequentially deploy to backend DB
            for ($i = 0; $i < 4; $i++) {
                deploy($i);
            }
        }
    }

[1] Warning: it's PHP.

Page 21

Sample encoding workflow for video, a little bit complicated

Page 22

Sample worker (server) code to do things [1]

    class KKBOX_Encode_Video extends Gearman_Worker {
        public function run($arg) {
            transcode();
            encrypt();
        }
    }

    class KKBOX_Convert_Video extends Gearman_Worker {
        public function run($arg) {
            if (!verify()) return;

            // create asynchronous sub-workflows
            $result = create_sub_workflow(KKBOX_Encode_Video);
            // wait for all sub-workflows to finish
            joint($result);

            create_sub_workflow(KKBOX_Package_DASH, $result->encrypted);
            create_sub_workflow(KKBOX_Package_HLS, $result->plain);
            joint();

            deploy();
        }
    }

[1] Warning: it's PHP.
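The fan-out and join pattern in the video worker can also be sketched in Python with only the standard library. This is an illustration rather than the KKBOX implementation: `encode_video`, `package`, and the profile names are hypothetical stand-ins, and a thread pool substitutes for Gearman sub-workflows.

```python
from concurrent.futures import ThreadPoolExecutor


def encode_video(profile):
    # Hypothetical stand-in for transcode() + encrypt() on one profile.
    return {'plain': 'v_%s.mp4' % profile, 'encrypted': 'v_%s.enc' % profile}


def package(fmt, files):
    # Hypothetical stand-in for the DASH/HLS packaging sub-workflows.
    return '%s(%s)' % (fmt, ','.join(files))


def convert_video(profiles):
    with ThreadPoolExecutor(max_workers=4) as pool:
        # Fan out: one asynchronous encode per profile.
        futures = [pool.submit(encode_video, p) for p in profiles]
        # Join: block until every encode has finished.
        results = [f.result() for f in futures]

        encrypted = [r['encrypted'] for r in results]
        plain = [r['plain'] for r in results]

        # Second fan-out: both packagers run concurrently, then join again.
        dash = pool.submit(package, 'DASH', encrypted)
        hls = pool.submit(package, 'HLS', plain)
        return dash.result(), hls.result()
```

Even in this toy form, the orchestration (fan out, join, fan out again) is tangled with the actions themselves, which is the pain point the next slides describe.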

Page 23

The real Gearman worker code is far more complicated, without the elegance we would like to have.

Page 24

Issue 3. Workflows would evolve...

• Let's save file size and IO.

• Let's make it faster.

• Let's add some more profiles.

• Let's fix some encoding.

Page 25
Page 26

"Everything fails all the time."

- Werner Vogels, CTO of Amazon

Photo: flickr:Bill Abbott http://bit.ly/1GnrSGr

Page 27

Issue 4. Gearman server down!

Page 28

Factors we want to pay close attention to

• Encoding workflow

• Task distribution across machines in the cloud.

• Server maintenance.

Page 29

We hope for ...

1. no need to maintain this system;

2. easier distribution of workflows/tasks, even to a local machine;

3. a high-level workflow. As long as you can draw your process on paper, you can map it to a workflow!
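One way to read "draw it on paper, map it to a workflow" is that a workflow is a dependency graph. The sketch below is an illustration under that assumption, using Python's standard `graphlib` module (Python 3.9+). The step names loosely follow the music workflow and are not taken from the actual system.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Each step maps to the set of steps that must finish before it can start.
workflow = {
    'verify': set(),
    'convert': {'verify'},
    'update_id3_tag': {'convert'},
    'encrypt': {'update_id3_tag'},
    'deploy': {'encrypt'},
}


def execution_order(graph):
    """Return one valid order in which the steps can run."""
    return list(TopologicalSorter(graph).static_order())


def ready_batches(graph):
    """Group steps into batches; steps within a batch could run in parallel."""
    sorter = TopologicalSorter(graph)
    sorter.prepare()
    batches = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        batches.append(ready)
        sorter.done(*ready)
    return batches
```

A drawing with branches (for example, several encodes feeding one packaging step) becomes a graph where `ready_batches` exposes which steps can be distributed at the same time.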

Page 30

What Google suggests to us...

• Apache Kafka, Mesos, ...

• Gearman (sorry, but we've tried.)

• Luigi by Spotify

• Celery

• Potentially all message brokers with some additional work.

Page 31

AWS Simple Workflow (SWF)

Page 32

    import boto.swf.layer2 as swf  # boto's SWF layer; DOMAIN, VERSION, TASKLIST are defined elsewhere

    class HelloWorker(swf.ActivityWorker):

        domain = DOMAIN
        version = VERSION
        task_list = TASKLIST

        def run(self):
            activity_task = self.poll()
            if 'activityId' in activity_task:
                print('Hello, World!')
                self.complete()
                return True

Page 33

    class HelloDecider(swf.Decider):

        domain = DOMAIN
        task_list = TASKLIST
        version = VERSION

        def run(self):
            history = self.poll()
            if 'events' in history:
                # Find workflow events not related to decision scheduling.
                workflow_events = [e for e in history['events']
                                   if not e['eventType'].startswith('Decision')]
                last_event = workflow_events[-1]

                decisions = swf.Layer1Decisions()
                if last_event['eventType'] == 'WorkflowExecutionStarted':
                    decisions.schedule_activity_task(...)
                elif last_event['eventType'] == 'ActivityTaskCompleted':
                    decisions.complete_workflow_execution()
                self.complete(decisions=decisions)
                return True

Page 34

SWF

• Decider defines the workflow.

• We still need to write the workflow logic in the decider.

• Workers do the action.

• Every time we change a workflow or an action, we need to redeploy the deciders and workers.
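To make the decider's role concrete, its logic can be viewed as a plain function from the event history to the next decision. The sketch below makes no AWS calls: plain dicts stand in for the history returned by polling, and the function name `decide` is hypothetical. The event and decision type names are the SWF ones used in the decider code above.

```python
def decide(events):
    """Map an event history to the name of the next decision, or None."""
    # Ignore events that only describe decision scheduling.
    workflow_events = [e for e in events
                       if not e['eventType'].startswith('Decision')]
    if not workflow_events:
        return None

    last = workflow_events[-1]['eventType']
    if last == 'WorkflowExecutionStarted':
        return 'ScheduleActivityTask'
    if last == 'ActivityTaskCompleted':
        return 'CompleteWorkflowExecution'
    return None
```

Because every branch of the workflow has to be encoded in a function like this, any change to the workflow means changing and redeploying decider code, which motivates the decoupling on the next slide.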

Page 35

Let's decouple the workflow and the actions from SWF

Page 36
Page 37

Job script for a workflow

    Job {KKBOX Convert Video} -subtasks {
        Task {Source Inspection} -cmds {
            Cmd { emilia verify -i s3://bucket/source.mp4 }
        }

        Task {Transcode} --parallel -subtasks {
            Iterate i -from 0 -to 4 -by 1 -template {
                Task {Transcode Audio} -cmds {
                    Cmd { ffmpeg -i s3://bucket/source.mp4 -o /tmp/converted_$i.mp4 }
                }
            }
            Iterate i -from 0 -to 8 -by 1 -template {
                Task {Transcode Video} -cmds {
                    Cmd { ffmpeg -i s3://bucket/source.mp4 -o /tmp/converted_$i.mp4 }
                }
            }
        }

        Task {Adaptive} -subtasks {
            Task {DASH} -subtasks { }
            Task {HLS} -subtasks { }
            Task {MSS} -subtasks { }
        }
    }

Page 38

What is exactly a job script?

Page 39


Page 40
Page 41

Make it pythonic if that makes developers happier

    source = 's3://bucket/source.mp4'

    with Job():
        with Task('Source Inspection'):
            Cmd('emilia verify -i %s' % source)

        with Task('Transcode', parallel=True):
            for i in range(4):
                with Task():
                    Cmd('ffmpeg -i %s ... -o /tmp/a_%d.mp4' % (source, i))
            for i in range(9):
                with Task():
                    Cmd('ffmpeg -i %s ... -o /tmp/v_%d.mp4' % (source, i))

        with Task('Adaptive'):
            with Task('DASH'): pass
            with Task('HLS'): pass
            with Task('MSS'): pass
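The slides do not show how `Job`, `Task`, and `Cmd` are implemented. One plausible way to support this `with` syntax is a stack of open context managers that builds a tree. The sketch below is an assumption made for illustration, not the project's actual code, and the `echo` commands are placeholders.

```python
class Node(object):
    """One node of the job tree. Names mirror the slide; internals are assumed."""

    _open = []  # stack of nodes whose `with` block is currently open

    def __init__(self, title=None, parallel=False):
        self.title = title
        self.parallel = parallel
        self.children = []
        self.cmds = []
        # A node created inside another node's `with` block becomes its child.
        if Node._open:
            Node._open[-1].children.append(self)

    def __enter__(self):
        Node._open.append(self)
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        Node._open.pop()
        return False  # let exceptions propagate


class Job(Node):
    pass


class Task(Node):
    pass


def Cmd(command):
    """Record a shell command on the innermost open node."""
    Node._open[-1].cmds.append(command)


source = 's3://bucket/source.mp4'

with Job('KKBOX Convert Video') as job:
    with Task('Source Inspection'):
        Cmd('emilia verify -i %s' % source)
    with Task('Transcode', parallel=True):
        for i in range(4):
            with Task('Transcode Audio %d' % i):
                Cmd('echo audio profile %d' % i)  # placeholder command
```

After the outer `with` block exits, `job` holds the whole tree, which could then be serialized into a job script or handed to a scheduler.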

Page 42

Status

• 1,500,000 minutes of video have been encoded.

• 3,000 videos per day (max).

• 800 workers on 100 c3.8xlarge instances (max).

• Spent lots of $.

• Everyone is really happy with that performance.

Page 43

Technical status

• Fault tolerance by retry. [decider]

• Workflow/task has priorities. [SWF]

• try..except..finally mechanism. [-whendone, -whenerror, -precmds, -postcmds, ...]
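The retry and try..except..finally ideas can be sketched as a small task runner. The hook names below loosely echo the job-script options (`-precmds`, `-postcmds`, `-whendone`, `-whenerror`), but their exact semantics in the real system are not given on the slide, so everything here is an assumption made for illustration.

```python
def run_task(action, retries=3, pre=None, post=None,
             when_done=None, when_error=None):
    """Run `action` with retries; `post` always runs, like a `finally` clause."""
    if pre:
        pre()
    try:
        last_error = None
        for _attempt in range(retries):
            try:
                result = action()
            except Exception as error:  # any failure counts as retryable here
                last_error = error
                continue
            if when_done:
                when_done(result)
            return result
        # Every attempt failed: report the last error instead of raising.
        if when_error:
            when_error(last_error)
        return None
    finally:
        if post:
            post()
```

A real runner would likely distinguish transient failures (worth retrying) from permanent ones such as a corrupt source file. This sketch retries on any exception.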

Page 44

Question: Are you interested in this project?

Page 45

To do:

• Use JSON or YAML for the job script.

• A viewer to see the progress of workflows!

• Replace SWF with Apache Mesos or Mistral.
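As a rough idea of what a JSON job script might look like, the job tree can be expressed as nested dictionaries and round-tripped through the standard `json` module. The key names (`title`, `subtasks`, `cmds`, `parallel`) are hypothetical and chosen only to mirror the job script shown earlier.

```python
import json

# Hypothetical schema: each node has a title and optional subtasks/cmds/parallel.
job = {
    'title': 'KKBOX Convert Video',
    'subtasks': [
        {'title': 'Source Inspection',
         'cmds': ['emilia verify -i s3://bucket/source.mp4']},
        {'title': 'Transcode', 'parallel': True,
         'subtasks': [{'title': 'Transcode Audio %d' % i} for i in range(4)]},
        {'title': 'Adaptive',
         'subtasks': [{'title': name} for name in ('DASH', 'HLS', 'MSS')]},
    ],
}


def titles(node, depth=0):
    """Yield (depth, title) pairs for every node in the job tree."""
    yield depth, node['title']
    for child in node.get('subtasks', []):
        for item in titles(child, depth + 1):
            yield item


text = json.dumps(job, indent=2)   # what would be stored or submitted
restored = json.loads(text)        # what a decider or viewer would read back
```

A walker like `titles` is also the kind of traversal a progress viewer would need in order to render the tree.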

Page 46

Thank you!

@drakeguan