High Volume Scheduling and Job Management With PostgreSQL

Jun 02, 2018

  • 8/10/2019 High Volume Scheduling and Job Management With PostgreSQL

    High-Volume Scheduling and Job Management with PostgreSQL

    Leonardo Meira, Software Engineer

    April 3rd, 2014

  • System

    - Has been in production for years, running hundreds and thousands of scripts simultaneously and hundreds of millions of scripts in total
    - PostgreSQL is the basis for our high-volume, non-traditional queuing system
    - Log manager…
    - Open Source


  • Agenda

    - Context (What does Fiksu do?)
    - The Problem
    - The Solution
    - Wrap-up
    - Fiksu's Open Source Projects
    - Questions

  • Context (What does Fiksu do?)

    Fiksu is a mobile a… technology company.

    Fiksu makes it easy … mobile app marketers … acquire the users … need to grow their business.

  • Agenda: The Problem

  • Problem

    - Massive data retrieval needs
    - Jobs running periodically, guaranteed to … cloud environment
    - Machine failures handled gracefully
    - Load balancing
    - Quickly diagnose troubles when failures …

  • Agenda: The Solution

  • Network Application Framework

    - Framework for building application…
    - Schedules, runs and monitors pro… across a distributed network.
    - Like a distributed cron, but cloud prepared
    - A front-end UI to control it all

  • A cloud-prepared distributed cron

    - Efficient queue management of applica…
    - Machine node redundancy
    - Application run dependencies
    - Run restrictions preventing clashing scripts from executing at the same time
      - Machine-wide or cluster-wide clash check…
    - Load balancing
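The run-restriction bullet can be illustrated with a small in-memory check. This is only a sketch under assumptions: the restriction names (`no_limit`, `per_machine`, `per_cluster`), the `RunningJob` record, and `may_start` are hypothetical and are not taken from the talk, which only says that clash checks can be machine-wide or cluster-wide.

```python
from dataclasses import dataclass

# Hypothetical restriction names; the slides only say the clash check
# can be machine-wide or cluster-wide.
NO_LIMIT = "no_limit"
PER_MACHINE = "per_machine"
PER_CLUSTER = "per_cluster"


@dataclass(frozen=True)
class RunningJob:
    run_group_name: str
    machine_id: int


def may_start(running, run_group_name, restriction, limit, machine_id):
    """Return True if one more job of this run group may start on machine_id."""
    if restriction == NO_LIMIT:
        return True
    if restriction == PER_MACHINE:
        # Count only jobs of the same group on the same machine.
        count = sum(
            1 for job in running
            if job.run_group_name == run_group_name and job.machine_id == machine_id
        )
    elif restriction == PER_CLUSTER:
        # Count jobs of the same group on every machine.
        count = sum(1 for job in running if job.run_group_name == run_group_name)
    else:
        raise ValueError(f"unknown restriction: {restriction}")
    return count < limit


running = [RunningJob("report", 1), RunningJob("report", 2)]
print(may_start(running, "report", PER_MACHINE, 2, 1))  # True: one "report" job on machine 1
print(may_start(running, "report", PER_CLUSTER, 2, 1))  # False: two "report" jobs cluster-wide
```

The same group name can therefore be allowed on one machine and still be blocked once the whole cluster is counted.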

  • A nice UI

    - Historical process runs
    - Scheduling reports
    - Debugging
    - Log/Machine/Queue management
    - Machine health

  • Based on Af

    - Ruby on Rails scripting framework
    - Open sourced in 2013 (https://github.c…public/af)
    - Provides command line parsing
    - Tight integration with PostgreSQL
    - Modified log4r to make log manageme…
    - Strong application component module…

  • Tight integration with PostgreSQL

    - Automatic management of pg_stat_activity.application_name
      - Provided by our gem pg_application_name
    - Helpers for advisory locking
      - Provided by our gem pg_advisory_locker
    - Bulk data management
      - More of a supplement to ActiveRecord's lac…
      - Provided by our gem bulk_data_methods

  • Network Application Framework

    - Open sourced in 2013 (https://github.com/fiksu-p…)
    - Has been in production for years, hundreds of m… across hundreds of machines
    - Manages script run times and distribution (think m… multi-machine/distributed cron)
    - Cloud machine watchdog, alarming and logging
    - AWS EC2 + RDS for ease…
      - RDS is of the PostgreSQL variety

  • Overview: A running system

    [Diagram. Labels: NAF DB; Runner Machine 1 and Runner Machine 2, each with a Runner and four scripts (Script1 to Script8); Script configuration, Script schedules, Script queues, Logs, Alarming; Servic…]

  • Runners

    - One per machine
    - Responsible for:
      - Schedule management
      - Queuing of periodic jobs
      - Job starting and process management
      - Queue/clash management
      - Load balancing
      - Machine watchdogging
      - Seamless code version upgrades

  • Runners: start up

    - Clean old processes
    - Remove invalid jobs
    - Wind down other runner…

  • Runners: main loop

    - If we have been marked down, kill all local jobs and exit
    - If any local jobs have died, clean them up
    - If we can start any new jobs, start them
    - Mark ourselves last_seen_at so other runners know we are alive
    - If no machine has checked schedules in the last minute:
      - Mark ourselves as the leader
      - Queue any scripts that have asked to run in this time period
      - Mark any other runners down if they haven't updated their last_s… minutes
      - Mark that we checked the schedules
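The liveness and leader steps of the loop above can be sketched in memory. This is a simplified illustration, not the N/Af implementation: the `Machine` record borrows column names from the `naf.machines` schema, while `tick`, `SCHEDULE_CHECK_PERIOD`, and the five-minute `RUNNER_TIMEOUT` are assumptions (the slide's timeout value is cut off). Job starting, clean-up, and queueing are left out.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

SCHEDULE_CHECK_PERIOD = timedelta(minutes=1)  # "in the last minute" on the slide
RUNNER_TIMEOUT = timedelta(minutes=5)         # assumed; the slide's value is cut off


@dataclass
class Machine:
    id: int
    last_seen_alive_at: datetime
    last_checked_schedules_at: Optional[datetime] = None
    marked_down: bool = False
    marked_down_by_machine_id: Optional[int] = None


def tick(me, machines, now):
    """Run the liveness/leader part of one loop pass; return True if `me` led it."""
    if me.marked_down:
        return False  # a real runner would kill its local jobs and exit here
    me.last_seen_alive_at = now
    checks = [m.last_checked_schedules_at for m in machines if m.last_checked_schedules_at]
    if checks and now - max(checks) < SCHEDULE_CHECK_PERIOD:
        return False  # some runner already checked the schedules recently
    # We act as leader for this period (queueing of due scripts is omitted).
    for other in machines:
        stale = now - other.last_seen_alive_at > RUNNER_TIMEOUT
        if other is not me and not other.marked_down and stale:
            other.marked_down = True
            other.marked_down_by_machine_id = me.id
    me.last_checked_schedules_at = now
    return True


now = datetime(2014, 4, 3, 12, 0, 0)
a = Machine(1, last_seen_alive_at=now)
b = Machine(2, last_seen_alive_at=now - timedelta(minutes=10))
print(tick(a, [a, b], now))                          # True: a leads and marks b down
print(b.marked_down, b.marked_down_by_machine_id)    # True 1
print(tick(a, [a, b], now + timedelta(seconds=30)))  # False: checked 30 seconds ago
```

In a shared database the "has anyone checked recently?" test and the update would need to happen atomically (for example inside a transaction or under an advisory lock); that concern is not modeled in this sketch.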

  • Schema: machines

    Table "naf.machines"
    Column                    | Type
    --------------------------+----------------------------
    id                        | integer
    created_at                | timestamp without time zone
    updated_at                | timestamp without time zone
    server_address            | inet
    server_name               | text
    short_name                | text
    server_note               | text
    enabled                   | boolean
    thread_pool_size          | integer
    last_checked_schedules_at | timestamp without time zone
    last_seen_alive_at        | timestamp without time zone
    marked_down               | boolean
    marked_down_by_machine_id | integer
    marked_down_at            | timestamp without time zone
    log_level                 | text
    deleted                   | boolean

  • Machines UI

    [Screenshot: Machines UI]

  • Schema: applications/schedules

    Table "naf.applications"
    Column              | Type
    --------------------+----------------------------
    id                  | integer
    created_at          | timestamp without time zone
    updated_at          | timestamp without time zone
    deleted             | boolean
    application_type_id | integer
    command             | text
    title               | text
    short_name          | text
    log_level           | text

    Table "naf.application_schedules"
    Column                               | Type
    -------------------------------------+----------------------------
    id                                   | integer
    created_at                           | timestamp without time zone
    updated_at                           | timestamp without time zone
    enabled                              | boolean
    visible                              | boolean
    application_id                       | integer
    application_run_group_restriction_id | integer
    application_run_group_name           | text
    application_run_group_limit          | integer
    run_interval                         | integer
    priority                             | integer
    enqueue_backlogs                     | boolean
    run_interval_style_id                | integer
    application_run_group_quantum        | integer

  • Applications UI

    [Screenshot: Applications UI]

  • Application Schedules UI

    [Screenshot: Application Schedules UI]

  • Runners: Queue management

    - Historical jobs in partitioned tables
      - Old tables dropped every month or so
    - Queued jobs in their own table
      - We'll describe why a normal queue is not sat…
    - Running jobs in their own table
      - In-memory hot; can be replaced by a Dyn… thing, but PostgreSQL does a great job for hun… machines
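The "old tables dropped every month or so" bullet suggests a simple retention rule over partition names. The sketch below assumes one child table per calendar month and a `historical_jobs_YYYY_MM` naming scheme; both are assumptions made for illustration, and the actual partitioning key and naming used by N/Af are not shown in the slides.

```python
from datetime import date


def partition_name(day, base="historical_jobs"):
    """Name of the (assumed) monthly child table for a row created on `day`."""
    return f"{base}_{day.year:04d}_{day.month:02d}"


def months_back(day, n):
    """First day of the month that lies n months before the month of `day`."""
    index = day.year * 12 + (day.month - 1) - n
    return date(index // 12, index % 12 + 1, 1)


def partitions_to_drop(existing, today, keep_months):
    """Partitions older than the `keep_months` most recent months (current month included)."""
    oldest_kept = partition_name(months_back(today, keep_months - 1))
    # Zero-padded names with a common prefix sort chronologically as strings.
    return sorted(name for name in existing if name < oldest_kept)


existing = [partition_name(date(2014, month, 1)) for month in (1, 2, 3, 4)]
print(partition_name(date(2014, 4, 3)))
# historical_jobs_2014_04
print(partitions_to_drop(existing, date(2014, 4, 3), keep_months=2))
# ['historical_jobs_2014_01', 'historical_jobs_2014_02']
```

Dropping a whole child table is typically much cheaper than deleting the same rows one by one, which is a common reason to keep job history partitioned.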

  • Schema: Historical Jobs

    Table "naf.historical_jobs"
    Column                               | Type
    -------------------------------------+----------------------------
    id                                   | bigint
    created_at                           | timestamp without time zone
    updated_at                           | timestamp without time zone
    application_id                       | integer
    application_type_id                  | integer
    command                              | text
    application_run_group_restriction_id | integer
    application_run_group_name           | text
    application_run_group_limit          | integer
    priority                             | integer
    started_on_machine_id                | integer
    failed_to_start                      | boolean
    started_at                           | timestamp without time zone
    pid                                  | integer
    finished_at                          | timestamp without time zone
    exit_status                          | integer
    termination_signal                   | integer
    state                                | text
    request_to_terminate                 | boolean
    marked_dead_by_machine_id            | integer
    marked_dead_at                       | timestamp without time zone
    log_level                            | text
    machine_runner_invocation_id         | integer
    application_schedule_id              | integer

  • Schema: Queued Jobs

    Table "naf.queued_jobs"
    Column                               | Type
    -------------------------------------+----------------------------
    id                                   | bigint
    created_at                           | timestamp without time zone
    updated_at                           | timestamp without time zone
    application_id                       | integer
    application_type_id                  | integer
    command                              | text
    application_run_group_restriction_id | integer
    application_run_group_name           | text
    application_run_group_limit          | integer
    priority                             | integer
    application_schedule_id              | integer

  • Schema: Running Jobs

    Table "naf.running_jobs"
    Column                               | Type
    -------------------------------------+----------------------------
    id                                   | bigint
    created_at                           | timestamp without time zone
    updated_at                           | timestamp without time zone
    application_id                       | integer
    application_type_id                  | integer
    command                              | text
    application_run_group_restriction_id | integer
    application_run_group_name           | text
    application_run_group_limit          | integer
    started_on_machine_id                | integer
    started_at                           | timestamp without time zone
    pid                                  | integer
    request_to_terminate                 | boolean
    marked_dead_by_machine_id            | integer
    marked_dead_at                       | timestamp without time zone
    log_level                            | text
    tags                                 | text[]
    application_schedule_id              | integer


  • Affinities

    - Think of them as puzzle pieces
    - Machines have affinity slots
      - Slots can be required
    - Applications have affinity tabs
    - An application can run on any machine that h… that match all of its tabs
    - A machine will only run applications that hav… all of its required slots

    [Figure label: Slot]
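The puzzle-piece rule can be written as two subset tests. The sketch below reads the two truncated bullets as "the machine has a slot for every tab of the application" and "the application has a tab for every required slot of the machine"; that reading, the `can_run_on` function, and the affinity names in the example are assumptions for illustration.

```python
def can_run_on(machine_slots, app_tabs):
    """Decide whether an application fits a machine.

    machine_slots: dict mapping affinity name -> True if the slot is required.
    app_tabs: set of affinity names the application carries.
    """
    # Every tab of the application must find a matching slot on the machine.
    machine_has_all_tabs = app_tabs <= set(machine_slots)
    # Every required slot of the machine must be filled by a tab.
    required = {name for name, is_required in machine_slots.items() if is_required}
    app_fills_required = required <= app_tabs
    return machine_has_all_tabs and app_fills_required


machine = {"gpu": True, "normal": False}   # "gpu" is a required slot
print(can_run_on(machine, {"gpu"}))         # True
print(can_run_on(machine, {"normal"}))      # False: required "gpu" slot not filled
print(can_run_on(machine, {"gpu", "ssd"}))  # False: machine has no "ssd" slot
```

Required slots let a machine be reserved for a class of applications, while optional slots only advertise what the machine can accept.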

  • Affinity parameters

    - Machines have an allocatable numb… slots of a specific affinity
    - Applications will allocate a certain num… slots at run start (or not run on that ma…)
    - Load balancing
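Affinity parameters behave like a small resource budget per machine. The following sketch assumes that each affinity has a numeric capacity on the machine and that a job either reserves everything it asks for or does not start there; `try_allocate` and the affinity names are hypothetical.

```python
def try_allocate(free_slots, demands):
    """Reserve slots for one job.

    free_slots: dict affinity name -> units still available on this machine.
    demands: dict affinity name -> units the job wants at run start.
    Returns the remaining capacity, or None if the job does not fit.
    """
    if any(free_slots.get(name, 0) < need for name, need in demands.items()):
        return None  # not enough capacity: the job does not run on this machine
    remaining = dict(free_slots)
    for name, need in demands.items():
        remaining[name] -= need
    return remaining


free = {"cpu": 4, "memory": 8}
after_first = try_allocate(free, {"cpu": 2, "memory": 6})
print(after_first)                               # {'cpu': 2, 'memory': 2}
print(try_allocate(after_first, {"memory": 4}))  # None: only 2 memory units left
```

Because a full machine refuses further jobs, queued work naturally flows to machines with spare capacity, which is one way to read the "load balancing" bullet.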

  • Affinities Diagram (puzzle pieces)

    [Diagram labels: Machine1, App1, App1, App2, App3]

  • Schema: affinities

    Table "naf.affinities"
    Column                     | Type
    ---------------------------+----------------------------
    id                         | integer
    created_at                 | timestamp without time zone
    updated_at                 | timestamp without time zone
    selectable                 | boolean
    affinity_classification_id | integer
    affinity_name              | text
    affinity_short_name        | text
    affinity_note              | text

    Table "naf.machine_affinity_s…"
    Column             | Type
    -------------------+----------------------------
    id                 | integer
    created_at         | timestamp without time zone
    machine_id         | integer
    affinity_id        | integer
    affinity_parameter | numeric
    required           | boolean

    Table "naf.application_schedule_affinity_tabs"
    Column                  | Type
    ------------------------+----------------------------
    id                      | integer
    created_at              | timestamp without time zone
    application_schedule_id | integer
    affinity_id             | integer
    affinity_parameter      | numeric

  • The Queue Fetcher

    - When this machine has an empty run slot, fill it … the next job with the most priority from the que…
      - Has all of the required affinities of this machi…
      - This machine has all the affinities demanded …
      - Whose affinity parameters match allocable u… machine
      - Is not restricted by the run group restrictions
      - Whose prerequisites have completed
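The fetch rule above amounts to "filter the queue, then take the best priority". The sketch below is an in-memory stand-in for what N/Af would do against PostgreSQL; `QueuedJob`, `fetch_next`, and the `group_allows` callback are hypothetical, the affinity-parameter check is omitted, and it assumes that a lower `priority` number runs first (the slides do not say which direction wins).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QueuedJob:
    id: int
    priority: int                           # assumed: lower number runs first
    tabs: frozenset = frozenset()           # affinities the job demands
    prerequisites: frozenset = frozenset()  # ids of jobs that must finish first


def fetch_next(queue, machine_slots, required_slots, finished_ids,
               group_allows=lambda job: True):
    """Return the best eligible job for this machine, or None."""
    eligible = [
        job for job in queue
        if required_slots <= job.tabs          # job has every required affinity of the machine
        and job.tabs <= machine_slots          # machine has every affinity the job demands
        and group_allows(job)                  # run group restriction check (stubbed out)
        and job.prerequisites <= finished_ids  # prerequisites have completed
    ]
    # Ties on priority fall back to the lower id, i.e. the older queue entry.
    return min(eligible, key=lambda job: (job.priority, job.id), default=None)


queue = [
    QueuedJob(1, priority=5, tabs=frozenset({"gpu"})),
    QueuedJob(2, priority=1, tabs=frozenset({"ssd"})),
    QueuedJob(3, priority=3, tabs=frozenset({"gpu"}), prerequisites=frozenset({99})),
]
slots, required = {"gpu", "normal"}, {"gpu"}
print(fetch_next(queue, slots, required, finished_ids=set()).id)  # 1
print(fetch_next(queue, slots, required, finished_ids={99}).id)   # 3
```

Unlike a plain FIFO queue, the head of the queue here depends on which machine is asking, which is one reason the talk calls the queue non-traditional.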

  • Agenda: Wrap-up

  • Overview: A running system

    [Diagram. Labels: NAF DB; Runner Machine 1 and Runner Machine 2, each with a Runner and four scripts (Script1 to Script8); Script configuration, Script schedules, Script queues, Logs, Alarming; Servic…]

  • Overview: Runners

    [Diagram. Labels: Application 1, Application 2, Application 3, Application N; Machine 1 / Runner 1, Machine 2 / Runner 2, Machine Y / Runner Y]

  • Conclusion

    - N/Af has been in production for years, running hundreds and thousands of scripts simultaneously and hundreds of millions of scripts in total
    - PostgreSQL is the basis for our high-volume, non-traditional queuing system

  • Agenda: Fiksu's Open Source Projects

  • Gems

    - N/Af
    - Af
    - Partitioned
    - Bulk Data Methods
    - PG Advisory Locker
    - PG Application Name

  • N/Af

    - https://github.com/fiksu-public/naf
    - A network application framework that l… PostgreSQL to deliver high-volume, distributed, and redundant job scheduling and management

  • Af

    - https://github.com/fiksu-public/af
    - Application framework that supports:
      - Command line options integrated into instance (and…) variables
      - Logging via log4r
      - PostgreSQL advisory locking via pg_advisory_locker
      - PostgreSQL database connection updates via pg_application_name gem
      - Threads and message passing
      - Application components adding loggers and comm… options

  • Partitioned

    - https://github.com/fiksu/partitioned
    - Adds assistance to ActiveRecord for manipu… (reading, creating, updating) an ActiveRecord … that represents data that may be in one of m… database tables (determined by the Model's d…)
    - Supports the creation and deletion of child ta… partitioning support infrastructure.
    - Supports bulk inserts and updates

  • Bulk Data Methods

    - https://github.com/fiksu/bulk_data_me…
    - MixIn used to extend ActiveRecord cla… implementing bulk insert and update o… through {#create_many} and {#update…
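The idea behind a bulk insert is to send many rows in one statement instead of one statement per row. The sketch below illustrates that idea in Python and is not the gem's API: the Python `create_many` function only borrows the name from the slide, the table and column names are examples, and the standard-library `sqlite3` module stands in for PostgreSQL so that the code runs without a server.

```python
import sqlite3


def create_many(conn, table, rows):
    """Insert all rows (a list of dicts with identical keys) with one INSERT.

    Table and column names are interpolated into the SQL text, so they must
    come from trusted code, never from user input. Values are bound as
    parameters.
    """
    if not rows:
        return
    columns = list(rows[0])
    one_row = "(" + ", ".join("?" for _ in columns) + ")"
    sql = "INSERT INTO {} ({}) VALUES {}".format(
        table, ", ".join(columns), ", ".join(one_row for _ in rows)
    )
    params = [row[column] for row in rows for column in columns]
    conn.execute(sql, params)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE queued_jobs (application_id INTEGER, priority INTEGER)")
create_many(
    conn,
    "queued_jobs",
    [{"application_id": 1, "priority": 5}, {"application_id": 2, "priority": 1}],
)
print(conn.execute("SELECT COUNT(*) FROM queued_jobs").fetchone()[0])  # 2
```

For very large batches the rows would need to be split into chunks, since databases limit the number of bound parameters per statement.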

  • PG Advisory Locker

    - https://github.com/fiksu/pg_advisory_lo…
    - Helper for calling PostgreSQL function… pg_advisory_lock, pg_advisory_try_loc…, pg_advisory_unlock


  • Agenda: Questions

  • Thank You!

    Want to talk?
    [email protected]
    www.fiksu.com
    @fiksu

    Learn more:
    https://github.com/fiksu-public/naf