Failsafe Mechanism for Yahoo Homepage Using Apache Storm & Apache Traffic Server Pushkar Sachdeva ([email protected] ) Kit Chan ([email protected] ) 05/2016
Page 1: Failsafe Mechanism for Yahoo homepage using Apache Traffic ...

Failsafe Mechanism for Yahoo Homepage Using Apache Storm & Apache Traffic Server

Pushkar Sachdeva ([email protected])

Kit Chan ([email protected])

05/2016

Page 2: Failsafe Mechanism for Yahoo homepage using Apache Traffic ...
Page 3: Failsafe Mechanism for Yahoo homepage using Apache Traffic ...

Failsafe

“A fail-safe or fail-secure device is one that, in the event of a specific type of failure, responds in a way that will cause no harm, or at least a minimum of harm, to other devices or to personnel.”

Page 4: Failsafe Mechanism for Yahoo homepage using Apache Traffic ...

Overall Architecture

[Diagram. Zones: Yahoo and AWS. Components: Browser, ELB, EC2 ATS, S3, Property ATS, Property Serving Stack, Crawler on Storm. Annotations: "Auto activate Failsafe" and "Switch traffic to AWS". Legend: Offstage Data Flow; Online Request Flow, Normal Operation; Online Request Flow, Failsafe Mode.]

Page 5: Failsafe Mechanism for Yahoo homepage using Apache Traffic ...

AWS Failsafe Stack Architecture

[Diagram. Components: Elastic Load Balancer; Security Group; VPC; ATS EC2 Instances running ATS Server in Availability Zone #1 and Availability Zone #2; S3 Bucket; CloudWatch. Regions: US W Oregon, US E North Virginia, Ireland, Singapore, with S3 replication across regions. Input: crawled data from Yahoo. Connection labels: https, http.]

Page 6: Failsafe Mechanism for Yahoo homepage using Apache Traffic ...

EC2 Instance - ATS

● Instance (Amazon Linux)
  ○ t2.large - burstable
  ○ 2 vCPUs / 8 GB RAM / 1 Gbps network
● Apache Traffic Server
  ○ For caching
    ■ Negative caching enabled
    ■ Ramdisk used
  ○ Health Check / S3 Authentication plugin
  ○ Lua plugin
    ■ Query Parameters Sorting
    ■ Simple Device Detection
    ■ Error handling
● CloudWatch Log Agent / Monitoring Scripts
● Autoscaling based on # of incoming requests
● Deployment Mechanism using Terraform / Packer

[Diagram: Amazon Linux instance running ATS with a 4 GB ramdisk cache, the CloudWatch Agent, and the CloudWatch Monitoring Scripts.]

Page 7: Failsafe Mechanism for Yahoo homepage using Apache Traffic ...

Lua script example - sorting query parameters

function do_remap()
  local query = ts.client_request.get_uri_args()
  if (query ~= nil and query ~= '') then
    local result = {}
    local i = 1
    for value in query:gmatch '([^&]*)' do
      if (value ~= '') then
        result[i] = value
        i = i + 1
      end
    end
    table.sort(result)
    local sorted_query = table.concat(result, '&')
    ts.client_request.set_uri_args(sorted_query)
  end
end
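The effect of the Lua script above can be sketched in Python to show why sorting helps caching: two requests whose query parameters differ only in order normalize to the same string, so they can share one cache entry. This is an illustrative mirror of the logic, not code from the deck, and the function name `sort_query` is hypothetical.

```python
def sort_query(query: str) -> str:
    """Split on '&', drop empty segments, sort, and re-join.

    Mirrors the Lua do_remap() logic; the name is illustrative only.
    """
    parts = [segment for segment in query.split("&") if segment]
    parts.sort()
    return "&".join(parts)


if __name__ == "__main__":
    # Both orderings normalize to the same string.
    print(sort_query("q=1&a=2"))
    print(sort_query("a=2&q=1"))
```

Python sorts strings by code point, which matches Lua's byte-wise ordering only for ASCII input in the C locale; that is an assumption of this sketch.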

Page 8: Failsafe Mechanism for Yahoo homepage using Apache Traffic ...

CloudWatch Log Agent Conf

# /etc/awslogs/awslogs.conf
# Custom ATS log enabled and in /usr/local/var/log/trafficserver/mon

[monlog]
datetime_format = %Y-%m-%d %H:%M:%S
file = /usr/local/var/log/trafficserver/mon.*
buffer_duration = 5000
log_stream_name = {instance_id}
initial_position = start_of_file
log_group_name = monlog

Page 9: Failsafe Mechanism for Yahoo homepage using Apache Traffic ...

Perl Script calling CloudWatch Monitoring Lib

+ if ($report_chr) {
+   my $result = `/usr/local/bin/traffic_line -r proxy.node.cache_hit_ratio_avg_10s`;
+   add_metric('CacheHitRatio', 'Percent', 100 * $result);
+ }
+ if ($report_tef) {
+   my $connect_failed = `/usr/local/bin/traffic_line -r proxy.node.http.transaction_frac_avg_10s.errors.connect_failed`;
+   my $aborts = `/usr/local/bin/traffic_line -r proxy.node.http.transaction_frac_avg_10s.errors.aborts`;
+   my $possible_aborts = `/usr/local/bin/traffic_line -r proxy.node.http.transaction_frac_avg_10s.errors.possible_aborts`;
+   my $pre_accept_hangups = `/usr/local/bin/traffic_line -r proxy.node.http.transaction_frac_avg_10s.errors.pre_accept_hangups`;
+   my $early_hangups = `/usr/local/bin/traffic_line -r proxy.node.http.transaction_frac_avg_10s.errors.early_hangups`;
+   my $empty_hangups = `/usr/local/bin/traffic_line -r proxy.node.http.transaction_frac_avg_10s.errors.empty_hangups`;
+   my $other = `/usr/local/bin/traffic_line -r proxy.node.http.transaction_frac_avg_10s.errors.other`;
+
+   add_metric('TransErrorFraction', 'Percent', 100 * ($connect_failed + $aborts + $possible_aborts + $pre_accept_hangups + $early_hangups + $empty_hangups + $other));
+ }

Page 10: Failsafe Mechanism for Yahoo homepage using Apache Traffic ...

CloudWatch Dashboard

Page 11: Failsafe Mechanism for Yahoo homepage using Apache Traffic ...

AWS Autoscaling - Terraform Configuration File

resource "aws_autoscaling_group" "fsfb_base_load" {
  availability_zones = ["${split(",", var.zones)}"]
  name = "${var.env}_fsfb_base_load-${aws_launch_configuration.fsfb_ats.name}"
  load_balancers = ["${aws_elb.fsfb_elb.name}"]
  max_size = 8
  min_size = 2
  health_check_grace_period = 180
  health_check_type = "ELB"
  desired_capacity = 2
  launch_configuration = "${aws_launch_configuration.fsfb_ats.name}"
  force_delete = true
  wait_for_elb_capacity = 2
  lifecycle {
    create_before_destroy = true
  }
}

Page 12: Failsafe Mechanism for Yahoo homepage using Apache Traffic ...

AWS Autoscaling - Terraform Configuration File (Cont’d)

resource "aws_autoscaling_policy" "fsfb_scale_out_med" {
  name = "${var.env}_fsfb_scale_out_med"
  scaling_adjustment = 8
  adjustment_type = "ExactCapacity"
  cooldown = 300
  autoscaling_group_name = "${aws_autoscaling_group.fsfb_base_load.name}"
}

Page 13: Failsafe Mechanism for Yahoo homepage using Apache Traffic ...

AWS Autoscaling - Terraform Configuration File (Cont’d)

resource "aws_cloudwatch_metric_alarm" "fsfb_upper_medium_rps" {
  alarm_name = "${var.env}_fsfb_upper_medium_rps"
  comparison_operator = "GreaterThanOrEqualToThreshold"
  evaluation_periods = "1"
  period = "60"
  metric_name = "RequestCount"
  namespace = "AWS/ELB"
  statistic = "Sum"
  threshold = "75000"
  dimensions {
    LoadBalancerName = "${aws_elb.fsfb_elb.name}"
  }
  alarm_description = "This metric monitors medium elb traffic"
  alarm_actions = ["${aws_autoscaling_policy.fsfb_scale_out_med.arn}", "${var.sns_email_topic}"]
}
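Read together, the alarm and the scaling policy say: if the ELB sees at least 75,000 requests in a single 60-second period, set the group to exactly 8 instances. The Python sketch below models only that decision. It is a simplification that ignores the 300-second cooldown and the SNS notification, and the function names are hypothetical.

```python
THRESHOLD_PER_PERIOD = 75_000  # alarm `threshold`, summed over `period = 60` seconds
PERIOD_SECONDS = 60
SCALE_OUT_CAPACITY = 8  # policy `scaling_adjustment` with `ExactCapacity`


def alarm_fires(request_count_sum: int) -> bool:
    """GreaterThanOrEqualToThreshold over one evaluation period."""
    return request_count_sum >= THRESHOLD_PER_PERIOD


def next_capacity(current: int, request_count_sum: int) -> int:
    """Capacity after the alarm action; unchanged if the alarm stays quiet."""
    return SCALE_OUT_CAPACITY if alarm_fires(request_count_sum) else current


# Average requests per second that corresponds to the threshold.
THRESHOLD_RPS = THRESHOLD_PER_PERIOD / PERIOD_SECONDS
```

At the threshold the average load works out to 1,250 requests per second across the load balancer.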

Page 14: Failsafe Mechanism for Yahoo homepage using Apache Traffic ...

Escalate Plugin in Apache Traffic Server (ATS)

● ATS is a proxy server that sits between the user and the origin server
● ‘Escalate’ is an ATS plugin that fetches content from failsafe servers when the origin server fails to provide a ‘good’ response.

[Diagram: User, ATS, Origin Server]

Page 15: Failsafe Mechanism for Yahoo homepage using Apache Traffic ...

Escalate Plugin in ATS (Continued)

● ‘Escalate’ is a remap plugin -

map http://games.yahoo.com/ http://some_origin.yahoo.com/ @plugin=ats_escalate.so @pparam=some_label

● Loads global configuration with ‘label’ definitions
● Sample ‘label’ definition -

"some_label" : {
  "enable" : 1,
  "response" : {
    "500" : {
      "mode" : "url",
      "url" : "http://brb.yahoo.net/$h/$d/$p$x"
    }
  }
}

Page 16: Failsafe Mechanism for Yahoo homepage using Apache Traffic ...

Escalate Plugin in ATS (Continued)

● Runs in ‘READ_RESPONSE_HDR_HOOK’
● Uses ‘TSHttpTxnRedirectUrlSet’ to fetch content from failsafe servers

if (EscalateLabel::ACTION_URL == entry->second.mode) {
  std::string content;
  MyExpander expander(txn, entry->second.url);
  if (!expander(entry->second.url, config->get_device_type_header(), config->get_default_device_type())) {
    TSError("[" PLUGIN_TAG "] invalid expansion");
    TSDebug(PLUGIN_TAG, "invalid expansion");
    goto finish;
  }
  expander.swap(content);
  url_str = TSstrdup(content.c_str());
  length = content.size();
  if (url_str) {
    TSHttpTxnRedirectUrlSet(txn, url_str, length); // Transfers ownership
  }
}
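The decision the plugin makes before expanding the URL can be sketched in Python: look up the label attached to the remap rule, check that it is enabled, and see whether the origin's status code has a `url` rule. The real plugin is C++. The lookup semantics here (exact status match, `enable` treated as a boolean) are assumptions, and the function name is hypothetical. The `$h/$d/$p$x` placeholders are returned unexpanded because the slides do not define them.

```python
import json

# Label definition copied from the sample earlier in the deck.
LABELS = json.loads("""
{
  "some_label": {
    "enable": 1,
    "response": {
      "500": {"mode": "url", "url": "http://brb.yahoo.net/$h/$d/$p$x"}
    }
  }
}
""")


def escalation_url_template(labels: dict, label: str, status: int):
    """Return the failsafe URL template for this label and status, or None.

    Assumes an exact match on the status code; the actual plugin may differ.
    """
    entry = labels.get(label)
    if not entry or not entry.get("enable"):
        return None
    rule = entry.get("response", {}).get(str(status))
    if rule and rule.get("mode") == "url":
        return rule["url"]
    return None
```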

Page 17: Failsafe Mechanism for Yahoo homepage using Apache Traffic ...

Apache Storm Crawler

● Based on scalable Apache Storm platform
● Topology
● Spouts
● Bolts

[Diagram: example topology with two spouts and three bolts.]

Page 18: Failsafe Mechanism for Yahoo homepage using Apache Traffic ...

Apache Storm Crawler (Continued)

Simplified Topology

[Diagram. Components: Cron Feeder, Changelog Feeder, Custom Event Queue Feeder, IndexUrlConfigFetcher, UrlFetcher, Memory Storage Writer, Response Processor, Response Uploader, Custom Event Queue Updater.]

Page 19: Failsafe Mechanism for Yahoo homepage using Apache Traffic ...

Apache Storm Crawler (Continued)

● Crawls content for desktop, smartphone and tablet
● Supports domain level configuration for request headers, query params and output storage.
● Failsafe url path mapping example -

Mapping: http://{failsafe_host}/{original_domain}/{device}/{path};{sorted_query_params_as_matrix_params}

URL: https://www.yahoo.com/news/trump-unveils-foreign-policy-plan-201628138.html?q=1&a=2

S3 file path: http://brb.yahoo.net/www.yahoo.com/smartphone/news/trump-unveils-foreign-policy-plan-201628138.html;a=2;q=1
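The mapping above can be sketched as a small Python function. It is an illustration of the stated rule, not the crawler's actual code: the function name `failsafe_url` is hypothetical, and the default host `brb.yahoo.net` is taken from the example.

```python
from urllib.parse import urlsplit


def failsafe_url(original_url: str, device: str, failsafe_host: str = "brb.yahoo.net") -> str:
    """Build the failsafe path for a crawled URL.

    Layout: http://{failsafe_host}/{original_domain}/{device}/{path}
    followed by the sorted query parameters as ';'-separated matrix params.
    """
    parts = urlsplit(original_url)
    path = parts.path.lstrip("/")
    params = sorted(p for p in parts.query.split("&") if p)
    matrix = "".join(";" + p for p in params)
    return f"http://{failsafe_host}/{parts.netloc}/{device}/{path}{matrix}"


if __name__ == "__main__":
    print(failsafe_url(
        "https://www.yahoo.com/news/trump-unveils-foreign-policy-plan-201628138.html?q=1&a=2",
        "smartphone",
    ))
```

Because the parameters are sorted before being appended, `?q=1&a=2` and `?a=2&q=1` resolve to the same stored object.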

Page 20: Failsafe Mechanism for Yahoo homepage using Apache Traffic ...

High Level Architecture

[Diagram. Components: User, Proxy Router, Proxy Cache, Origin Server, Failsafe Crawler, AWS storage. Label: PUT. Numbered steps trace three flows: Offline Crawler Request Flow, User Request Flow, and Optional Request Flow to fetch failsafe content.]

Page 21: Failsafe Mechanism for Yahoo homepage using Apache Traffic ...

Benefits

● No manual intervention needed to serve failsafe content
● Granular control
● More relevant content is shown to user
● Failsafe content is cached in proxy layer

Page 22: Failsafe Mechanism for Yahoo homepage using Apache Traffic ...

Pitfalls/Limitations

● Lagging Crawler
● Handling additional Crawler traffic
● Bucket specific experience
● Malformed Page

Page 23: Failsafe Mechanism for Yahoo homepage using Apache Traffic ...

Future on Resiliency - multi-cloud for failsafe

● Additional Cloud Vendor
  ○ E.g. Google Cloud Platform
  ○ S3 vs Google Cloud Storage
  ○ EC2/ELB vs Google Compute Engine
  ○ CloudWatch vs Stackdriver
● Changes in Apache Storm Crawler
  ○ Can use Apache jclouds to create objects in storage in S3 or Google Cloud Storage
● Changes in deployment using Terraform / configuration using Chef
  ○ GCP & AWS are supported
● Route 53 can be used to do failover to GCP

Page 24: Failsafe Mechanism for Yahoo homepage using Apache Traffic ...

Future on Resiliency

● Speculative Retry

void SpeculativeRetryPlugin::handleInputComplete()
{
  orig_url_ = transaction_.getClientRequest().getUrl().getUrlString();
  // fetch original request
  sendFetchRequest(orig_url_, false);
  // start a timer which would give a callback after ‘time_’ msecs
  Async::execute<AsyncTimer>(this, new AsyncTimer(AsyncTimer::TYPE_ONE_OFF, time_), getMutex());
}

void SpeculativeRetryPlugin::handleAsyncComplete(AsyncTimer &async_timer)
{
  async_timer.cancel();
  // active_fetch keeps track if we have received the response of original request yet or not
  // if not initiate a retry request
  if (!active_fetch_) {
    sendFetchRequest(orig_url_, true);
  }
}
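The same idea can be sketched with Python's `asyncio`: send the original request, wait a fixed delay, and send a second request only if the first has not answered yet. The slides do not show how the two responses are reconciled, so this sketch assumes the first one to complete wins. `fetch` is a hypothetical coroutine taking `(url, is_retry)`.

```python
import asyncio


async def speculative_fetch(fetch, url: str, delay_s: float):
    """Issue a request and, if it is still pending after delay_s, race a retry.

    Assumption: whichever request completes first supplies the response.
    """
    original = asyncio.ensure_future(fetch(url, False))
    done, _ = await asyncio.wait({original}, timeout=delay_s)
    if done:
        # The original answered before the timer fired; no retry needed.
        return original.result()

    retry = asyncio.ensure_future(fetch(url, True))
    done, pending = await asyncio.wait(
        {original, retry}, return_when=asyncio.FIRST_COMPLETED
    )
    for task in pending:
        task.cancel()  # drop the slower request
    return next(iter(done)).result()
```

The delay trades extra origin load against tail latency: a shorter delay triggers more duplicate requests, a longer one helps only the slowest responses.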

Page 25: Failsafe Mechanism for Yahoo homepage using Apache Traffic ...

Thank you. Questions?