Epic Fails in LiveOps James Gwertzman, CEO January 16, 2017

Apr 11, 2017

Transcript
Page 1: Epic Fails in LiveOps

Epic Fails in LiveOps

James Gwertzman, CEO

January 16, 2017

Page 2: Epic Fails in LiveOps
Page 3: Epic Fails in LiveOps
Page 4: Epic Fails in LiveOps

Silver lining: They used this time to catch up on content; might never have caught up if they didn’t get that extra time.

Page 5: Epic Fails in LiveOps

Introduction to PlayFab

Page 6: Epic Fails in LiveOps

Game Manager
Mission control for your whole team. All the data and tools you need to engage, retain and monetize your players.

Game Services
Back-end building blocks for your live game. Storage, compute, commerce, analytics and much, much more.

Add-On Marketplace
Pre-integrated tools and services from industry-leading partners. Reduce SDK fatigue, with (mostly) single-click access.

PlayStream
The activity stream that ties it all together. Events, triggers, real-time segmentation to automate your LiveOps.

PlayFab is a flexible LiveOps platform for games.

Page 7: Epic Fails in LiveOps

Full-text search for players

• Easily locate players
• Search across all player properties
• Use wildcard matches

1/18/17 PlayFab Confidential 7

Page 8: Epic Fails in LiveOps

Player segmentation

• Trigger actions as players enter/exit segments
• Updated in real-time
• Set manually with tags
• Use segments to target stores, run bulk actions
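The enter/exit behavior above can be sketched in a few lines. This is a minimal illustration, not PlayFab's implementation: the `Segment` class, the `total_spend` property, and the action names are all hypothetical.

```python
from typing import Callable, Dict, List, Optional, Set, Tuple

Player = Dict[str, object]          # hypothetical player record
Action = Callable[[str], None]      # called with the player id


class Segment:
    """Tracks membership for one segment and fires actions on enter/exit."""

    def __init__(self, name: str, rule: Callable[[Player], bool],
                 on_enter: Optional[Action] = None,
                 on_exit: Optional[Action] = None) -> None:
        self.name = name
        self.rule = rule
        self.on_enter = on_enter
        self.on_exit = on_exit
        self.members: Set[str] = set()

    def update(self, player_id: str, player: Player) -> None:
        """Re-evaluate one player; fire an action only when membership changes."""
        is_member = self.rule(player)
        was_member = player_id in self.members
        if is_member and not was_member:
            self.members.add(player_id)
            if self.on_enter:
                self.on_enter(player_id)
        elif was_member and not is_member:
            self.members.discard(player_id)
            if self.on_exit:
                self.on_exit(player_id)


# Example: a hypothetical "big spenders" segment.
actions: List[Tuple[str, str]] = []
big_spenders = Segment(
    "big_spenders",
    rule=lambda p: int(p.get("total_spend", 0)) >= 100,
    on_enter=lambda pid: actions.append(("grant_vip_store", pid)),
    on_exit=lambda pid: actions.append(("revoke_vip_store", pid)),
)
big_spenders.update("player-1", {"total_spend": 20})    # not a member, no action
big_spenders.update("player-1", {"total_spend": 150})   # enters, fires on_enter
big_spenders.update("player-1", {"total_spend": 150})   # unchanged, no action
big_spenders.update("player-1", {"total_spend": 0})     # exits, fires on_exit
```

Re-evaluating on every property change is what keeps membership "real-time" in this sketch; actions fire only on transitions, so repeated updates do not re-trigger them.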


Page 9: Epic Fails in LiveOps

Create and manage item catalog

• Items can have:
– Limited uses
– An expiration time
– Custom data
– Default prices in multiple currencies
– Tags to help organize
• Limited edition items have enforced scarcity
• Catalogs can be imported/exported as JSON data
• Update catalog from server dynamically at any time
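To make the item properties and the JSON import/export concrete, here is a rough sketch. The field names (`uses`, `expires_after_seconds`, and so on) and the example items are assumptions for illustration and are not PlayFab's actual catalog schema.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Dict, List, Optional


@dataclass
class CatalogItem:
    """Hypothetical item record covering the properties listed on the slide."""
    item_id: str
    prices: Dict[str, int]                        # currency code -> default price
    uses: Optional[int] = None                    # None means unlimited uses
    expires_after_seconds: Optional[int] = None   # None means it never expires
    custom_data: Dict[str, str] = field(default_factory=dict)
    tags: List[str] = field(default_factory=list)


def export_catalog(items: List[CatalogItem]) -> str:
    """Serialize the catalog to JSON text."""
    return json.dumps([asdict(item) for item in items], indent=2, sort_keys=True)


def import_catalog(text: str) -> List[CatalogItem]:
    """Rebuild catalog items from JSON text produced by export_catalog."""
    return [CatalogItem(**raw) for raw in json.loads(text)]


catalog = [
    CatalogItem("health_potion", prices={"GOLD": 50, "GEMS": 1}, uses=3,
                tags=["consumable"]),
    CatalogItem("weekend_boost", prices={"GEMS": 5},
                expires_after_seconds=48 * 3600,
                custom_data={"xp_multiplier": "2"}, tags=["boost", "event"]),
]
exported = export_catalog(catalog)
restored = import_catalog(exported)   # round-trips to equal items
```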


Page 10: Epic Fails in LiveOps

Item stores

• One catalog can have multiple stores
• Stores can have different prices
• Stores can be targeted to different player segments


Page 11: Epic Fails in LiveOps

Time-based leaderboards for tournaments

• Leaderboards can be reset on a fixed schedule (daily, weekly, monthly) or manually at any time
• When leaderboards reset, the list of players at the time of the reset is archived
• Use leaderboard standing at time of reset to issue prizes, determine tournament winners


{"PlayerId":"4AC350E4134A36C8","Value":620}
{"PlayerId":"3CC3A4D866D9580A","Value":620}
{"PlayerId":"D15EFFB805045CFA","Value":620}
{"PlayerId":"B8271B32A8035722","Value":620}
{"PlayerId":"B188B845940ED6D3","Value":620}
{"PlayerId":"321DBA3528144483","Value":500}
{"PlayerId":"EA141B9B63B53583","Value":500}
{"PlayerId":"DC01857A8D90B2F5","Value":500}
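As a rough sketch of issuing prizes from standings at reset time, the entries shown on this slide can be ranked and mapped to prize tiers. The tie-handling rule (tied scores share a rank) and the prize names are assumptions, not something the talk specifies.

```python
import json
from typing import Dict, List

# Entries copied from the slide, one JSON object per line.
ARCHIVE = """\
{"PlayerId":"4AC350E4134A36C8","Value":620}
{"PlayerId":"3CC3A4D866D9580A","Value":620}
{"PlayerId":"D15EFFB805045CFA","Value":620}
{"PlayerId":"B8271B32A8035722","Value":620}
{"PlayerId":"B188B845940ED6D3","Value":620}
{"PlayerId":"321DBA3528144483","Value":500}
{"PlayerId":"EA141B9B63B53583","Value":500}
{"PlayerId":"DC01857A8D90B2F5","Value":500}
"""


def load_archive(text: str) -> List[Dict]:
    """Parse one JSON object per non-empty line."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


def rank_entries(entries: List[Dict]) -> List[Dict]:
    """Competition ranking: tied scores share a rank, the next rank skips ahead."""
    ordered = sorted(entries, key=lambda e: e["Value"], reverse=True)
    ranked: List[Dict] = []
    for position, entry in enumerate(ordered, start=1):
        tied = bool(ranked) and ranked[-1]["Value"] == entry["Value"]
        rank = ranked[-1]["Rank"] if tied else position
        ranked.append({**entry, "Rank": rank})
    return ranked


def prize_for(rank: int) -> str:
    """Hypothetical prize tiers, not taken from the talk."""
    if rank == 1:
        return "gold_chest"
    if rank <= 10:
        return "silver_chest"
    return "participation_badge"


ranked = rank_entries(load_archive(ARCHIVE))
prizes = {entry["PlayerId"]: prize_for(entry["Rank"]) for entry in ranked}
```

With this data, five players tie for first place, which is a reminder to decide up front how ties affect prize budgets.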

Page 12: Epic Fails in LiveOps

Host session-based game servers

• Upload custom game server builds
• Configure multiplayer game modes
• Select regions where servers should be hosted
• Servers will scale automatically based on load


Page 13: Epic Fails in LiveOps

Server-hosted JavaScript

• Write server-based code without a dedicated game server
• Easy upload of JavaScript custom logic
• Make changes to your game behavior without requiring client updates
• Server authentication protects against client-side cheating
• Access the more powerful Server API (with features not available on the client)
• GitHub integration for easy revision control


Applications include:
• Granting player rewards
• Validating player actions
• Resolving interactions between players
• Managing asynchronous game turns

Page 14: Epic Fails in LiveOps

Trigger actions from real-time events

• Trigger actions in response to real-time events
• Events can come from client, server, or third-party vendors
• Rich set of actions including running CloudScript or sending push notifications
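The event-to-action idea can be illustrated with a tiny rule engine. This is only a sketch under assumptions: the `RuleEngine` class, the event fields (`name`, `source`, `player_id`, `level`), and the action are hypothetical and do not reflect the PlayStream API.

```python
from typing import Callable, Dict, List, Tuple

Event = Dict[str, object]
Condition = Callable[[Event], bool]
Action = Callable[[Event], None]


class RuleEngine:
    """Runs every registered action whose condition matches a published event."""

    def __init__(self) -> None:
        self._rules: List[Tuple[Condition, Action]] = []

    def when(self, condition: Condition, action: Action) -> None:
        self._rules.append((condition, action))

    def publish(self, event: Event) -> int:
        """Evaluate all rules against one event; return how many actions ran."""
        fired = 0
        for condition, action in self._rules:
            if condition(event):
                action(event)
                fired += 1
        return fired


# Example: send a (pretend) push notification when a player reaches level 10.
outbox: List[str] = []
engine = RuleEngine()
engine.when(
    lambda e: e["name"] == "player_leveled_up" and int(e["level"]) >= 10,
    lambda e: outbox.append(f"push:{e['player_id']}:congrats"),
)

# Events may originate from the client, the server, or a third party.
fired_low = engine.publish(
    {"name": "player_leveled_up", "source": "client", "player_id": "p1", "level": 3})
fired_high = engine.publish(
    {"name": "player_leveled_up", "source": "server", "player_id": "p1", "level": 10})
```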


Page 15: Epic Fails in LiveOps

Scheduled jobs & bulk player actions

• Schedule jobs to run in the background
• Run once, or on a recurring basis
• Schedule now, or in the future
• Run tasks for each player in a segment, or for the title


Page 16: Epic Fails in LiveOps

Full-text event search

• Filter and search through recent event history
• Zoom in on specific time period
• Look for specific players, event types, or error conditions


Page 17: Epic Fails in LiveOps

Remotely manage game configuration

• Store game configuration on the server to modify behavior over time
• Coming soon: change configuration based on player segment


Page 18: Epic Fails in LiveOps

More than 1,000 developers w/ 450+ live games


Page 19: Epic Fails in LiveOps

Daily Active Players (2016)

[Chart: daily active players per week, Jan 1 to Dec 30, 2016; y-axis from 0 to 4,500,000]

Page 20: Epic Fails in LiveOps
Page 21: Epic Fails in LiveOps

Tips to running a live service with a small team

• Fully leverage the cloud and other SAAS services
• Continuous integration
• Frequent and automated deployment to live
• All engineers take turns being “on-call”

Page 22: Epic Fails in LiveOps

SAAS services we use to run PlayFab


Page 23: Epic Fails in LiveOps

Tools we depend on

Page 24: Epic Fails in LiveOps

Basic API handling architecture

Page 25: Epic Fails in LiveOps

CloudScript Execution

Page 26: Epic Fails in LiveOps

PlayStream event handling

Page 27: Epic Fails in LiveOps

Multiplayer game server hosting & scaling

Page 28: Epic Fails in LiveOps
Page 29: Epic Fails in LiveOps
Page 30: Epic Fails in LiveOps

How the cloud has changed deployments
Scenario A: Successful deployment

[Diagram: deployment timelines. Dedicated hardware: Build A, then downtime, then Build B. Cloud: Build A, then Build B.]

Page 31: Epic Fails in LiveOps

How the cloud has changed deployments
Scenario B: Rollback needed

[Diagram: deployment timelines for dedicated hardware and cloud, showing Build A, Build B, a rollback to Build A, and downtime.]
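The slides show only timelines, not the mechanism. One common way cloud deployments avoid downtime is to start the new build alongside the old one and switch traffic only once the new build is healthy, keeping the old build available for rollback. The sketch below illustrates that pattern under assumptions; the `Router` class and the health-check callback are hypothetical and are not described in the talk.

```python
from typing import Callable, Optional


class Router:
    """Hypothetical traffic router that points at exactly one live build."""

    def __init__(self, live_build: str) -> None:
        self.live = live_build
        self.previous: Optional[str] = None

    def deploy(self, new_build: str, is_healthy: Callable[[str], bool]) -> bool:
        """Switch traffic only if the new build passes its health check.

        The old build keeps serving until the switch, so there is no
        downtime window in this sketch.
        """
        if not is_healthy(new_build):
            return False
        self.previous, self.live = self.live, new_build
        return True

    def rollback(self) -> None:
        """Point traffic back at the build that was live before the last deploy."""
        if self.previous is None:
            raise RuntimeError("no previous build to roll back to")
        self.live, self.previous = self.previous, None


# Scenario A: successful deployment.
router = Router("build-A")
switched = router.deploy("build-B", is_healthy=lambda build: True)
live_after_deploy = router.live

# Scenario B: build B misbehaves after going live, so traffic returns to build A.
router.rollback()
live_after_rollback = router.live
```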

Page 32: Epic Fails in LiveOps

Thinking about #fails

• Not all failure is created equal
• #fails range from praiseworthy to blameworthy
• Types of failure:
– Failures in routine operations which can be prevented
– Failures in complex operations which can’t be avoided, but can be managed so they don’t turn into catastrophe
– Unwanted outcomes in research, which generate knowledge
• Goals with failure should be:
– Detect early
– Analyze deeply
– Design experiments or pilots to produce them
• Everyone on team must feel safe admitting & reporting failures

Source: Strategies for Learning from Failure. Harvard Business Review. April 2011.

Page 33: Epic Fails in LiveOps

Our most common sources of failure

• Operator errors (e.g., mis-configuration)
• Design errors (e.g., cascading failures)
• Unexpected situations (e.g., surprising customer actions)

Page 34: Epic Fails in LiveOps

Misconfiguration failure

• Failure:
– Matchmaker server was down for 13 minutes
• Cause:
– We have a primary and a “hot” backup
– If the primary fails, traffic should switch to backup
– Route 53 was misconfigured to route traffic (correctly) to primary, but check health on the backup (incorrectly)
– When the primary did finally fail, traffic didn’t switch
• Solution:
– Short-term: Fix the configuration
– Long-term: Automate health-check integrity

[Diagram: Route 53 (DNS service) with Traffic and Health Check arrows to Matchmaker Primary and Matchmaker Backup; a failure is marked with an X]
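One way to automate a health-check integrity test is to verify that every failover record's health check probes the same endpoint that receives its traffic. The sketch below assumes a simplified, hypothetical config shape (`FailoverRecord`) and made-up hostnames; it does not call the Route 53 API.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class FailoverRecord:
    """Hypothetical, simplified view of one DNS failover record."""
    name: str
    role: str                  # "PRIMARY" or "SECONDARY"
    traffic_target: str        # endpoint that receives traffic
    health_check_target: str   # endpoint the health check actually probes


def find_mismatched_health_checks(records: List[FailoverRecord]) -> List[str]:
    """Return names of records whose health check probes a different endpoint."""
    return [r.name for r in records if r.traffic_target != r.health_check_target]


# Reproduces the misconfiguration from the slide: traffic goes to the primary,
# but the primary's health check is pointed at the backup.
records = [
    FailoverRecord("matchmaker-primary", "PRIMARY",
                   traffic_target="primary.example.internal",
                   health_check_target="backup.example.internal"),
    FailoverRecord("matchmaker-backup", "SECONDARY",
                   traffic_target="backup.example.internal",
                   health_check_target="backup.example.internal"),
]
mismatched = find_mismatched_health_checks(records)
```

Run as part of continuous integration or a scheduled job, a check like this would flag the mismatch before the primary ever fails.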

Page 35: Epic Fails in LiveOps

Design failure

• Failure: 2-minute system-wide outage
• Cause:
– A game was running a test of item consumption
– Design issue: calling “consume” loaded entire inventory
– Result: 100+ requests for 13K item inventory in 1 minute
– This blocked API servers, waiting for the database
– DynamoDB then auto-scaled, so calls unblocked, leading API servers to then peg CPU to 100% processing load
– This meant servers stopped responding to health checks
– Servers were all then auto-terminated
• Solution:
– Short-term: we rolled back, which redeployed servers
– Long-term: API throttles, paging data requests
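The two long-term fixes can be sketched generically. Below, `paged` splits a large inventory read into bounded pages, and `TokenBucket` is a standard token-bucket throttle. The page size, rate, and class names are illustrative assumptions; the talk does not say how PlayFab implemented either fix.

```python
import time
from typing import Callable, Iterator, List, Sequence


def paged(items: Sequence[str], page_size: int) -> Iterator[List[str]]:
    """Yield the items in fixed-size pages instead of one huge response."""
    for start in range(0, len(items), page_size):
        yield list(items[start:start + page_size])


class TokenBucket:
    """Token-bucket throttle: allows bursts up to capacity, then a steady rate."""

    def __init__(self, rate_per_sec: float, capacity: int,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.rate = rate_per_sec
        self.capacity = capacity
        self.tokens = float(capacity)
        self.clock = clock
        self.last = clock()

    def allow(self) -> bool:
        """Refill based on elapsed time, then spend one token if available."""
        now = self.clock()
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


# A 13K-item inventory read as pages of 500 items (page size is an assumption).
inventory = [f"item-{i}" for i in range(13_000)]
pages = list(paged(inventory, page_size=500))

# Throttle with a fake clock so the example is deterministic.
fake_time = [0.0]
bucket = TokenBucket(rate_per_sec=1.0, capacity=2, clock=lambda: fake_time[0])
burst = [bucket.allow(), bucket.allow(), bucket.allow()]  # third call is rejected
fake_time[0] = 1.0                                        # one second later
after_wait = bucket.allow()                               # one token has refilled
```

Paging bounds the cost of any single request, while the throttle bounds how many requests a single caller can issue, so together they limit the load one misbehaving title can put on shared servers.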

Page 36: Epic Fails in LiveOps

Complexity failure

• Failure: ElasticSearch went down for 2 days
• Cause:
– AWS ElasticSearch was trying to scale our cluster
– Instead of adding nodes, it replaces them
– This requires moving all data from node to node
– They don’t throttle data moves, so CPUs went to 100%
– This triggered health check fails, and node termination
– But new nodes trigger index rebalancing, but they were already rebalancing because of the scaling
– At one point we were losing 4 nodes every 30 minutes
• Solution:
– Short-term #1: turn off writes to catch-up; not enough
– Short-term #2: spin up new cluster, aim writes at new cluster, back-fill w/ data from Kinesis queue
– Long-term: Move off AWS ES onto our own ES cluster; customize configuration based on experience

[Charts: ES writes/sec; ES delay in seconds]

Page 37: Epic Fails in LiveOps

Unexpected failure

• Failure:
– Sudden surge of traffic to our doc site, resembling a DDoS attack
• Cause:
– Customer was pinging our doc site repeatedly as a “health check”
– They ran a customer acquisition campaign so traffic spiked
– We saw these unusual queries, with a strange user agent string
– We assumed it was an attack, so quarantined that user agent string
– This had the effect of taking down their game!
• Solution:
– Restore their traffic; explain to developer why this is a bad idea

Page 38: Epic Fails in LiveOps

Other common customer fails

• Not using receipt validation
• “We’ll wait, and if it’s successful, we’ll invest in tools”
• Not running events
• Launching without real-time data

Page 39: Epic Fails in LiveOps

Fake receipts are a big problem

[Chart: legit receipts vs. fake receipts; y-axis from 0 to 200,000]

Page 40: Epic Fails in LiveOps

Why in-game events matter (AdCap)

PlayFab added as backend platform

Page 41: Epic Fails in LiveOps

Setting up a live event

• Move art/assets into Unity AssetBundle
• Move planet config to Google Sheets
• Export data to PlayFab catalogs

Page 42: Epic Fails in LiveOps
Page 43: Epic Fails in LiveOps

LiveOps depends on tools

How much can you do without writing code?

The LiveOps Tools Continuum

[Diagram: a continuum from “Writing SQL, hacking game DB” to “Web tools, modify game params”]

Page 44: Epic Fails in LiveOps

Questions?

James [email protected] @gwertz