Homework 4 - Sebastian Rojas v1

Jul 06, 2018

  • 8/18/2019 Homework 4 - Sebastian Rojas v1

Homework 4: Time Series Analysis of “Gone Girl” Daily Box Office

Sebastian Rojas
Regression and Multivariate Data Analysis

Prof. Jeffrey Simonoff, Fall 2014

Trying to find predictors for a film's box office is a complex problem that has haunted the film industry for years. In the last decade, though, Google search and social media have been praised as a good thermometer of social phenomena. In fact, in June 2013, Andrea Chen, Google's principal industry analyst for media and entertainment, claimed that a movie's box office could be predicted as far as four weeks in advance using Google search.¹ For the purpose of this paper, I wanted to test this claim by combining the publicly available Google Trends tool with Facebook and Twitter data. The movie analyzed in this paper is “Gone Girl”, directed by David Fincher, starring Ben Affleck and Rosamund Pike, and released on October 3rd, 2014. However, given the nature of the publicly available data, this relation was analyzed on a simultaneous, day-to-day basis, and not in the predictive way claimed by Chen.

The target variable in the analysis is the daily box office of the movie between October 10th, 2014 and November 8th, 2014 (30 days), taken from Box Office Mojo. Only 30 days are used because most of the predicting variables are available only for the 30 days prior to the moment the data is extracted. Data covering the complete period from the movie's release date until now would be ideal, but it is only available through paid social media analytics tools designed for businesses. The predicting variables are then the following:

a) Google Trends results for the search topic “Gone Girl”, with label “2014 Film”

Google separates topics that may share a title by attaching labels to them, which allows us to differentiate the film from the homonymous book on which it is based. Google Trends presents its results as observations in the range 0-100: the value 100 is assigned to the day with the highest number of searches in the selected period, and every other value is relative to that peak. Google doesn't allow setting customized dates; the only possible fine-tuning is to define the search as relative to the past week, the past 30 days, 90 days, 3…
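The peak-relative scaling just described can be sketched in a few lines. This is a minimal illustration under the assumption that each day's index is simply that day's search volume divided by the peak volume of the chosen window, times 100 and rounded; Google does not publish its exact normalization, and the raw daily volumes below are hypothetical.

```python
def trends_index(daily_volumes):
    """Rescale raw daily search volumes to a 0-100 index relative to the peak day.

    Assumption: index = round(100 * volume / max(volume)). This mimics the
    behaviour described in the text, not Google's unpublished method.
    """
    peak = max(daily_volumes)
    if peak <= 0:
        raise ValueError("need at least one positive volume")
    return [round(100 * v / peak) for v in daily_volumes]


# Hypothetical raw volumes for five consecutive days.
raw = [5200, 8100, 4050, 3900, 2600]
print(trends_index(raw))      # the second day is the peak, so it gets 100
# Shortening the window changes the peak, so the same days get new values.
print(trends_index(raw[2:]))
```

Because every value is relative to the peak of the selected window, the index from two different windows cannot be compared directly, which is one reason the fixed 30-day window matters here.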

Since we are trying to predict a money variable, and our hypothesis is that all the predictors are related to audience variables that should account for proportional changes in the target variable (they act as a sample of the overall population's interest in the movie), the relation between them was treated as multiplicative. Therefore, the target variable as well as the predictors were log-transformed in base 10. The variables are then:

Predictors

Log(gti): logged Google Trends index
Log(FBuschange): logged change in American fans of the movie's Facebook page
Log(twGG): logged number of Twitter mentions of “Gone Girl”
Log(tw@GG): logged number of mentions of the official Twitter account of “Gone Girl”

Target Variable

Log(DDG): logged daily domestic gross
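Taking base-10 logarithms is what turns the assumed multiplicative relation into one that ordinary least squares can fit: if DDG = a · x^b, then Log(DDG) = log10(a) + b · Log(x). The sketch below checks this numerically with hypothetical coefficients a = 50 and b = 2; the function names are illustrative only.

```python
import math

def power_model(x, a, b):
    """Multiplicative (power) relation: y = a * x**b."""
    return a * x ** b

def log10_model(log_x, a, b):
    """The same relation after taking base-10 logs: log10(y) = log10(a) + b * log10(x)."""
    return math.log10(a) + b * log_x

# Hypothetical coefficients, chosen only to illustrate the transformation.
a, b = 50.0, 2.0
for x in (10.0, 35.0, 100.0):
    y = power_model(x, a, b)
    # Logging both sides turns the curve into a straight line in log10(x).
    assert math.isclose(math.log10(y), log10_model(math.log10(x), a, b))

# Doubling x multiplies y by 2**b, whatever the level of x.
print(power_model(20.0, a, b) / power_model(10.0, a, b))
```

The slope b in the logged model is therefore the exponent of the multiplicative model, which is how the coefficients are read in the conclusions.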

The scatterplots for the different variables are the following:

[Figure: scatterplots of Log(DDG) against Log(gti), Log(FBuschange), Log(twGG) and Log(tw@GG)]

All four scatterplots appear to show a strong relation between the predictors and the target variable, particularly for Log(gti). Running a best subsets regression gives the following output:

Best Subsets Regression: Log(DDG) versus Log(gti), Log(FBuschange), Log(twGG), Log(tw@GG)

Response is Log(DDG)

[Table: Minitab best subsets output for the two best models with one, two and three predictors and for the full four-predictor model, reporting Vars, R-Sq, R-Sq(adj), R-Sq(pred), Mallows Cp and S, with an X marking each predictor included.]

With these results, the best model clearly appears to be the two-variable model that keeps Log(gti) and Log(FBuschange). Under this model Mallows Cp is the smallest, and both predicted R² and adjusted R² are maximized. It is interesting to notice that when we take the simple linear model that only considers Log(gti), which from the scatterplots appears to be the predictor with the strongest relation to the target variable, Mallows Cp is …
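The three selection criteria used above can be computed directly. The sketch below is a plain-Python stand-in for a best subsets search, not Minitab's code: for every subset of predictors it fits OLS with an intercept and reports R², adjusted R², predicted R² (1 − PRESS/SST, where PRESS uses the leverages h_ii) and Mallows Cp = SSE_p / MSE_full − n + 2(p + 1), with p the number of predictors in the subset and MSE_full taken from the model with all predictors. The helper names and the toy data are hypothetical.

```python
from itertools import combinations

def invert(m):
    """Gauss-Jordan inverse of a small square matrix given as a list of lists."""
    n = len(m)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(m)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular matrix")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [vr - f * vc for vr, vc in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def fit(X, y):
    """OLS with an intercept via the normal equations; returns (SSE, PRESS)."""
    rows = [[1.0] + list(x) for x in X]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    inv = invert(xtx)
    beta = [sum(inv[i][j] * xty[j] for j in range(k)) for i in range(k)]
    sse = press = 0.0
    for r, yi in zip(rows, y):
        e = yi - sum(b * v for b, v in zip(beta, r))
        h = sum(r[i] * inv[i][j] * r[j] for i in range(k) for j in range(k))  # leverage h_ii
        sse += e * e
        press += (e / (1.0 - h)) ** 2  # leave-one-out (PRESS) residual
    return sse, press

def best_subsets(columns, y):
    """Yield (subset, R2, adj R2, pred R2, Mallows Cp) for every predictor subset."""
    names = list(columns)
    n = len(y)
    ybar = sum(y) / n
    sst = sum((v - ybar) ** 2 for v in y)
    full_sse, _ = fit(list(zip(*(columns[c] for c in names))), y)
    mse_full = full_sse / (n - len(names) - 1)
    for size in range(1, len(names) + 1):
        for subset in combinations(names, size):
            sse, press = fit(list(zip(*(columns[c] for c in subset))), y)
            r2 = 1 - sse / sst
            adj = 1 - (sse / (n - size - 1)) / (sst / (n - 1))
            pred = 1 - press / sst
            cp = sse / mse_full - n + 2 * (size + 1)
            yield subset, r2, adj, pred, cp

# Toy data (hypothetical): y depends on x1 only; x2 is an irrelevant predictor.
x1 = [float(i) for i in range(10)]
x2 = [float((i * 7) % 5 - 2) for i in range(10)]
y = [1.0 + 2.0 * i + 0.3 * ((i * 3) % 4 - 1.5) for i in range(10)]
for subset, r2, adj, pred, cp in best_subsets({"x1": x1, "x2": x2}, y):
    print(subset, round(r2, 3), round(adj, 3), round(pred, 3), round(cp, 2))
```

By construction the full model always has Cp equal to its number of parameters, so Cp is informative only for the smaller subsets, where values close to p + 1 suggest little bias.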

The Durbin-Watson statistic for the regression is d = 1.2…
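For reference, the Durbin-Watson statistic compares successive residuals: d = Σ(e_t − e_{t−1})² / Σe_t², which is roughly 2(1 − r₁), where r₁ is the lag-1 autocorrelation of the residuals. A minimal sketch, with two hypothetical residual series to show which way d moves:

```python
def durbin_watson(residuals):
    """Durbin-Watson statistic d = sum((e_t - e_{t-1})**2) / sum(e_t**2).

    d is about 2 for uncorrelated residuals, falls toward 0 under positive
    lag-1 autocorrelation and rises toward 4 under negative autocorrelation.
    """
    if len(residuals) < 2:
        raise ValueError("need at least two residuals")
    num = sum((cur - prev) ** 2 for prev, cur in zip(residuals, residuals[1:]))
    den = sum(e * e for e in residuals)
    return num / den

# Hypothetical residual series: slow swings (positive autocorrelation) vs. sign flips.
smooth = [0.5, 0.4, 0.2, -0.1, -0.3, -0.4, -0.2, 0.1, 0.3, 0.4]
alternating = [0.3, -0.3, 0.3, -0.3, 0.3, -0.3]
print(durbin_watson(smooth), durbin_watson(alternating))
```

A value well below 2, like the one reported here, points toward positive first-order autocorrelation; whether it is significant depends on the tabulated bounds for the sample size and number of predictors.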

A runs test was also performed, with the following results:

Runs test for SRES1

Runs above and below K = …
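A runs test counts the runs of consecutive residuals that lie on the same side of a reference value (the mean by default) and compares that count with its expectation under randomness: too few runs mean the residuals cluster, which suggests positive autocorrelation. The sketch uses the usual Wald-Wolfowitz normal approximation; it is an illustration rather than Minitab's implementation, and dropping values exactly equal to K is an assumed convention for ties.

```python
import math

def runs_test(values, k=None):
    """Runs test for randomness about k (default: the sample mean).

    Returns (observed runs, expected runs, z, two-sided p-value) using the
    normal approximation. Values equal to k are dropped (assumed tie rule).
    """
    if k is None:
        k = sum(values) / len(values)
    above = [v > k for v in values if v != k]
    n1 = sum(above)
    n2 = len(above) - n1
    if n1 == 0 or n2 == 0:
        raise ValueError("need observations on both sides of k")
    runs = 1 + sum(a != b for a, b in zip(above, above[1:]))
    n = n1 + n2
    expected = 2.0 * n1 * n2 / n + 1
    variance = 2.0 * n1 * n2 * (2.0 * n1 * n2 - n) / (n * n * (n - 1))
    z = (runs - expected) / math.sqrt(variance)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided p-value, 2 * (1 - Phi(|z|))
    return runs, expected, z, p

# Hypothetical residuals that cluster above and then below zero: few runs, negative z.
print(runs_test([0.4, 0.3, 0.5, 0.2, -0.3, -0.4, -0.2, -0.5]))
```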

This spike starts on Friday, and it is one of the reasons why most movies premiere on Thursdays. Since the first observation corresponds to a Friday and these spikes are spaced one week apart, there is autocorrelation at lag 7.
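The weekly pattern can be made concrete with the sample autocorrelation function, r_k = Σ(x_t − x̄)(x_{t+k} − x̄) / Σ(x_t − x̄)². The sketch computes it for a hypothetical 28-day series with a spike every seventh day, standing in for the Friday peaks; it is not the residual series from this analysis.

```python
def autocorrelation(x, lag):
    """Sample autocorrelation at the given lag (lag 0 returns 1)."""
    n = len(x)
    mean = sum(x) / n
    den = sum((v - mean) ** 2 for v in x)
    num = sum((x[t] - mean) * (x[t + lag] - mean) for t in range(n - lag))
    return num / den

# Hypothetical series: a spike on day 0 and every seventh day after it.
series = [3.0 if t % 7 == 0 else 1.0 for t in range(28)]
for lag in (1, 2, 7):
    print(lag, round(autocorrelation(series, lag), 3))
```

The lag-7 value stands out while the short lags stay small, which is the signature of a weekly cycle rather than of day-to-day persistence.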

In order to correct this, a seasonal indicator variable was added for the observations corresponding to a Friday. The output and residual plots for the new regression model are the following:

Regression Analysis: Log(DDG) versus Log(gti), Log(FBuschange), FR

Method: categorical predictor coding (1, 0)

[Minitab output: Analysis of Variance (Regression, Log(gti), Log(FBuschange), FR, Error and Total rows with DF, Adj SS, Adj MS, F-Value and P-Value), Model Summary (S, R-sq, R-sq(adj), R-sq(pred)), Coefficients (Constant, Log(gti), Log(FBuschange) and FR 1, with Coef, SE Coef, T-Value, P-Value and VIF), the regression equations for FR = 0 and FR = 1, and Fits and Diagnostics for Unusual Observations.]
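The Friday indicator FR can be built from the calendar. A minimal sketch, assuming 30 daily observations starting on Friday, October 10th, 2014; the function name is hypothetical. In the fitted model the indicator only adds its coefficient to the intercept on Fridays, which is why one regression equation is reported for FR = 0 and another for FR = 1.

```python
from datetime import date, timedelta

def friday_indicator(start, days):
    """Return (dates, fr) where fr[i] is 1 if dates[i] is a Friday and 0 otherwise."""
    dates = [start + timedelta(days=i) for i in range(days)]
    fr = [1 if d.weekday() == 4 else 0 for d in dates]  # Monday == 0, Friday == 4
    return dates, fr

dates, fr = friday_indicator(date(2014, 10, 10), 30)
print(dates[0], dates[-1], sum(fr))
print([d.isoformat() for d, f in zip(dates, fr) if f])  # the Fridays in the window
```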

If we look at the autocorrelation function under this new model, all forms of autocorrelation seem to have gone away:

[Figure: autocorrelation function of the residuals of the model with the Friday indicator]

Similarly, the p-value of the runs test is above the significance level, so we fail to reject the null hypothesis of randomness; there is no evidence of remaining autocorrelation:

Runs Test: SRES3

Runs test for SRES3

Runs above and below K = …

In conclusion, the estimated regression equation indicates that, for the month of the analysis:

• Leaving everything else fixed, proportional increments in the Google Trends index have multiplicative effects roughly equal to the square of that proportional increment (multiplying the index by a factor k multiplies daily box office by about k²). This strong relation could already be seen in the initial scatterplot. Unlike social network activity, performing a Google search is most probably something everybody who goes to the movies does to find a theater, showtimes, etc. Part of this more-than-proportional effect could also be tied to the fact that a single Google search may correspond to more than one ticket bought.

• Leaving everything else fixed, Fridays have a 70% higher daily box office than the rest of the days of the week.

• Leaving everything else fixed, a proportional increase in Facebook likers of the movie's official page is tied to a proportional decrease in box office that is slightly less than half the proportional increase in Facebook likers. This result is certainly surprising, and the difficulty in understanding it could be linked either to the possibility that the predictor helps to calibrate an overestimation in the relation between the Google Trends index and daily box office, or to the fact that we still don't fully understand how behavior on social networks affects consumption and the effect was not properly coded in the regression.
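The three readings above follow from how base-10 log coefficients translate back to the original scale. For a logged predictor, multiplying it by a ratio multiplies DDG by ratio**coef (the log base cancels, so the slope is an elasticity); for the 0/1 Friday indicator, the change is 10**coef − 1. The coefficients below are hypothetical round numbers matching the wording of the conclusions, not the fitted estimates.

```python
import math

def ratio_effect(coef, ratio):
    """Factor by which DDG changes when a logged predictor is multiplied by `ratio`."""
    return ratio ** coef

def indicator_effect_pct(coef):
    """Percentage change in DDG when a 0/1 indicator switches on (base-10 log response)."""
    return 100.0 * (10.0 ** coef - 1.0)

# Hypothetical round coefficients echoing the conclusions, not the fitted values.
b_gti, b_fb, b_fr = 2.0, -0.45, math.log10(1.7)
print(ratio_effect(b_gti, 1.10))   # 10% more searches: DDG multiplied by about 1.21
print(ratio_effect(b_fb, 1.10))    # 10% more new Facebook fans: DDG about 4% lower
print(indicator_effect_pct(b_fr))  # Fridays: DDG about 70% higher
```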

     

    1.