Scripted Earnings Conference Calls as a Signal of Future Firm Performance

Joshua Lee
Olin Business School
Washington University in St. Louis
St. Louis, MO 63130-6431
[email protected]

January 2014

Abstract: I examine whether market participants infer negative information about future firm performance from managers’ scripted responses to questions received during earnings conference calls. I argue that firms script their Q&A session responses prior to periods of poor performance to avoid the inadvertent disclosure of information that can be used to build a lawsuit against the firm. Using a unique measure of conference call Q&A scripting, I provide evidence that scripted Q&A is negatively associated with future earnings and future cash flows, suggesting that, on average, firms script their Q&A when future performance is poor. I also find a negative market reaction to scripted Q&A and downward revisions in analysts’ forecasts following scripted Q&A, suggesting that investors interpret scripted Q&A as a negative signal of future firm performance. I also find that firms are less likely to guide future earnings when Q&A is scripted and that analysts’ forecasts are less accurate following scripted Q&A, suggesting that firms provide less information to market participants when Q&A is scripted.

I thank Richard Frankel, my dissertation committee chair, for his guidance and mentorship. I also thank Gauri Bhat, Andrew Call, Ted Christensen, John Donovan, Bryan Graden, Jared Jennings, Chad Larson, Xiumin Martin, Lorien Stice-Lawrence, and Jake Thornock for their helpful comments. In addition, I thank workshop participants at Washington University in St. Louis and the Accounting Research Symposium at Brigham Young University.
The empirical challenge of this paper is identifying cross-sectional variation in the extent
of conference call Q&A scripting. I develop my scripting measure using a computational stylistics
method developed in the linguistics literature to identify the authors of documents with unknown
or disputed authorship (see, e.g., Stamatatos 2009). The best-known “authorship
attribution” studies use linguistic methods to ascertain who wrote the twelve Federalist Papers
for which both Alexander Hamilton and James Madison claimed authorship (Mosteller and Wallace
1963; Koppel, Schler, and Argamon 2009). Prior research suggests that the most effective method
for authorship attribution is the comparison of a set of function words between two documents
(Burrows 1987; Stamatatos 2009; Mosteller 2010). Function words are those with primarily
grammatical functions and include articles (e.g., a, an, the), conjunctions (e.g., and, or, so),
pronouns (e.g., I, me, we), prepositions (e.g., of, on, in), and auxiliary verbs (e.g., is, do, can).6
Mosteller (2010) suggests that function words are the best stylistic discriminators between two
authors because they are unrelated to the topic discussed, and they reflect minor or even
unconscious preferences of the author. Thus, an author’s use of function words uniquely identifies
his/her style. Using this approach, studies overwhelmingly identify James Madison as the author
of the twelve disputed Federalist Papers (Mosteller and Wallace 1963).7
6 See Appendix A for a complete list of function words used in this study.
Using this method, I examine the extent of scripting of the Q&A session of the conference
call by comparing the use of function words by the CEO during the presentation session to the use
of function words by the CEO during the Q&A session.8 I assume the presentation session of the
call is a scripted outline of the performance of the firm during the quarter. Conversations with an
investor relations consultant and a member of the internal investor relations team at Morgan
Stanley confirm this assumption. The set of function words during this session of the call thus
serves as a baseline against which I can compare the set of function words during the Q&A session of
the call. A CEO is less likely to be relying on scripted responses to conference call questions if
the use of function words during the Q&A session is less similar to the use of function words
during the presentation session of the call. In other words, if the CEO’s speaking style changes
from the presentation session to the Q&A session, he/she is less likely to be using a script to
respond to analysts’ and investors’ questions.
For each conference call, I first identify the presentation and Q&A sessions of the call by
searching for key words such as “question” and “Q&A” within 2 lines of other key words such as
“take” or “open up.”9 I then identify the chief executive officer using the titles provided during
the call and obtain the portions of the call in which the executive is speaking.10 Next, I create two
vectors of the counts of the function words spoken by the CEO in each session of the call: $v_{QA}$ and
$v_{PRES}$, where QA represents the Q&A session and PRES represents the presentation
session. I then compute my measure of scripting as the cosine similarity between the two vectors
using the following formula:
\[
SCRIPT = \cos(\theta) = \frac{v_{QA} \cdot v_{PRES}}{\lVert v_{QA} \rVert \, \lVert v_{PRES} \rVert} \qquad (1)
\]
where $\theta$ is the angle between $v_{QA}$ and $v_{PRES}$, $(\cdot)$ is the dot product operator, and $\lVert v_i \rVert$ is the length
of vector $v_i$ (i is equal to QA and PRES). The cosine similarity measure captures the uncentered
correlation between two vectors and provides an estimate of the similarity in the use of function
words by the executive during the presentation and Q&A sessions of the conference call.11 Its
values range between 0 and 1 where greater values indicate greater similarity. For ease in
economic interpretation in the multivariate analyses, I rank the SCRIPT measure into deciles from
0 to 9 and divide by 9 (RSCRIPT).12 I also require at least 200 words to be spoken by the CEO in
both the presentation session and the Q&A session of the call to reduce measurement error.
7 Other methods used in prior work include comparing sentence lengths, word lengths, or uses of frequent words
between two documents. However, these methods are shown to be poor indicators of authorship (see Mosteller 2010).
For this reason, I use the most accepted approach of comparing function words between two documents.
8 The results of all tests remain qualitatively and quantitatively similar if I use the spokesman executive to compute
the scripting measure where the spokesman is defined as the CEO or CFO who speaks for the longest portion of the
conference call. See Section 6.1 for additional detail.
9 During the introduction of the call, the executives often provide an outline for the call and state they will be opening
up the call for questions later on in the call. To ensure I obtain the key words when the Q&A session truly begins
rather than a reference to it later in the call, I require the Q&A session to start at least 10% into the call.
10 In many instances, the conference call speaker is identified using an abbreviated version of the executive’s name.
For example, the executive might be referred to as David when introduced but then Dave later in the call. I manually
correct these differences to ensure I obtain the full text of the call for each executive.
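As a minimal sketch of the scripting measure in Equation (1), the Python below counts function words in two pieces of text, computes the cosine similarity of the two count vectors, and converts a list of scores into decile ranks scaled to lie between 0 and 1. The word list is a small illustrative subset (the full list is in Appendix A), and the function names, the tokenizer, and the tie-breaking in the decile ranking are assumptions made for illustration rather than the procedure actually used in the study. The 200-word minimum is not enforced here.

```python
import math
import re
from collections import Counter

# Illustrative subset only; the study's full list is given in its Appendix A.
FUNCTION_WORDS = ["a", "an", "the", "and", "or", "so", "i", "me", "we",
                  "of", "on", "in", "is", "do", "can"]


def function_word_counts(text, vocabulary=FUNCTION_WORDS):
    """Return the count of each vocabulary word in text, in vocabulary order."""
    # Assumed tokenizer: lowercase runs of letters and apostrophes.
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(tokens)
    return [counts[word] for word in vocabulary]


def cosine_similarity(u, v):
    """Cosine of the angle between two count vectors, as in Equation (1)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("a vector contains no function words")
    return dot / (norm_u * norm_v)


def decile_rank_scaled(values):
    """Rank values into deciles 0..9 and divide by 9 (an RSCRIPT-style score).

    Assumption: ties are broken by input order, which the source does not specify.
    """
    n = len(values)
    order = sorted(range(n), key=lambda k: values[k])
    ranked = [0.0] * n
    for position, k in enumerate(order):
        ranked[k] = (position * 10 // n) / 9
    return ranked
```

A SCRIPT-style score for one call would then be `cosine_similarity(function_word_counts(qa_text), function_word_counts(pres_text))`, where `qa_text` and `pres_text` are hypothetical strings holding the CEO's remarks in each session.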
11 Brown and Tucker (2011) use the cosine similarity measure to compare firms’ MD&A disclosures over time.
Their word count vectors include all unique words in the disclosure to compare content, whereas I use only the
counts of function words to compare speaking style.
12 The results remain qualitatively unchanged if I use the unranked cosine similarity measure.
I verify the construct validity of the cosine similarity measure in identifying the speaking
style of the CEO by computing the cosine similarity between the vector of function word
counts spoken by CEO j during the Q&A (presentation) session for firm i in quarter t and the vector
of function word counts from the combined conference call Q&A (presentation) sessions given by CEO j
for firm i during all other quarters. I then compute the cosine similarity between the CEO j Q&A (presentation)
function word count vector in quarter t and each of nine randomly selected combined word count vectors
for CEOs of other firms across the sample period. I then rank the actual CEO vector relative to
the nine randomly selected CEO vectors, where values of 1 (10) indicate the actual CEO vector is
the most (least) similar relative to the nine randomly-selected CEO vectors. Figure 2 presents the
cumulative percentage of firms in each ranking. If the ranking were random, the percentage of
firms in each ranking would be 10 percent. When comparing the Q&A session during the quarter
to the Q&A sessions of other quarters (Q&A to Q&A), the results indicate that 79.7 percent of the
similarity scores are highest for the actual CEO relative to the nine randomly selected CEOs. The
similarity score for the actual CEO is one of the top three highest for 93.3 percent of the
observations suggesting that the similarity score does a good job of identifying the speaking style
of the CEO. Similarly, when comparing the presentation session during the quarter to the
presentation sessions of other quarters (PRES to PRES), the results indicate that 80.5 percent of
the similarity scores are highest for the actual CEO relative to the nine randomly selected CEOs
suggesting that those who script the presentation session (e.g., the investor relations team) have
uniquely identifiable styles.13
13 I further verify the accuracy of the cosine similarity measure in the most common setting used in the linguistics
literature: the Federalist Papers. I compute the cosine similarity between the vector of word counts for each Federalist
paper and the vectors of word counts for the three known authors of the Federalist Papers: John Jay, James Madison,
and Alexander Hamilton. I assign an author to each paper based on the highest similarity score for each paper relative
to the vectors of word counts for all other papers written by the three authors. For all five papers written by John Jay,
the similarity score correctly identifies John Jay as the author. For the 51 papers known to have been written by
Alexander Hamilton, the similarity score correctly identifies 48 as written by Hamilton and incorrectly identifies 3 as
written by Madison. For the 14 papers known to have been written by James Madison, the similarity score correctly
identifies 12 as written by Madison and incorrectly identifies 2 as written by Hamilton. For the 12 disputed papers, I
find 10 of the similarity scores are highest for James Madison and 2 of the similarity scores are highest for Alexander
Hamilton. These results are fairly consistent with prior research and provide additional evidence that the similarity
score using the list of function words employed in this study provides an accurate measure for detecting subtle
differences in style between two texts.
I then compute the cosine similarity between the presentation session vector for CEO j of
firm i in quarter t and 1) the Q&A session vector for CEO j of firm i in quarter t and 2) nine
randomly-selected Q&A session vectors for CEOs of other firms. I then rank the similarity score
for the actual CEO vector relative to the randomly-selected CEO vectors. Figure 2 plots the
cumulative percentage of conference calls in each ranking (PRES to Q&A). I find that only 21.4
percent of the similarity scores are highest for the Q&A session of the actual CEO compared to
the nine randomly-selected Q&A sessions of other CEOs. This suggests two important points.
First, CEOs have unique styles relative to the investor relations teams that prepare the presentation
sessions of the calls. If they did not, the percentage of firms with a ranking of 1 would have been
closer to 100 percent. Second, the percentage of firms with a ranking of 1 is greater than what
would be expected if the rankings were random (21 percent relative to 10 percent), suggesting that
some firms script their Q&A.
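The ranking exercise behind Figure 2 can be illustrated with a short Python sketch: a target vector is compared with the actual CEO's comparison vector and with a set of benchmark vectors from other CEOs, and the actual CEO receives rank 1 when no benchmark is more similar. The function names are hypothetical, the random draw of nine other CEOs is left to the caller, and the rule that ties favor the actual CEO is an assumption, since the text does not say how ties are treated.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two function word count vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def rank_of_actual(target, actual_vector, benchmark_vectors):
    """Rank of the actual CEO among the benchmarks (1 = most similar).

    With nine benchmark vectors the rank runs from 1 to 10.
    Assumption: a benchmark outranks the actual CEO only if its
    similarity to the target is strictly higher.
    """
    actual_score = cosine_similarity(target, actual_vector)
    benchmark_scores = [cosine_similarity(target, b) for b in benchmark_vectors]
    return 1 + sum(score > actual_score for score in benchmark_scores)
```

Under purely random rankings with nine benchmarks, each rank from 1 to 10 would occur about 10 percent of the time, which is the comparison the text draws against the observed 21.4 percent.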
3.2 Test of hypothesis one
I test the association between Q&A scripting and firms’ future accounting performance
(Hypothesis 1) by estimating the following model similar to Core, et al. (1999), Bowen et al.
cumulative abnormal return over the [-127, -2] window prior to the conference call date. I also
include the current period earnings surprise (EARN SURPi,t) and the current period return on assets
(ROAi,t) to control for the market reaction to current period earnings and expect a positive
coefficient on these variables. I also include the earnings guidance variables GUIDANCEi,t and
GUID SURPi,t to control for quantitative information provided by the firm about future
performance. I also include the conference call specific variables TONEi,t, ln(CEO WC PRESi,t),
and ln(CEO WC QAi,t) to control for alternative linguistic features of the conference call and for
potential measurement error in the scripting measure. Consistent with prior research, I expect a
positive association between conference call tone and the market reaction to the call. Finally, I
include year-quarter and industry (two-digit SIC code) indicator variables and cluster the standard
errors by firm.
4. Sample selection and data
I obtain a sample of earnings conference calls by first matching all non-financial firms on
Compustat with non-missing total assets between 2002 and 2011 to their corresponding unique
Factiva identifiers using the company name provided by Compustat.15 For the 11,702 unique
Compustat firms, I find Factiva identifiers for 5,099 firms. Using each firm’s unique identifier, I
then search Factiva’s FD Wire for earnings conference calls made between 2002 and 2011 and find
56,822 total calls for 3,475 unique firms.16 I remove 15,384 calls in which the CEO speaks fewer
than 200 words in either the presentation or Q&A session of the call. Requiring financial
statement data from Compustat, IBES, and CRSP further reduces the sample by 5,142 calls, 1,370
calls, and 3,813 calls, respectively. The final sample consists of 30,773 earnings conference calls
for 2,384 unique firms with sufficient data to estimate the main empirical analyses.
15 In cases where the match is ambiguous, I check whether the city and state of the matched firm in Factiva matches
the city and state of the firm in Compustat.
16 Factiva contains different types of conference calls such as those discussing mergers and acquisitions. I focus only
on earnings-related conference calls. I filter out non-earnings related conference calls by requiring the term “earnings”
to be in the title of the call. I also require the conference call be made within 2 days of the earnings announcement.
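The call-level screens described in this section can be sketched as a single filter in Python. The record layout (a dict with the keys shown) is hypothetical, and two details are assumptions because the text does not specify them: the 2-day window is treated as symmetric and measured in calendar days, and the title match is case-insensitive. The data-availability requirements for Compustat, IBES, and CRSP are not modeled.

```python
from datetime import date

MIN_CEO_WORDS = 200    # minimum CEO words in each session, from the text
MAX_DAYS_FROM_EA = 2   # call must fall within 2 days of the earnings announcement


def keep_call(call):
    """Return True if a call passes the screens described in the text.

    `call` is a hypothetical dict with keys: title, call_date,
    announcement_date, ceo_words_pres, ceo_words_qa.
    """
    # Earnings-related calls only: the term "earnings" must appear in the title.
    if "earnings" not in call["title"].lower():
        return False
    # Assumed symmetric calendar-day window around the announcement date.
    gap = abs((call["call_date"] - call["announcement_date"]).days)
    if gap > MAX_DAYS_FROM_EA:
        return False
    # The CEO must speak at least 200 words in both sessions.
    return (call["ceo_words_pres"] >= MIN_CEO_WORDS
            and call["ceo_words_qa"] >= MIN_CEO_WORDS)
```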
Table 1 presents the means of the variables used in the empirical analysis for the full sample
and also for each quintile of the SCRIPTi,t measure. The final column in the table reports the test
statistic testing the difference between the fifth and first quintile. The mean of the scripting
measure (SCRIPTi,t) is 0.797 in the bottom quintile and 0.934 in the top quintile. The mean of
future return on assets (FUT ROAi,t) is 0.010 in the bottom scripting quintile and 0.005 in the top
quintile, and the mean of future operating cash flows (FUT CFOi,t) is 0.015 in the bottom quintile
and 0.014 in the top quintile. Both differences are statistically significant at the one percent level,
providing preliminary evidence of a negative association between Q&A scripting and future
performance (Hypothesis 1).
Table 1 also reports a significant difference in the cumulative abnormal return at the
conference call date (CC CARi,t) between the top and bottom quintiles of the scripting measure
(-0.001 compared to 0.005), providing preliminary evidence that investors interpret scripting as a
signal that managers possess negative information about future firm performance (Hypothesis 2).
The cumulative abnormal return in the 252 trading days following the conference call (FUT CARi,t)
shows no difference between the top and bottom quintiles, suggesting that investors understand
the implications of Q&A scripting and there is no drift. I also find analyst forecast revisions
following the conference call (FREVi,t+1) are more negative in the top quintile of the scripting
measure relative to the bottom quintile (-0.193 compared to -0.148). I also find that 19.9% of
firms in the top quintile of the scripting measure provide guidance for next quarter’s EPS
(GUIDANCEi,t) compared to 22.8% in the bottom. I do not, however, find a difference in analyst
forecast accuracy (ACCURACYi,t+1) between the top and bottom quintiles.
I also report the means of the control variables used in the empirical analysis. The
significant differences between the top and bottom quintiles for these variables underscore the
importance of including these variables in the empirical analysis to control for alternative
explanations. I specifically find that firms in the top quintile are larger with greater analyst
following and institutional ownership, have lower current period market and accounting
performance, have lower book-to-market ratios, have been listed on Compustat for less time, and
provide more negative forecasts of future EPS. I also find that firms with CEOs who speak more
words during both the presentation and Q&A sessions have more scripted conference calls, which
may be attributable to two forces. First, when the firm does not wish to inadvertently disclose
information to outsiders, it may script longer presentations and responses to analysts’ questions to
allow for less time for multiple questions to be asked. Second, higher word counts allow a more
precise measurement of the scripting variable potentially creating a bias in the measure. Hence, I
include these two measures in each regression specification to control for this possibility.17
5. Results
5.1. Results for hypothesis one
Table 2 presents the results of estimating Equation (2). The dependent variable is FUT ROAi,t in
Column (1) and FUT CFOi,t in Column (2). The coefficient on RSCRIPTi,t is -0.003 in Column (1) and
-0.003 in Column (2) and both are significant at the one percent level. The coefficient estimates
suggest that relative to firms in the bottom decile of the scripting measure, firms in the top decile
have a 45 percent lower return on assets in the four quarters following the conference call
(-0.003/0.0066 = -0.45) and 21 percent lower operating cash flows in the four quarters following the
conference call (-0.003/0.0141 = -0.21). These results suggest that firms script Q&A when future
accounting performance is poor and are consistent with my first hypothesis.
17 To further rule out the possibility that measurement error in the scripting measure is affecting my results, I re-
estimate the scripting measure holding the number of words constant across firms. I continue to find a highly positive
correlation between this alternative scripting measure and the total number of words spoken by the CEO during both
the presentation and Q&A sessions of the call, suggesting measurement error is not driving the large positive
association between call length and my scripting measure. The results of my empirical analyses are also robust to
using this alternative measure. See Section 6.1 for more detail.
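The economic magnitudes reported for Table 2 follow from the scaling of RSCRIPT: because the ranked measure runs from 0 (bottom decile) to 1 (top decile), its coefficient is the estimated difference between the two extreme deciles, and dividing by a benchmark level expresses that difference in relative terms. The short Python sketch below reproduces the arithmetic; the function name is hypothetical, and treating the denominators 0.0066 and 0.0141 as sample means is an assumption based on how the later tests describe the same calculation.

```python
def relative_effect(coefficient, benchmark):
    """Bottom-to-top decile effect of RSCRIPT as a fraction of a benchmark level."""
    if benchmark == 0:
        raise ValueError("benchmark must be nonzero")
    return coefficient / benchmark


# Values taken from the text: coefficients of -0.003 in both columns.
roa_effect = relative_effect(-0.003, 0.0066)   # future return on assets
cfo_effect = relative_effect(-0.003, 0.0141)   # future operating cash flows
print(f"{roa_effect:.2f} {cfo_effect:.2f}")    # prints "-0.45 -0.21"
```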
The control variables indicate that larger firms with more institutional ownership, lower
return volatility, and lower analyst following have higher future earnings and cash flows. I find
positive coefficients on the current period performance measures consistent with persistence in
performance. I also find that younger firms with higher stock turnover and lower earnings
volatility have higher future cash flows but that these variables are insignificant in the future
earnings regression. In addition, future earnings and cash flows are higher when firms provide
guidance and when the guidance is more positive. I also find that conference call tone loads
positively in both future performance regressions, suggesting that managers use positive tone when
future performance is high. I do not find a relation between future performance and the number
of words spoken by the CEO during the Q&A session, but I find lower future earnings when the
presentation session is longer.
5.2. Results for hypothesis two
I next estimate the relation between scripting and the market reaction at the time of the
conference call. Panel A of Table 3 presents the results of estimating Equation (3). The coefficient
on RSCRIPTi,t is -0.008 and significant at the one percent level in Column (1) without including
the control variables. After including the control variables in Column (2), the magnitude of the
coefficient drops to -0.003 but remains statistically significant at the one percent level. The
coefficient in Column (2) indicates that relative to firms in the bottom decile, firms in the top decile
of RSCRIPTi,t have 139 percent lower abnormal returns at the conference call date relative to the
mean of CC CARi,t (-0.003/0.00216 = -1.39). This result is consistent with investors interpreting
scripted calls as a negative signal of future performance and supports my second hypothesis. The
control variables indicate that larger and higher growth firms have lower conference call returns.
I also find a negative relation between the conference call return and return momentum. Firms
with more positive ROA and more positive earnings surprises also have higher abnormal returns.
I also find that firms with more positive earnings guidance on the day of the call have higher
abnormal returns on the day of the call, but that the decision to guide future earnings is negatively
associated with the abnormal return. In addition, firms with more positive conference call tone
have higher abnormal returns consistent with prior research. I also find that firms with longer
presentation sessions have lower abnormal returns.
I next examine whether scripted conference calls are associated with future abnormal
returns to determine whether investors overreact or underreact to scripted calls at the conference call
date. Panel B reports the results of Equation (3) replacing CC CARi,t with FUT CARi,t, defined as
the abnormal return for the 252 trading days following the conference call using the window [2,
254]. In Column (1) I do not find a significant relation between RSCRIPTi,t and FUT CARi,t,
suggesting that the reaction at the conference call date does not reverse in future periods on
average, and hence, was not an overreaction. Instead, scripted calls provide investors with a signal
of future negative performance at the conference call date.
However, some firms are likely to script their Q&A responses for reasons unrelated to
future performance. For example, proprietary costs of inadvertent disclosure can induce some
firms to script their Q&A responses to avoid revealing information about the firm’s products. In
addition, managers who are less confident in their ability to respond to questions are likely to rely
on scripted responses to avoid tarnishing their reputational capital. For these firms, when future
performance materializes and the market’s negative prior assessment of performance is proved
inaccurate, the negative stock market response is likely to reverse. I test this conjecture by
including the interaction between RSCRIPTi,t and an indicator variable equal to 1 for below median
values of FUT ROAi,t (LOW FUT ROAi,t) as an additional control variable in the second column of
Panel B. I expect a positive coefficient on RSCRIPTi,t if returns reverse for firms with high
subsequent performance. In contrast, I expect the sum of the coefficients on the RSCRIPTi,t
measure and the interaction between RSCRIPTi,t and LOW FUT ROAi,t to be insignificant if returns
do not reverse for firms with poor subsequent performance. The results in Column (2) are
consistent with these expectations. The coefficient on RSCRIPTi,t is 0.032 and is significant at the
one percent level. In contrast, the sum of the coefficients on the RSCRIPTi,t measure and the
interaction between RSCRIPTi,t and LOW FUT ROAi,t is 0.005 and is insignificant.
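One way to read the Column (2) test is to write out the interaction specification. The sketch below omits the control variables and fixed effects, and the coefficient labels $\beta_1$ and $\beta_2$ are illustrative notation rather than labels taken from Equation (3).

```latex
\[
FUT\ CAR_{i,t} = \beta_0 + \beta_1\, RSCRIPT_{i,t}
  + \beta_2 \left( RSCRIPT_{i,t} \times LOW\ FUT\ ROA_{i,t} \right) + \dots + \varepsilon_{i,t}
\]
\[
\frac{\partial\, FUT\ CAR_{i,t}}{\partial\, RSCRIPT_{i,t}} =
\begin{cases}
  \beta_1 = 0.032 & \text{if } LOW\ FUT\ ROA_{i,t} = 0 \\
  \beta_1 + \beta_2 = 0.005 & \text{if } LOW\ FUT\ ROA_{i,t} = 1
\end{cases}
\]
```

Under this reading, the reported estimates imply an interaction coefficient of roughly $\beta_2 = 0.005 - 0.032 = -0.027$: the reversal appears only for firms whose subsequent performance turns out to be above the median.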
Next, I corroborate the results in Panel A of Table 3 by examining revisions of analysts’
EPS forecasts for quarter t+1 following the conference call date. Specifically, I regress analyst
forecast revisions (FREVi,t+1) on the scripting measure and other control variables. FREVi,t+1 is
defined as the median analyst EPS forecast for quarter t+1 for all
forecasts made within 30 days following the conference call date less the median consensus
forecast of quarter t+1 directly prior to the conference call, divided by price and multiplied by 100.18
Table 4 presents the results and reports a
negative coefficient on RSCRIPTi,t of -0.10, which is significant at the one percent level. The
coefficient estimate suggests that moving from the bottom to the top decile of the RSCRIPTi,t
measure is associated with a 51 percent decrease in FREVi,t+1 relative to the mean of FREVi,t+1
(-0.10/-0.195 = 0.51). This result is consistent with the abnormal returns tests and suggests that
analysts revise downward their forecasts of future earnings after scripted conference calls. Thus,
sophisticated investors (i.e., analysts) view scripted conference calls as a negative signal of future firm
performance, consistent with my second hypothesis. The control variables indicate that analysts
revise their forecasts upward following large current period earnings surprises and following calls
with positive disclosure tone. Analysts also revise their forecasts upward following positive
earnings guidance, but downward if the firm decides to guide earnings. I also find that analysts
revise their forecasts upward following calls with longer Q&A sessions.
18 I multiply by 100 to be able to observe the coefficient on the scripting variable without reporting several decimal
places.
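The definition of the forecast revision variable can be written as a short Python sketch. The function and argument names are hypothetical, and the text does not say on which date price is measured, so the caller is assumed to supply an appropriate share price.

```python
from statistics import median


def forecast_revision(post_call_forecasts, pre_call_consensus, price):
    """FREV-style revision for quarter t+1.

    post_call_forecasts: EPS forecasts issued within 30 days after the call.
    pre_call_consensus: median consensus EPS forecast directly before the call.
    price: share price used for scaling (measurement date is an assumption).
    """
    if price <= 0:
        raise ValueError("price must be positive")
    # Change in the median forecast, scaled by price and multiplied by 100.
    return (median(post_call_forecasts) - pre_call_consensus) / price * 100
```

For example, post-call forecasts of 0.48, 0.50, and 0.52 against a pre-call consensus of 0.55 and a price of 25 give a revision of about -0.2.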
Overall, I find evidence consistent with my hypotheses. These results suggest that firms
script Q&A responses when managers possess negative information about future firm
performance and that investors interpret scripted calls negatively.
6. Do firms provide more or less information during scripted conference calls?
The negative market reaction to scripted Q&A is consistent with the following two
alternative explanations: 1) investors interpret scripted responses as a negative signal of future
performance or 2) managers use scripted responses to provide additional information about the
negative expected performance. I attempt to distinguish these explanations in two ways. First, if
firms use scripted conference calls to provide additional information about future performance,
scripted calls are likely associated with a greater propensity to provide guidance about future
earnings. In contrast, if firms provide less information during scripted calls, I expect scripted calls
to be associated with a lower propensity to provide guidance about future earnings. This is a direct
measure of managers’ decisions to provide additional information during scripted earnings calls.
Second, if firms provide additional information during scripted earnings calls, market
participants are likely to have a richer information set to predict future firm performance. I focus
on analysts who aggregate data from firm, industry, and market sources to produce earnings
forecasts, stock recommendations, and other analyses to aid investors in establishing earnings
expectations for the firm (see, e.g., Brown and Rozeff, 1978; Givoly and Lakonishok, 1979; Brown
et al., 1987; Fried and Givoly, 1982; Asquith et al., 2005; Frankel et al., 2006). Prior research
suggests that conference calls are useful for analysts in establishing forecasts for future periods.
For example, Bowen et al. (2002) find that conference calls improve analysts’ forecasting ability,
and Mayew (2008) suggests that analysts benefit from their ability to ask questions of management
during the Q&A session of the call. If analysts have a richer information set following scripted
calls, I expect their forecasts to be more accurate. If, on the other hand, firms provide less
information during scripted calls, I expect analyst forecasts to be less accurate following scripted
calls.
I test whether firms are less likely to provide earnings guidance when conference calls are
FIGURE 3. VALUES OF RSCRIPT IN YEARLY WINDOWS SURROUNDING THE LITIGATION FILING DATE
[Horizontal axis: year window surrounding the litigation filing date]
Table 1
Descriptive statistics.
This table presents the means of variables used in the empirical analysis by quintile of SCRIPTi,t. The sixth column presents the test statistic of the difference
in means between the top and the bottom quintile. The penultimate column reports the means for the full sample and the final column reports the standard
deviations for the full sample. *, **, and *** represent significance at the 10%, 5%, and 1% levels, respectively. All continuous variables are winsorized at
the 1st and 99th percentiles. All variables are defined in Appendix B.