Ali, Klasa, and Yeung (RFS 2009)

The Limitations of Industry Concentration Measures Constructed with Compustat Data: Implications for Finance Research

Ashiq Ali
School of Management, University of Texas at Dallas

Sandy Klasa
Department of Finance, University of Arizona

Eric Yeung
J. M. Tull School of Accounting, University of Georgia

Industry concentration measures calculated with Compustat data, which cover only the public firms in an industry, are poor proxies for actual industry concentration. These measures have correlations of only 13% with the corresponding U.S. Census measures, which are based on all public and private firms in an industry. Also, only when U.S. Census measures are used is there evidence consistent with theoretical predictions that more-concentrated industries, which should be more oligopolistic, are populated by larger and fewer firms with higher price-cost margins. Further, the significant relations of Compustat-based industry concentration measures with the dependent variables of several important prior studies are not obtained when U.S. Census measures are used. One of the reasons for this occurrence is that Compustat-based measures proxy for industry decline. Overall, our results indicate that product markets research that uses Compustat-based industry concentration measures may lead to incorrect conclusions. (JEL G10, G30, L10)

A growing number of studies that consider the effects of product markets on financial economics-related phenomena use industry concentration measures calculated with Compustat data, which cover only the public firms in an industry. These studies examine issues related to asset pricing (Hou and Robinson 2006), informed trading (Tookes 2008), idiosyncratic stock return volatility (Gaspar and Massa 2006), mergers and acquisitions (Song and Walkling 2000; Fee and Thomas 2004; Shahrur 2005), corporate governance (DeFond and Park 1999; Engel, Hayes, and Wang 2003; Rennie 2006; Karuna 2007), capital structure (Lang and Stulz 1992; Kale and Shahrur 2007), corporate disclosure policy (Harris 1998; Botosan and Harris 2000; Botosan and Stanford 2005; Rogers and Stocken 2005; Verrecchia and Weber 2006), income-increasing accounting choices (Zmijewski and Hagerman 1981), and the determinants of corporate earnings (Cheng 2005).

We appreciate helpful comments from two anonymous referees, Michelle Sovinsky Goeree, Eric Kelley, Chris Lamoureux, Bill Maxwell, Hernan Ortiz-Molina, David Robinson, Janet Smith, Matthew Spiegel, Mark Trombley, and Harold Zhang. We also thank Ed Altman for providing us with the Altman-NYU Salomon Center Bankruptcy list. Send correspondence to Sandy Klasa, Department of Finance, Eller College of Management, University of Arizona, Tucson, AZ 85721-0108; telephone: (520) 621-8761; fax: (520) 621-1261. E-mail: [email protected].

© The Author 2008. Published by Oxford University Press on behalf of The Society for Financial Studies. All rights reserved. For Permissions, please e-mail: [email protected]. doi:10.1093/rfs/hhn103. Advance Access publication December 23, 2008.

The Review of Financial Studies / v 22 n 10 2009

We consider the empirical implications of using industry concentration measures that are based on only a firm's publicly traded rivals. To do so, we compare Compustat-based industry concentration measures with industry concentration measures collected from 1963–2002 Census of Manufactures publications provided by the U.S. Census Bureau, which are based on all public and private firms in an industry. The Census of Manufactures data have also been used to examine the effect of product market factors on a wide spectrum of finance issues: corporate takeover decisions (Eckbo 1985, 1992; Maksimovic and Phillips 2001), capital structure decisions (Phillips 1995; Kovenock and Phillips 1997; MacKay and Phillips 2005; Campello 2006), corporate investment patterns (Akdogu and MacKay 2008), chief executive officer (CEO) compensation contracts (Aggarwal and Samwick 1999), and risk management decisions (Haushalter, Klasa, and Maxwell 2007). These studies argue that it is preferable to use concentration measures calculated by the U.S. Census because measures based on Compustat data are subject to measurement error due to the exclusion of private firms, which often account for a nonnegligible percentage of industry sales. MacKay and Phillips (2005, p. 1439) point out that because industry concentration measures calculated by the U.S. Census are used by regulatory agencies such as the Department of Justice, these measures are likely to be the most appropriate for studying product market issues.

Our empirical evidence indicates that Compustat-based industry concentration measures are poor proxies for actual industry concentration. The correlation between the Compustat- and U.S. Census-based Herfindahl indexes is only 13%. Moreover, U.S. Census-based concentration measures are positively related to industry price-cost margins and to firm size measures such as net sales, total assets, and market capitalization. However, these relations are not obtained using Compustat-based industry concentration measures. Further, we show that the total number of private and public firms in an industry drops markedly from the lowest to the highest quintile of U.S. Census-based industry concentration measures. In contrast, this number changes very little if Compustat-based industry concentration measures are used instead. Thus, only when U.S. Census-based industry concentration measures are used are the results consistent with theoretical predictions that more-concentrated industries, which should be more oligopolistic, are populated by fewer and larger firms that enjoy higher price-cost margins due to their greater market power.

Next, we use the U.S. Census data to reexamine several important results obtained in prior studies that use Compustat-based industry concentration measures. First, we consider the Hou and Robinson (2006) finding that firms in more-concentrated industries earn lower future stock returns. They argue that their results indicate that barriers to entry in highly concentrated industries insulate firms from undiversifiable distress risk, which is priced in equity returns. Hou and Robinson (2006) also report that firms in less-concentrated industries spend more on research and development. They contend that this result supports the Schumpeter (1912) proposition that innovation, which is a form of creative destruction, is more likely to occur in competitive industries.¹ Also, they posit that higher innovation risk in less-concentrated industries contributes to the higher cost of equity capital in such industries. In contrast, we document that industry concentration measures calculated by the U.S. Census are not related to future stock returns. Further, we find that the U.S. Census measures are positively rather than negatively associated with research and development expenses. These differences in results are not driven by our sample being confined to the manufacturing sector: when we use Compustat-based industry concentration measures and limit our analysis to this sector, we are able to replicate the Hou and Robinson (2006) findings.

Second, we reexamine the Lang and Stulz (1992) result that the effect of bankruptcy announcements on the equity values of competitors is more positive in more-concentrated industries and that this effect is amplified in industries with low leverage. Lang and Stulz (1992) argue that in industries that are more concentrated and have lower leverage, competitors are more likely to benefit from the difficulties faced by a bankrupt firm. We obtain the same results as Lang and Stulz (1992) with our sample of manufacturing firms when we use Compustat-based industry concentration measures. However, using U.S. Census-based industry concentration measures, we are unable to replicate the Lang and Stulz (1992) findings.

Third, we reexamine the Harris (1998) result that firms are less likely to provide segment disclosures for operations in more-concentrated industries, measured using Compustat data. She argues that to protect their abnormal profits and market share, firms in less-competitive industries are less likely to disclose commercially valuable information to competitors. We obtain the same results as Harris (1998) with our sample of manufacturing firms when we use Compustat-based industry concentration measures. However, we find that the decision to provide segment disclosures for operations in a particular industry is not associated with the U.S. Census-based industry concentration measures of that industry.

Finally, we consider the DeFond and Park (1999) result that CEO turnover is negatively associated with Compustat-based industry concentration measures. They argue that in more-competitive industries, in which there is greater homogeneity across firms and in which CEOs are likely to have more peers, it is easier to identify and replace poorly performing CEOs. We obtain results similar to those in DeFond and Park (1999) with our sample of manufacturing firms when we use Compustat-based industry concentration measures. However, we find an insignificant relationship between CEO turnover and U.S. Census-based industry concentration measures.

1 In contrast to his earlier prediction, Schumpeter (1942) claims that there is more innovation in less-competitive industries because firms in such industries can enjoy economic profits resulting from their innovation, instead of having these profits competed away.

Our finding that, using U.S. Census-based industry concentration measures, we are unable to replicate the Hou and Robinson (2006), Lang and Stulz (1992), Harris (1998), and DeFond and Park (1999) results suggests that Compustat-based industry concentration measures capture other industry characteristics that happen to be correlated with the dependent variables of these studies. To provide evidence on this issue, we examine what drives the Hou and Robinson (2006) result that Compustat-based industry concentration measures are negatively related to future stock returns and the Harris (1998) finding that firms are less likely to provide segment disclosures for operations in industries with higher values of Compustat-based measures.

We find that Compustat-based industry concentration measures are significantly negatively related to the change in industry shipments reported by the Census of Manufactures during the prior five years. However, U.S. Census-based industry concentration measures are not related to past shipment growth. Thus, for some reason other than the actual concentration of an industry, industries with high Compustat-based measures have experienced poor growth in the recent past. An explanation for these findings is that a declining industry is left with only a few large public firms relative to private firms. Consequently, there are only a few companies in the Compustat database for the industry, and this results in high Compustat-based industry concentration values. Consistent with this explanation, we find a significant negative relationship between the Compustat-based industry concentration measures and the change over the prior five years in the number of firms in an industry included in both the Center for Research in Security Prices (CRSP) and Compustat databases.²

Our finding that industries with high Compustat-based industry concentration measures tend to be declining industries explains why these industries spend less on research and development, as reported in Hou and Robinson (2006) and confirmed in our study. Given that prior work suggests that firms that spend more on research and development have higher future stock returns (e.g., Chan, Lakonishok, and Sougiannis 2001; Chambers, Jennings, and Thompson 2002; Eberhart, Maxwell, and Siddique 2004), we examine whether the association between Compustat-based industry concentration measures and future stock returns is sensitive to controlling for research and development expenses. After controlling for current research and development expenses, which we find are positively related to future stock returns, the negative association between Compustat-based industry concentration measures and future stock returns becomes insignificant. This finding suggests that the negative relationship between research and development expenses and Compustat-based industry concentration measures drives the negative association between these measures and future stock returns.

2 To construct Compustat-based industry concentration measures, studies such as that by Hou and Robinson (2006) often require that firms be included on both CRSP and Compustat.

Next, we show that after controlling for prior growth in industry shipments, the negative relationship documented by Harris (1998) between a firm's decision to provide segment disclosures of its operations in an industry and the Compustat-based concentration measures of that industry becomes insignificant. Further, we find that a firm's decision to provide segment disclosures for its operations in an industry is positively related to that industry's prior shipment growth. The latter result is consistent with the Miller (2002) and Kothari, Shu, and Wysocki (forthcoming) evidence that firms with weak (strong) prior operating performance provide less (more) informative disclosures. Overall, these findings suggest that the Harris (1998) result is driven by Compustat-based industry concentration measures proxying for the prior performance of a firm in one of its segments, which in turn determines the firm's decision to provide a separate disclosure for that segment.

Our study makes the following contributions. First, we document that Compustat-based industry concentration measures, which exclude data on private firms, are poor measures of actual industry concentration. Second, we show that researchers who use Compustat data to construct industry concentration measures can arrive at results that lead to incorrect conclusions. Finally, our findings suggest that the significant results obtained in prior studies that use Compustat-based industry concentration measures could be due to these measures proxying for other industry characteristics that are correlated with the dependent variables of these studies.

The remainder of this article is organized as follows. Section 1 describes the Compustat- and U.S. Census-based industry concentration measures used in the study. Section 2 provides evidence indicating that Compustat-based industry concentration measures are poor proxies for actual industry concentration. Section 3 reexamines the results of four prior studies that use Compustat-based industry concentration measures. Section 4 concludes.

1. Description of Compustat- and U.S. Census-based industry concentration measures

1.1 Compustat-based industry concentration measures

Compustat-based industry concentration measures are calculated using the sales data of firms included in the Compustat database. Because Compustat excludes private firms, Compustat-based industry concentration measures can potentially provide an inaccurate picture of the actual degree of concentration in an industry. In particular, in industries in which private firms account for a nonnegligible percentage of industry sales, it is problematic to rely on data that exclude these firms (Hay and Morris 1991, p. 210). However, there are two advantages of using Compustat data to construct industry concentration measures. First, such measures can easily be calculated by extracting from Compustat total sales for each firm in a particular industry. Therefore, they can provide a long and continuous time series of concentration measures. Second, using the Compustat database to calculate industry concentration measures allows researchers to construct these measures for a wide spectrum of industries.

The Compustat-based Herfindahl index is calculated by adding the squares of the sales market shares of all the firms in an industry that have sales data on Compustat. Similarly, the Compustat-based four-firm ratio is calculated by adding the sales market shares of the four largest firms in an industry in terms of market share. We refer to the Compustat-based Herfindahl index and four-firm ratio as HI-Compustat and FFR-Compustat, respectively.
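The two Compustat-based definitions can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' code: the input is assumed to be the list of annual sales for every firm in one industry that has sales data on Compustat, and the helper names (`market_shares`, `herfindahl`, `four_firm_ratio`) are hypothetical.

```python
from typing import List, Sequence


def market_shares(sales: Sequence[float]) -> List[float]:
    """Each firm's sales divided by total sales of the firms in the list."""
    total = sum(sales)
    if total <= 0:
        raise ValueError("total industry sales must be positive")
    return [s / total for s in sales]


def herfindahl(sales: Sequence[float]) -> float:
    """Herfindahl index: sum of squared sales market shares."""
    return sum(share ** 2 for share in market_shares(sales))


def four_firm_ratio(sales: Sequence[float]) -> float:
    """Four-firm ratio: combined market share of the four largest firms."""
    return sum(sorted(market_shares(sales), reverse=True)[:4])


# Hypothetical industry with five Compustat firms (sales in $ millions).
industry_sales = [400.0, 300.0, 200.0, 50.0, 50.0]
print(round(herfindahl(industry_sales), 4))       # 0.295
print(round(four_firm_ratio(industry_sales), 4))  # 0.95
```

Because shares are computed only over the firms passed in, any industry with four or fewer Compustat firms mechanically receives a four-firm ratio of 1.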

For the univariate results presented in this and the second section of the article, HI-Compustat and FFR-Compustat are calculated in a manner similar to that in Hou and Robinson (2006). HI-Compustat and FFR-Compustat are calculated using the sales market shares of all the firms in an industry with sales data on Compustat and are averaged over the past three years. Industry is defined using historical CRSP Standard Industrial Classification (SIC) codes.³,⁴ For the reexamination of the Hou and Robinson (2006), Lang and Stulz (1992), Harris (1998), and DeFond and Park (1999) results, we calculate Compustat-based industry concentration measures using the methodology employed by the specific study.
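The grouping and three-year averaging described here can be sketched as follows. This is a hedged illustration, not the procedure used in the article: it assumes firm-year records of (historical four-digit CRSP SIC code, year, sales), skips nonpositive sales, and returns no value unless all three years t−2, t−1, and t are available. How the original studies treat missing years is not stated here, so that rule is an assumption, and the function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Optional, Tuple

# (four-digit SIC code, year) -> annual Herfindahl index
IndustryYear = Tuple[int, int]


def annual_herfindahl(records: Iterable[Tuple[int, int, float]]) -> Dict[IndustryYear, float]:
    """records: (historical four-digit SIC, year, sales) for each firm-year with
    Compustat sales data. Nonpositive sales are skipped (an assumption)."""
    sales_by_cell: Dict[IndustryYear, list] = defaultdict(list)
    for sic4, year, sales in records:
        if sales > 0:
            sales_by_cell[(sic4, year)].append(sales)
    result = {}
    for cell, sales in sales_by_cell.items():
        total = sum(sales)
        result[cell] = sum((s / total) ** 2 for s in sales)
    return result


def hi_compustat(hhi: Dict[IndustryYear, float], sic4: int, year: int) -> Optional[float]:
    """Average of the annual index over years t-2, t-1, and t.
    Returns None unless all three years are available (an assumption)."""
    window = [hhi.get((sic4, y)) for y in range(year - 2, year + 1)]
    if any(value is None for value in window):
        return None
    return sum(window) / 3


# Hypothetical firm-year records for one industry (SIC 2011).
records = [
    (2011, 2000, 50.0), (2011, 2000, 50.0),                     # annual HHI 0.50
    (2011, 2001, 100.0),                                         # annual HHI 1.00
    (2011, 2002, 60.0), (2011, 2002, 20.0), (2011, 2002, 20.0),  # annual HHI 0.44
]
hhi = annual_herfindahl(records)
print(round(hi_compustat(hhi, 2011, 2002), 4))  # 0.6467
print(hi_compustat(hhi, 2011, 2001))            # None (1999 is missing)
```

The same window logic applies unchanged to FFR-Compustat once an annual four-firm ratio is computed per industry-year.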

1.2 U.S. Census-based industry concentration measures

The Census of Manufactures publications provided by the U.S. Census Bureau report concentration ratios for hundreds of industries in the manufacturing sector. We hand-collect data on the U.S. Census-based Herfindahl index and four-firm concentration ratio from Census of Manufactures publications for the years 1963, 1966, 1967, 1970, 1972, 1977, 1982, 1987, 1992, 1997, and 2002. The data are for four-digit SIC industries (SIC codes between 2000 and 3999) for the years 1963–1992 and for six-digit North American Industry Classification System (NAICS) industries (NAICS codes between 311111 and 339999) for the years 1997 and 2002.⁵
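Because the Census of Manufactures appears only in the years listed, each sample year must be matched to one Census year; Table 1 gives the windows used in this study. A minimal lookup sketch, assuming those windows are hard-coded (the names `CENSUS_WINDOWS`, `census_year_for`, and `hi_census_available` are hypothetical):

```python
from typing import Dict, Optional, Tuple

# Census of Manufactures year -> (first, last) sample year it proxies for, per Table 1.
CENSUS_WINDOWS: Dict[int, Tuple[int, int]] = {
    1963: (1963, 1964), 1966: (1965, 1966), 1967: (1967, 1968),
    1970: (1969, 1970), 1972: (1971, 1974), 1977: (1975, 1979),
    1982: (1980, 1984), 1987: (1985, 1989), 1992: (1990, 1994),
    1997: (1995, 1999), 2002: (2000, 2005),
}


def census_year_for(sample_year: int) -> Optional[int]:
    """Census year whose concentration data are applied to sample_year."""
    for census_year, (first, last) in CENSUS_WINDOWS.items():
        if first <= sample_year <= last:
            return census_year
    return None


def hi_census_available(sample_year: int) -> bool:
    """HI-Census is reported only from the 1982 Census on (sample years 1980-2005)."""
    census_year = census_year_for(sample_year)
    return census_year is not None and census_year >= 1982


print(census_year_for(1990))      # 1992
print(census_year_for(2005))      # 2002
print(hi_census_available(1979))  # False
```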

3 Kahle and Walkling (1996) report that over long sample periods there are advantages to using historical CRSP SIC codes instead of Compustat SIC codes. Further, because over the past fifty years the U.S. Census Bureau has revised the SIC system a number of times, it is advantageous to use historical CRSP SIC codes when constructing Compustat-based industry concentration measures.

4 Most work that uses Compustat- or U.S. Census-based industry concentration measures assumes that a firm competes only in the industry represented by the industry classification code assigned to the firm. Because the aim of this article is to compare results obtained with these two types of industry concentration measures, we make the same assumption.

5 For nonmanufacturing industries, the alternative to using Compustat data, and consequently excluding data on private firms, to construct industry concentration measures is to use concentration measures collected from industry-specific publications. Work that uses such measures examines issues such as the interaction between industry concentration (collected from the annual publication Supermarket News Distribution Study of Grocery Store Sales) and capital structure decisions in the supermarket industry (Chevalier 1995a, 1995b; Chevalier and Scharfstein 1996), the relationship between industry concentration (collected from the American Trucking Association) and firm survival after deregulation of the trucking industry (Zingales 1998), and how industry concentration (collected from Discount Merchandiser) interacts with ownership structure, capital structure, and corporate focus in the discount department store industry (Khanna and Tice 2000).

Unlike Compustat-based industry concentration measures, U.S. Census-based measures are constructed using data from all public and private firms in an industry and hence should better capture actual industry concentration. The use of U.S. Census-based measures by government regulatory agencies suggests that these measures should be quite reliable. For instance, these measures are often used by the Federal Trade Commission when it decides whether to challenge mergers on antitrust grounds. Another factor suggesting that U.S. Census-based industry concentration measures should be reliable is that all firms in the United States are required by federal law to respond to U.S. Census surveys (under Title 13 of the U.S. Code). Further, Sections 213 and 224 of Title 13 of the U.S. Code state that employees of the U.S. Census who collect data on its behalf and who knowingly furnish false information are subject to imprisonment and that agents of companies who willfully provide false answers to questions about their company are subject to hefty fines.

The Census of Manufactures calculates the Herfindahl index of an industry as the sum of the squares of the individual company market shares of all the companies in the industry or of the fifty largest companies in the industry, whichever number is smaller. The four-firm ratio of an industry is the sum of the market shares of the four largest firms in the industry in terms of market share. We refer to these measures as HI-Census and FFR-Census, respectively.

The Census of Manufactures is published only during years when a U.S. Census takes place. We use the U.S. Census data for a given year as a proxy for industry concentration not only for that year but also for the one or two years immediately before and after it. This approach is similar to that used in several prior studies (e.g., Aggarwal and Samwick 1999; MacKay and Phillips 2005; Campello 2006; Haushalter, Klasa, and Maxwell 2007). Table 1 provides information on the time periods to which we apply a given year's U.S. Census data. For example, we use the 1992 Census of Manufactures data as a proxy for industry concentration for the period 1990–1994. Data for FFR-Census are available in all Census of Manufactures publications, resulting in a sample period of 1963–2005. However, data for HI-Census are available only from Census of Manufactures publications from 1982 on, resulting in a sample period of 1980–2005.

Table 1
Sample periods to which a particular year's Census of Manufactures data on industry concentration are applied

Census of Manufactures year    Sample period to which the data are applied
1963                           1963–1964
1966                           1965–1966
1967                           1967–1968
1970                           1969–1970
1972                           1971–1974
1977                           1975–1979
1982                           1980–1984
1987                           1985–1989
1992                           1990–1994
1997                           1995–1999
2002                           2000–2005

The Census of Manufactures is published by the U.S. Census Bureau.

1.3 Descriptive statistics of U.S. Census- and Compustat-based industry concentration measures

Table 2 provides descriptive statistics of the industry concentration measures. Panel A compares HI-Census and HI-Compustat for four-digit SIC industries over the 1980–2005 period. The difference between these two measures is striking. The mean values of HI-Census and HI-Compustat are 0.064 and 0.696, respectively. Panel B compares FFR-Census and FFR-Compustat for four-digit SIC industries over the 1963–2005 period. The mean values of FFR-Census and FFR-Compustat are 0.382 and 0.969, respectively. The large differences between the values of the U.S. Census- and Compustat-based concentration measures indicate that, on average, private firms account for a significant percentage of industry sales and that it is therefore problematic to exclude data on these firms when calculating industry concentration measures. The fortieth percentile value for FFR-Compustat is 1.000, showing that the majority of four-digit SIC industries have four or fewer firms with sales data available on Compustat.

Table 2
Descriptive statistics of industry concentration measures

                 Mean    Median  STD     P20     P40     P60     P80

Panel A: 1980–2005 sample period; industry at four-digit SIC level
HI-Census        0.064   0.043   0.062   0.015   0.032   0.058   0.104
HI-Compustat     0.696   0.714   0.278   0.410   0.596   0.857   1.000

Panel B: 1963–2005 sample period; industry at four-digit SIC level
FFR-Census       0.382   0.350   0.208   0.195   0.290   0.411   0.560
FFR-Compustat    0.969   1.000   0.079   0.970   1.000   1.000   1.000

Panel C: 1995–2005 sample period; industry at four-digit SIC level
HI-Census        0.061   0.040   0.063   0.012   0.027   0.053   0.103
FFR-Census       0.368   0.327   0.217   0.159   0.272   0.402   0.559

P20 to P80 denote the 20th, 40th, 60th, and 80th percentiles. HI-Census is the Herfindahl index for four-digit SIC industries as reported by the Census of Manufactures. The 1997 and 2002 U.S. Censuses define industry at the six-digit NAICS level. Over the 1995–2005 period, we use HI-Census values for six-digit NAICS industries to calculate HI-Census values for four-digit SIC industries by weighting the HI-Census values of component six-digit NAICS industries by the square of their share of the broader four-digit SIC industry. To determine the component six-digit NAICS industries of a broader four-digit SIC industry, we use NAICS correspondence tables provided by the U.S. Census. HI-Compustat is the sum of the squares of the sales market shares of all firms in a CRSP four-digit SIC industry that have sales data on Compustat. For each year t, HI-Compustat is averaged over a three-year period, from year t − 2 to year t. FFR-Census is the sum of the market shares of the four largest firms in terms of market share in a four-digit SIC industry as defined by the Census of Manufactures. Over the 1995–2005 period, FFR-Census values for six-digit NAICS industries are used to approximate FFR-Census for broader four-digit SIC industries by first determining which component six-digit NAICS industry of a broader four-digit SIC industry has the largest sales as measured by the sales of its top four firms. Next, we divide the sales of the top four firms of this six-digit NAICS industry by the total sales of the firms in all the component six-digit NAICS industries within the broader four-digit SIC industry. FFR-Compustat is the sum of the market shares of the four largest firms in terms of market share in a CRSP four-digit SIC industry. A firm's market share is measured as sales divided by total sales of all CRSP firms in that industry that have sales data on Compustat. For each year t, FFR-Compustat is averaged over a three-year period, from year t − 2 to year t. Descriptive statistics for industry concentration measures are calculated by pooling Compustat- and U.S. Census-based industry-year observations. Observations used to calculate HI-Census or HI-Compustat are taken from all years within specific sample periods.
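A stylized example helps show how excluding private firms can inflate measured concentration. The industry below is entirely hypothetical: three public firms with sales of 100 each and 97 private firms with sales of 10 each. The Census-style function follows the Census of Manufactures convention described above (squared shares of total industry sales, summed over at most the fifty largest companies); the public-only function sees just the public firms, as a Compustat-based measure would. Function names are hypothetical.

```python
from typing import List, Sequence


def census_style_hhi(sales: Sequence[float], max_firms: int = 50) -> float:
    """Squared shares of total industry sales, summed over at most the
    max_firms largest companies (the Census of Manufactures convention)."""
    total = sum(sales)
    largest = sorted(sales, reverse=True)[:max_firms]
    return sum((s / total) ** 2 for s in largest)


def public_only_hhi(sales: Sequence[float], is_public: Sequence[bool]) -> float:
    """Herfindahl index computed from public firms only, as with Compustat data."""
    public_sales: List[float] = [s for s, pub in zip(sales, is_public) if pub]
    total = sum(public_sales)
    return sum((s / total) ** 2 for s in public_sales)


# Hypothetical industry: 3 public firms and 97 smaller private firms.
sales = [100.0] * 3 + [10.0] * 97
is_public = [True] * 3 + [False] * 97

print(round(census_style_hhi(sales), 4))            # 0.0215
print(round(public_only_hhi(sales, is_public), 4))  # 0.3333
```

Although the three public firms hold only about 24% of industry sales in this example, the public-only index is roughly fifteen times the Census-style one, and a four-firm ratio computed from the public firms alone would equal 1.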

As shown in Table 1, for the period 1995–2005, we use concentration ratios from the 1997 and 2002 Census of Manufactures publications, in which industry is defined using six-digit NAICS codes. HI-Census for six-digit NAICS industries and the total shipments for these industries reported in the Census of Manufactures can be used to calculate HI-Census for broader four-digit SIC industries. We do this by weighting HI-Census of the component six-digit NAICS industries by the square of their share of the shipments of the broader four-digit SIC industry.⁶,⁷ To calculate FFR-Census for four-digit SIC industries using FFR-Census of component six-digit NAICS industries, we use an approximation method. We first determine the component six-digit NAICS industry of a broader four-digit SIC industry that has the largest value for the sales of its top four firms. Next, we divide the sales of the top four firms of this six-digit NAICS industry by the total sales of all the component six-digit NAICS industries within the broader four-digit SIC industry.⁸
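Both conversions can be sketched directly. This is a hedged illustration with hypothetical function names and toy figures, not the authors' code. The Herfindahl aggregation assumes that each firm belongs to exactly one six-digit NAICS component; under that assumption, a firm's share of the four-digit SIC industry is its component share times the component's shipment share, so weighting each component index by its squared shipment share reproduces the index computed directly from firm-level data.

```python
from typing import Sequence, Tuple


def sic_hhi_from_naics(components: Sequence[Tuple[float, float]]) -> float:
    """components: (total shipments, HHI) for each six-digit NAICS industry that maps
    into one four-digit SIC industry p. Returns sum_j lambda_j**2 * HHI_j, where
    lambda_j is component j's share of p's total shipments."""
    total = sum(shipments for shipments, _ in components)
    return sum((shipments / total) ** 2 * hhi for shipments, hhi in components)


def sic_ffr_approx(components: Sequence[Tuple[float, float]]) -> float:
    """components: (total sales, sales of top four firms) for each six-digit NAICS
    component. Approximation: largest top-four sales among the components,
    divided by total sales of all components."""
    total = sum(sales for sales, _ in components)
    return max(top4 for _, top4 in components) / total


# Check the HHI weighting against a direct calculation on hypothetical firm sales.
naics_a = [30.0, 10.0]   # total 40, HHI = 0.75**2 + 0.25**2 = 0.625
naics_b = [40.0, 20.0]   # total 60, HHI = 5/9
direct = sum((s / 100.0) ** 2 for s in naics_a + naics_b)  # shares of the combined 100
weighted = sic_hhi_from_naics([(40.0, 0.625), (60.0, 5 / 9)])
print(round(direct, 6), round(weighted, 6))          # 0.3 0.3
print(sic_ffr_approx([(40.0, 40.0), (60.0, 60.0)]))  # 0.6
```

In the toy example, the combined industry has only four firms, so its true four-firm ratio is 1.0, while the approximation returns 0.6. This illustrates how the approximation can understate concentration when large firms sit in components other than the one selected.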

Panel C of Table 2 provides descriptive statistics for the 1995–2005 period for HI-Census and FFR-Census at the four-digit SIC level. The statistics for these measures are very similar to those reported in panels A and B. These results suggest that our methods for converting the six-digit NAICS-level concentration measures to four-digit SIC-level measures are reasonable.

6 For instance, if there are J six-digit NAICS industries that belong to one four-digit SIC industry, the six-digit NAICS Herfindahl index values are weighted by the square of the shares of each component six-digit NAICS industry. Thus, if the broader four-digit SIC industry is called p, then the Herfindahl index of industry p is

$$HHI_p = \sum_{j=1}^{J} \lambda_j^{2}\, HHI_j, \qquad \text{where } \lambda_j = \frac{\text{total shipments of component six-digit NAICS industry } j}{\text{total shipments of broader four-digit SIC industry } p}.$$

7 To determine the component six-digit NAICS industries of a broader four-digit SIC industry, we use NAICS correspondence tables provided by the U.S. Census.

8 We refer to this method as an approximation method because it is possible that a component six-digit NAICS industry that does not have the largest sales, as measured by the sales of its top four firms, has a firm whose sales are greater than those of at least one of the top four firms of the component six-digit NAICS industry with the largest value for the sales of its top four firms.



2. Evidence that Compustat-based industry concentration measures are poor proxies for actual industry concentration

Table 3 reports firm and industry characteristics for quintiles sorted by HI-Census and HI-Compustat. The panel A quintiles are based on HI-Census. The median value of HI-Census is 0.009 for quintile 1 and 0.153 for quintile 5, more than fifteen times larger. However, the corresponding values of HI-Compustat are quite similar: 0.659 and 0.891, respectively. The panel B quintiles are based on HI-Compustat. The median value of HI-Compustat is 0.311 for quintile 1 and 1.000 for quintile 5, about three times larger. However, the corresponding values of HI-Census are quite similar: 0.041 and 0.059, respectively. These results suggest that the exclusion of data on private firms not only leads to large differences between Compustat- and U.S. Census-based concentration measures but also leads to a low correlation between the two types of measures.
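The quintile comparison in Table 3 amounts to sorting industry-year observations on one measure, splitting them into five groups, and reporting medians of other variables within each group. A minimal sketch under assumptions: breakpoints are formed over all pooled observations at once (whether the article forms them year by year is not stated here), the dictionary keys are hypothetical, and the values are chosen only to make the arithmetic easy to check.

```python
import statistics
from typing import Dict, List, Sequence


def quintile_medians(rows: Sequence[Dict[str, float]], sort_key: str, report_key: str) -> List[float]:
    """Sort rows by sort_key, split them into five nearly equal groups
    (quintile 1 = lowest), and return the median of report_key in each group."""
    if len(rows) < 5:
        raise ValueError("need at least five observations to form quintiles")
    ordered = sorted(rows, key=lambda row: row[sort_key])
    n = len(ordered)
    medians = []
    for q in range(5):
        group = ordered[q * n // 5:(q + 1) * n // 5]
        medians.append(statistics.median(row[report_key] for row in group))
    return medians


# Hypothetical industry-year observations with easy-to-check values.
rows = [{"hi_census": i, "hi_compustat": 10 * i} for i in range(10)]
print(quintile_medians(rows, "hi_census", "hi_compustat"))  # [5.0, 25.0, 45.0, 65.0, 85.0]
```

Swapping `sort_key` and `report_key` gives the panel B view, where quintiles are formed on the Compustat-based measure instead.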

Next, we determine the relation of industry markups with HI-Census and HI-Compustat. Industry markups represent average price-cost margins in an industry. Industrial organization theory predicts that in more-concentrated industries there is less intense competition, and price is consequently set further away from marginal cost. Thus, a positive relation is expected between industry concentration and price-cost margins. We follow Allayannis and Ihrig (2001) and calculate industry markups using aggregate industry-level data from Annual Survey of Manufactures publications. We also use their definition of industry markups, which is as follows:

$$\text{Industry Markup} = \frac{\text{Value of Sales} + \text{Inventories} - \text{Payroll} - \text{Cost of Materials}}{\text{Value of Sales} + \text{Inventories}}.$$
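As a quick check on the definition, the markup can be written as a one-line function. This is a hedged sketch: the argument names are hypothetical stand-ins for the aggregate industry-level items, and the figures in the example are invented for illustration.

```python
def industry_markup(value_of_sales: float, inventories: float,
                    payroll: float, cost_of_materials: float) -> float:
    """(Value of Sales + Inventories - Payroll - Cost of Materials)
    / (Value of Sales + Inventories), following the definition above."""
    denominator = value_of_sales + inventories
    if denominator <= 0:
        raise ValueError("value of sales plus inventories must be positive")
    return (denominator - payroll - cost_of_materials) / denominator


# Hypothetical four-digit SIC industry, figures in $ millions.
print(industry_markup(950.0, 50.0, 200.0, 500.0))  # 0.3
```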

We collect annual industry data at the four-digit SIC level from the 1993, 1994, 1995, and 1996 Annual Survey of Manufactures publications and calculate industry markups for the period from 1993 to 1996. For this period, we form quintiles based on HI-Census and HI-Compustat and calculate median industry markups for each of the quintiles.

The results in panel A of Table 3 show that industry markups are higher in industries with higher values of HI-Census. In industries in the highest quintile of HI-Census, industry markups are almost 25% larger than they are in industries in the lowest quintile of HI-Census. In contrast, panel B shows that industry markups are not systematically related to HI-Compustat. These results suggest that U.S. Census-based measures are better proxies for actual industry concentration than are Compustat-based measures.

Next, we compute for each quintile the median number of public and private firms per industry based on U.S. Census data (Nfirms-Census). Panel A reports that for HI-Census-based quintiles 1 and 5, the median Nfirms-Census values are 1385 and 88, respectively. Panel B shows that for HI-Compustat-based


Table 3
Firm characteristics of portfolios sorted by industry concentration measures

Columns: (1) HI-Census; (2) HI-Compustat; (3) Industry Markup; (4) Nfirms-Census; (5) Nfirms-Compustat; (6) Nfirms-Compustat as a percentage of Nfirms-Census; (7) Net Sales; (8) Market Capitalization; (9) Book Assets.

Quintile  (1)    (2)    (3)    (4)   (5)  (6)    (7)  (8)  (9)

Panel A: Quintiles based on HI-Census
1         0.009  0.659  0.299  1385  2    0.19%  217   94  152
2         0.025  0.680  0.314   518  3    0.56%  304  141  264
3         0.045  0.590  0.315   356  4    0.82%  298  181  253
4         0.079  0.744  0.321   164  2    1.46%  379  188  318
5         0.153  0.891  0.369    88  2    2.20%  911  651  691

Panel B: Quintiles based on HI-Compustat
1         0.041  0.311  0.319   535  9    1.79%  388  200  328
2         0.042  0.516  0.313   423  4    1.09%  274  138  206
3         0.044  0.706  0.305   413  3    0.78%  306  178  272
4         0.046  0.929  0.351   287  2    0.69%  470  225  387
5         0.059  1.000  0.328   211  1    0.55%  304  187  219

The sample period is 1980–2005. Median values are reported. HI-Census is the Herfindahl index for four-digit SIC industries as reported by the Census of Manufactures. The 1997 and 2002 U.S. Censuses define industry at the six-digit NAICS level. Over the 1995–2005 period, we use HI-Census values for six-digit NAICS industries to calculate HI-Census values for four-digit SIC industries by weighting the HI-Census values of component six-digit NAICS industries by the square of their share of the broader four-digit SIC industry. To determine the component six-digit NAICS industries of a broader four-digit SIC industry, we use NAICS correspondence tables provided by the U.S. Census. HI-Compustat is the sum of the squares of the sales market shares of all firms in a CRSP four-digit SIC industry that have sales data on Compustat. For each year t, HI-Compustat is averaged over a three-year period, from year t − 2 to year t. Industry Markup is calculated using data collected from the 1993, 1994, 1995, and 1996 Annual Survey of Manufactures publications as (Value of Sales + Inventories − Payroll − Cost of Materials) / (Value of Sales + Inventories). For the analysis of Industry Markup, we form HI-Census- and HI-Compustat-based quintiles over the 1993–1996 period. Nfirms-Census is the number of firms per four-digit SIC industry as reported by the Census of Manufactures. Over the 1995–2005 period, we use Nfirms-Census values for six-digit NAICS industries to calculate Nfirms-Census values for four-digit SIC industries by summing the Nfirms-Census values of component six-digit NAICS industries in a broader four-digit SIC industry. Nfirms-Compustat is the total number of CRSP firms in a four-digit SIC industry that are included on Compustat. Net Sales is net sales in millions in year t. Market Capitalization is the market value of equity in millions at the end of year t. Book Assets is the book value of total assets at the end of year t.
Net Sales, Market Capitalization,and Book Assets are inflation adjusted. Descriptive statistics are calculated by pooling firm-year observations.3849

at Univ of Southern California on April 17, 2014 http://rfs.oxfordjournals.org/ Downloaded from


quintiles 1 and 5, the median Nfirms-Census values are 535 and 211, respec-tively. Thus, the difference in Nfirms-Census between quintiles 1 and 5 is fargreater when the quintiles are based on HI-Census than when they are basedon HI-Compustat. Given that more-concentrated industries, which are presum-ably less competitive, should be populated with a smaller number of firms,the above results further suggest that HI-Census is a better indicator of trueindustry concentration than is HI-Compustat.
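A minimal sketch of the HI-Compustat definition in the Table 3 notes: for each year, square every covered firm's sales share within its industry and sum the squares, then average the annual indices over years t − 2 to t. The tuple layout of the input and the choice to average over whichever of the three years have data are assumptions for illustration.

```python
from collections import defaultdict


def herfindahl(sales):
    """Sum of squared sales market shares among the firms supplied."""
    total = sum(sales)
    return sum((s / total) ** 2 for s in sales)


def hi_compustat(firm_years, year):
    """Three-year average Herfindahl index per industry.

    firm_years: iterable of (year, industry, sales) tuples, one per
    firm-year (a hypothetical layout). Only years t-2..t with positive
    sales are used. Returns {industry: mean of the available annual indices}.
    """
    sales_by_cell = defaultdict(list)
    for y, industry, sales in firm_years:
        if year - 2 <= y <= year and sales > 0:
            sales_by_cell[(y, industry)].append(sales)

    annual = defaultdict(list)
    for (_, industry), sales in sales_by_cell.items():
        annual[industry].append(herfindahl(sales))
    return {ind: sum(h) / len(h) for ind, h in annual.items()}
```

Because only covered firms enter the shares, an industry with a single covered firm mechanically receives an annual index of 1, which is consistent with the median HI-Compustat of 1.000 in quintile 5 of panel B, where median Nfirms-Compustat is 1.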

Table 3 also reports for each of the quintiles the median value of Nfirms-Compustat, defined as the number of firms per industry that have data available on CRSP and Compustat. This number represents the number of firms per industry used to calculate Compustat-based industry concentration measures. In all cases, median Nfirms-Compustat is markedly lower than median Nfirms-Census. For example, for quintile 1 of panel A, median Nfirms-Census is 1385, but median Nfirms-Compustat is only 2. This finding suggests that an important reason why HI-Compustat could be a poor indicator of industry concentration is that this measure is based on partial data. Furthermore, median Nfirms-Compustat does not vary systematically across the HI-Census quintiles, suggesting that there is no relationship between actual industry concentration and Nfirms-Compustat. However, panel B shows that median Nfirms-Compustat decreases markedly from HI-Compustat-based quintile 1 to quintile 5 (from 9 to 1). This result indicates that Compustat-based industry concentration measures are more likely to proxy for the number of firms in an industry that are covered by CRSP and Compustat than for true industry concentration.

Additionally, Table 3 presents for each of the quintiles the median percentage of the firms in an industry reported by the U.S. Census that are included on CRSP and Compustat (Nfirms-Compustat as a percentage of Nfirms-Census). Panel A shows that this percentage increases substantially from HI-Census-based quintile 1 to quintile 5 (from 0.19% to 2.20%). This is consistent with the expectation that more-concentrated industries should have a greater percentage of large, public firms, the type that are likely to be included on both CRSP and Compustat. Panel B, in contrast, shows that the percentage of U.S. Census firms that are included on CRSP and Compustat does not increase from HI-Compustat-based quintile 1 to quintile 5. Instead, this percentage decreases over these quintiles (from 1.79% to 0.55%). This finding provides further support for the notion that Compustat-based industry concentration measures are poor indicators of actual industry concentration.

Next, we examine how firm size varies across quintiles sorted by HI-Census and by HI-Compustat. We consider three measures of size: net sales, market capitalization, and book assets. Each of these measures is inflation adjusted. Data for these variables are obtained from Compustat. Thus, the median values of the firm size variables reported in Table 3 are not the medians across all U.S. Census firms but the medians for the subset of firms covered by the CRSP and Compustat databases. Consequently, inferences based on these variables should be viewed with caution. Panel A shows that median net sales (market capitalization, book assets) for firms in HI-Census-based quintiles 1 and 5 are $217m ($94m, $152m) and $911m ($651m, $691m), respectively. These results indicate that, consistent with theoretical expectations, firms in less-concentrated industries, which are likely to be more competitive, are substantially smaller than firms in highly concentrated, less-competitive industries. Panel B shows that median net sales (market capitalization, book assets) for firms in HI-Compustat-based quintiles 1 and 5 are $388m ($200m, $328m) and $304m ($187m, $219m), respectively. These findings show that firm size is actually smaller in the highest HI-Compustat quintile than in the lowest quintile, further suggesting that Compustat-based industry concentration measures are poor proxies for actual industry concentration.

Table 4
Correlations between industry concentration measures and industry markup

| | HI-Census | HI-Compustat | Industry Markup |
|---|---|---|---|
| HI-Census | | 0.129 (0.000) | 0.220 (0.000) |
| HI-Compustat | 0.111 (0.000) | | 0.016 (0.586) |
| Industry Markup | 0.183 (0.000) | 0.019 (0.519) | |

The sample period is 1980-2005. This table presents Pearson (above the diagonal) and Spearman (below the diagonal) correlations among selected variables. Numbers in parentheses are significance levels. HI-Census is the Herfindahl index for four-digit SIC industries as reported by the Census of Manufactures. The 1997 and 2002 U.S. Censuses define industry at the six-digit NAICS level. Over the 1995-2005 period, we use HI-Census values for six-digit NAICS industries to calculate HI-Census values for four-digit SIC industries by weighting the HI-Census values of component six-digit NAICS industries by the square of their share of the broader four-digit SIC industry. To determine the component six-digit NAICS industries of a broader four-digit SIC industry, we use NAICS correspondence tables provided by the U.S. Census. HI-Compustat is the sum of the squares of the sales market shares of all firms in a CRSP four-digit SIC industry that have sales data on Compustat. For each year t, HI-Compustat is averaged over a three-year period from year t − 2 to year t. Industry Markup is calculated using data collected from the 1993, 1994, 1995, and 1996 Annual Survey of Manufactures publications as (Value of Sales + Inventories − Payroll − Cost of Materials) / (Value of Sales + Inventories). Correlations involving Industry Markup are calculated over the 1993-1996 period.
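Table 4 reports both Pearson and Spearman correlations. As a rough sketch of how the two differ: Spearman's coefficient is the Pearson correlation computed on ranks (tied values receive their average rank), so it measures monotonic rather than strictly linear association. The code below is a generic illustration, not the authors' implementation, and it does not compute the significance levels shown in parentheses.

```python
from math import sqrt


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def ranks(v):
    """1-based ranks; tied values receive the average of their ranks."""
    order = sorted(range(len(v)), key=lambda i: v[i])
    r = [0.0] * len(v)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with v[order[i]].
        while j + 1 < len(order) and v[order[j + 1]] == v[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            r[order[k]] = avg_rank
        i = j + 1
    return r


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))
```

For a perfectly monotonic but nonlinear pair such as x = [1, 2, 3, 4, 5] and y = x squared, `spearman` returns 1 while `pearson` is slightly below 1.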

Table 4 reports correlations between HI-Census, HI-Compustat, and Industry Markup. The table first confirms the result in Table 3 that the correlation between HI-Census and HI-Compustat is low. Specifically, the Spearman and Pearson correlations between the two variables are only 0.111 and 0.129, respectively.⁹ The table also confirms the results in Table 3 that the correlation between HI-Census and Industry Markup is positive and that HI-Compustat and Industry Markup are not systematically related. These results are further evidence that Compustat-based industry concentration measures are not good proxies for actual industry concentration.¹⁰

9 We find that the Spearman and Pearson correlations between HI-Census and HI-Compustat for the period from 1980 to 1994 are 0.126 and 0.135, respectively. These results rule out the possibility that the low correlation between HI-Census and HI-Compustat reported in Table 4 is due to the fact that, for the period from 1995 to 2005, we convert HI-Census values for six-digit NAICS industries into values for four-digit SIC industries.

Table 5
Industries sorted by U.S. Census-based Herfindahl index values

| Two-digit SIC code | Industry name | HI-Census | HI-Compustat |
|---|---|---|---|
| 24 | Lumber & Wood Products | 0.027 | 0.691 |
| 23 | Apparel | 0.029 | 0.731 |
| 34 | Fabricated Metal Products | 0.037 | 0.724 |
| 25 | Furniture and Fixtures | 0.039 | 0.697 |
| 39 | Miscellaneous Manufacturing | 0.048 | 0.753 |
| 27 | Printing and Publishing | 0.053 | 0.617 |
| 35 | Machinery and Computer Equipment | 0.054 | 0.648 |
| 30 | Rubber and Plastics | 0.054 | 0.596 |
| 29 | Petroleum Refining | 0.058 | 0.636 |
| 38 | Measuring Instruments | 0.059 | 0.590 |
| 26 | Paper | 0.060 | 0.700 |
| 33 | Primary Metal | 0.063 | 0.659 |
| 22 | Textile Product Mills | 0.063 | 0.787 |
| 36 | Electronic Equipment | 0.078 | 0.709 |
| 20 | Food and Kindred Products | 0.078 | 0.763 |
| 32 | Stone, Clay, Glass, and Concrete | 0.087 | 0.793 |
| 28 | Chemicals | 0.091 | 0.630 |
| 31 | Leather | 0.092 | 0.738 |
| 37 | Transportation Equipment | 0.096 | 0.669 |
| 21 | Tobacco | 0.153 | 0.837 |

This table reports mean values of HI-Census and HI-Compustat for four-digit SIC industries within a particular two-digit SIC industry. The industries are listed in ascending order of HI-Census. HI-Census is the Herfindahl index for four-digit SIC industries as reported by the Census of Manufactures. The 1997 and 2002 U.S. Censuses define industry at the six-digit NAICS level; consequently, for these two years we use HI-Census values for six-digit NAICS industries to calculate HI-Census values for four-digit SIC industries by weighting the HI-Census values of component six-digit NAICS industries by the square of their share of the broader four-digit SIC industry. To determine the component six-digit NAICS industries of a broader four-digit SIC industry, we use NAICS correspondence tables provided by the U.S. Census. HI-Compustat is the sum of the squares of the sales market shares of all firms in a CRSP four-digit SIC industry that have sales data on Compustat. For each year t, HI-Compustat is averaged over a three-year period from year t − 2 to year t. The observations used to calculate mean HI-Census or HI-Compustat for two-digit SIC industries are taken only from years in which a U.S. Census takes place (1982, 1987, 1992, 1997, and 2002).
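The conversion of six-digit NAICS values into four-digit SIC values mentioned in footnote 9 and in the table notes weights each component's Herfindahl index by the square of its share of the broader industry. When the component industries contain disjoint sets of firms, this is an exact identity: with component sales S_j, shares s_j = S_j / Σ_k S_k, and component indices H_j, the broader index is H = Σ_j s_j² H_j. The sketch below assumes shares are measured by sales; the function names are hypothetical.

```python
def aggregate_hi(components):
    """Combine component-industry Herfindahl indices into one broader industry.

    components: list of (sales, hi) pairs, one per component industry
    (for example, the six-digit NAICS industries that map into one
    four-digit SIC industry). Each component's HI is weighted by the
    square of its sales share of the broader industry.
    """
    total = sum(sales for sales, _ in components)
    return sum((sales / total) ** 2 * hi for sales, hi in components)


def aggregate_nfirms(counts):
    """Nfirms for the broader industry: the sum of the component counts."""
    return sum(counts)


# Check the identity against a direct calculation on made-up firm sales.
a, b = [60.0, 40.0], [100.0]
hi_a = sum((s / sum(a)) ** 2 for s in a)
hi_b = sum((s / sum(b)) ** 2 for s in b)
combined = aggregate_hi([(sum(a), hi_a), (sum(b), hi_b)])
direct = sum((s / 200.0) ** 2 for s in a + b)
```

If a firm operates in more than one component industry, its sales are split across components and the weighted sum will generally understate the broader industry's index.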

In Table 5, we report average values of HI-Census and HI-Compustat for four-digit SIC industries within particular two-digit SIC industry groups. These data show the typical Herfindahl index value of a four-digit SIC industry within a broader two-digit SIC industry. Reporting the findings this way also makes the data easier to comprehend, given the large number of four-digit SIC industries within the manufacturing sector. The industries are listed in ascending order of HI-Census. There is a large difference between HI-Census and HI-Compustat for every industry. Further, our results show that although HI-Census increases markedly as one moves from less- to more-concentrated industries based on this measure, there is no systematic variation in HI-Compustat across these industries, reflecting the low correlation between the two measures.

10 We also examine the correlation between industry markups and the four-firm ratios. We find that the Pearson and Spearman correlations between Industry Markup and FFR-Census are positive and significant and equal 0.198 and 0.179, respectively. However, the Spearman and Pearson correlations between Industry Markup and FFR-Compustat are negative and significant and equal −0.082 and −0.123, respectively.
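The averaging behind Table 5 can be sketched as a group-by on the first two digits of the four-digit SIC code, keeping only the census years listed in the table notes (1982, 1987, 1992, 1997, and 2002) and sorting by mean HI-Census. The row layout is a hypothetical assumption.

```python
from collections import defaultdict

CENSUS_YEARS = {1982, 1987, 1992, 1997, 2002}


def two_digit_means(rows):
    """Mean HI-Census and HI-Compustat by two-digit SIC group.

    rows: iterable of (year, four_digit_sic, hi_census, hi_compustat)
    tuples (a hypothetical layout). Non-census years are skipped.
    Returns a list of (two_digit_sic, mean_hi_census, mean_hi_compustat)
    sorted in ascending order of mean HI-Census, as in Table 5.
    """
    groups = defaultdict(list)
    for year, sic4, hi_census, hi_compustat in rows:
        if year in CENSUS_YEARS:
            groups[str(sic4)[:2]].append((hi_census, hi_compustat))

    table = []
    for sic2, obs in groups.items():
        n = len(obs)
        table.append((sic2,
                      sum(c for c, _ in obs) / n,
                      sum(p for _, p in obs) / n))
    return sorted(table, key=lambda row: row[1])
```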

3. Reexamination of results obtained in prior studies that use Compustat-based industry concentration measures

Results that we have presented so far suggest that U.S. Census-based industry concentration measures are superior to Compustat-based industry concentration measures in measuring actual industry concentration and that there is a very low correlation between the two measures. It is therefore important to examine whether the results of prior empirical studies that use Compustat-based industry concentration measures are robust to the use of U.S. Census-based industry concentration measures. We reexamine the results in Hou and Robinson (2006), Lang and Stulz (1992), Harris (1998), and DeFond and Park (1999). We reexamine these four papers for two reasons. First, we are able to collect the necessary data to replicate these studies. Second, the issues addressed in these studies are from four different finance-related areas: asset pricing, capital structure, corporate disclosure policy, and corporate governance.

3.1 Hou and Robinson (2006)

Hou and Robinson (2006) find that firms in more-concentrated industries earn lower future stock returns. They argue that barriers to entry in highly concentrated industries insulate firms from undiversifiable distress risk, which is priced in equity returns. We examine the sensitivity of their results to using U.S. Census-based industry concentration measures in place of the Compustat-based measures.

Following Hou and Robinson (2006), we estimate firm-level Fama-MacBeth cross-sectional regressions of the model in panel B of their Table 4. Specifically, we regress future monthly stock returns from July of year t + 1 to June of year t + 2 on HI-Compustat and other firm characteristics. As Hou and Robinson (2006) do, we calculate HI-Compustat at the three-digit SIC level and measure it as the mean value of HI-Compustat over the three prior years, t − 2, t − 1, and t. Hou and Robinson (2006) use the sample period from 1963 to 2001. Because data on HI-Census from the Census of Manufactures are available starting in 1982 and, as shown in Table 1, the earliest year to which we can apply these data is 1980, we study the period from 1980 to 2001. Also, data on HI-Census are available only for manufacturing firms. To allow better comparisons, we examine only manufacturing firms when estimating results with the HI-Compustat measure.
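The Fama-MacBeth procedure runs a separate cross-sectional regression in each month and then treats the monthly coefficients as a time series: the reported coefficient is their average, and the t-statistic is that average divided by its standard error (the standard deviation of the monthly estimates over the square root of the number of months). The sketch below uses a single regressor to stay short, whereas the regressions in Table 6 include several controls; the data layout and function names are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def ols_intercept_slope(x, y):
    """Intercept and slope of a one-regressor OLS fit of y on x."""
    mx, my = mean(x), mean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx
    return my - slope * mx, slope


def fama_macbeth(cross_sections):
    """cross_sections: list of (x, y) pairs of lists, one pair per month,
    e.g., x = industry concentration, y = next-period stock returns.

    Returns (average intercept, average slope, t-statistic of the slope),
    where the t-statistic uses the time-series standard error of the
    monthly slope estimates.
    """
    coefs = [ols_intercept_slope(x, y) for x, y in cross_sections]
    slopes = [slope for _, slope in coefs]
    avg_slope = mean(slopes)
    t_stat = avg_slope / (stdev(slopes) / sqrt(len(slopes)))
    return mean(a for a, _ in coefs), avg_slope, t_stat
```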

The regression results in the first column of Table 6 document a significant negative relationship between HI-Compustat and future stock returns. This result is similar to that reported in Hou and Robinson (2006) and is the basis of their conclusion that firms in more-concentrated industries earn lower returns because these industries are less risky. The second column of the table presents the regression results of the same model as in the first column, except that the three-digit SIC level HI-Compustat is replaced with the three-digit SIC level HI-Census. We find that future stock returns are not associated with HI-Census. Thus, this result does not support the Hou and Robinson (2006) conclusion that industry concentration is related to future stock returns.

Table 6
The relationship between stock returns and industry concentration

| | 1 | 2 | 3 | 4 |
|---|---|---|---|---|
| Intercept | 1.408 (3.70) | 1.518 (4.45) | 2.170 (6.67) | 1.840 (6.92) |
| HI-Compustat | −0.342 (−1.93) | | | |
| HI-Census | | 0.115 (0.24) | | |
| FFR-Compustat | | | −0.342 (−2.00) | |
| FFR-Census | | | | 0.054 (0.47) |
| Ln(Market Capitalization) | 0.014 (0.24) | 0.016 (0.29) | 0.113 (2.65) | 0.111 (2.61) |
| Ln(B/M) | 0.258 (2.57) | 0.299 (3.22) | 0.220 (2.99) | 0.218 (2.95) |
| Momentum | 0.660 (3.26) | 0.644 (3.59) | 0.573 (3.28) | 0.570 (3.27) |
| Beta | 0.028 (0.16) | 0.098 (0.61) | 0.128 (0.97) | 0.124 (0.94) |
| Leverage | 0.025 (0.09) | 0.059 (0.23) | 0.072 (0.35) | 0.053 (0.26) |
| Average adjusted R-square | 0.032 | 0.032 | 0.049 | 0.049 |
| Average number of observations | 991 | 991 | 936 | 936 |

This table presents results from Fama-MacBeth cross-sectional regressions explaining monthly stock returns, estimated monthly between July 1980 and June 2001 for models 1 and 2 and between July 1963 and June 2001 for models 3 and 4. Monthly returns are the one-year-ahead twelve monthly returns from July of year t + 1 to June of year t + 2. Industry concentration ratios and accounting data are measured at the end of year t. HI-Census in this table represents the Herfindahl index for three-digit SIC industries calculated using data collected from Census of Manufactures publications. Prior to 1997, the U.S. Census defines industry using four-digit SIC codes, while from 1997 onward industry is defined using six-digit NAICS codes. Over the 1980-1994 period, we use HI-Census values for four-digit SIC industries to calculate HI-Census values for three-digit SIC industries by weighting the HI-Census values of component four-digit SIC industries by the square of their share of the broader three-digit SIC industry. Over the 1995-2001 period, we use HI-Census values for six-digit NAICS industries to calculate HI-Census values for three-digit SIC industries by weighting the HI-Census values of component six-digit NAICS industries by the square of their share of the broader three-digit SIC industry. To determine the component six-digit NAICS industries of a broader three-digit SIC industry, we use NAICS correspondence tables provided by the U.S. Census. HI-Compustat is the sum of the squares of the sales market shares of all firms in a CRSP three-digit SIC industry that have sales data on Compustat. For each year t, HI-Compustat is averaged over a three-year period from year t − 2 to year t. FFR-Census in this table represents the four-firm ratio for four-digit SIC industries calculated using data collected from Census of Manufactures publications. Over the 1995-2001 period, FFR-Census values for six-digit NAICS industries are used to approximate FFR-Census for broader four-digit SIC industries by first determining which component six-digit NAICS industry of a broader four-digit SIC industry has the largest sales as measured by the sales of its top four firms. Next, we divide the sales of the top four firms of this six-digit NAICS industry by the total sales of the firms in all the component six-digit NAICS industries within the broader four-digit SIC industry. FFR-Compustat is the sum of the market shares of the four largest firms in terms of market share in a CRSP four-digit SIC industry. A firm's market share is measured as sales divided by total sales of all CRSP firms in that industry that have sales data on Compustat. For each year t, FFR-Compustat is averaged over a three-year period from year t − 2 to year t. Ln(Market Capitalization) is defined as the natural logarithm of the market value of equity in millions at the end of year t. Ln(B/M) is defined as the natural logarithm of book value of equity divided by market value of equity at the end of year t. Momentum for each month is the prior one-year stock return. Beta is the market model beta estimated using the prior thirty-six monthly equally weighted CRSP index returns. Leverage is defined as total debt divided by market value of total assets (i.e., market value of equity plus debt) at the end of year t and is trimmed at the 1% level. Time-series average values of the monthly regression coefficients are reported with time-series t-statistics in parentheses. ***, **, and * represent significance at the 1%, 5%, and 10% levels, respectively, for a two-tailed test.
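The Table 6 notes describe how FFR-Census for a four-digit SIC industry is approximated from its six-digit NAICS components after 1995. A minimal sketch follows, assuming each component is summarized by its reported four-firm ratio and its total sales, so that top-four sales are recovered as ratio times sales; that input form and the function name are assumptions.

```python
def approx_ffr_census(components):
    """Approximate four-firm ratio for a broader industry.

    components: list of (four_firm_ratio, total_sales) pairs, one per
    six-digit NAICS component of a four-digit SIC industry (a
    hypothetical layout). Picks the component whose top four firms have
    the largest sales and divides that amount by total sales across all
    components.
    """
    top4_sales = [ratio * sales for ratio, sales in components]
    total_sales = sum(sales for _, sales in components)
    return max(top4_sales) / total_sales
```

Because the four largest firms of the combined industry must have at least as much in sales as the top four firms of any single component, this approximation can only understate the broader industry's four-firm ratio.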


The third and fourth columns of Table 6 provide results of regression models that are similar to the models in the first two columns of this table, except that the four-firm ratio, rather than the Herfindahl index, is used as the measure of industry concentration. This analysis provides evidence on the sensitivity of our conclusions to the use of an alternative definition of industry concentration. Another benefit of this analysis is that we have data on FFR-Census for the 1963-2001 period, which is the sample period used in Hou and Robinson (2006). We use the four-firm ratio at the four-digit SIC level because FFR-Census is reported at the four-digit SIC level and converting this measure to the three-digit SIC level involves approximation.¹¹ The results show that FFR-Compustat is significantly negatively associated with future stock returns. In contrast, there is no association between FFR-Census and future stock returns, further suggesting that the Hou and Robinson (2006) conclusion that industry concentration is related to future stock returns may not be valid.¹²,¹³
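A minimal sketch of FFR-Compustat as defined in the Table 6 notes: the combined sales share of the four largest covered firms, averaged over years t − 2 to t. The dictionary input and the averaging over whichever of the three years have data are assumptions for illustration.

```python
def four_firm_ratio(sales):
    """Sum of the four largest sales market shares among the firms supplied."""
    total = sum(sales)
    return sum(sorted(sales, reverse=True)[:4]) / total


def ffr_compustat(sales_by_year, year):
    """Three-year average four-firm ratio for one industry.

    sales_by_year: {year: [firm sales]} for one industry (a hypothetical
    layout). Years t-2..t that have data are averaged.
    """
    annual = [four_firm_ratio(sales_by_year[y])
              for y in (year - 2, year - 1, year)
              if sales_by_year.get(y)]
    return sum(annual) / len(annual)
```

With four or fewer covered firms, the ratio is mechanically 1, which matters here because Table 3 reports median Nfirms-Compustat values between 1 and 9.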

Hou and Robinson (2006) also document that firms in less-concentrated industries have higher research and development expenses. They posit that this result is consistent with the Schumpeter (1912) proposition that innovation as a form of creative destruction is more likely to occur in competitive industries. They also argue that if innovation risk is priced into stock returns and this risk is higher in more-competitive industries, it contributes to the higher cost of equity capital in such industries.

We examine whether the relationship between industry concentration and research and development expenses is sensitive to using Compustat or U.S. Census data to measure industry concentration. The model we estimate is from panel B of Table 2 of Hou and Robinson (2006), which relates industry concentration to certain industry characteristics. The dependent variables for the models in columns 1 and 2 of Table 7 are HI-Compustat and HI-Census, respectively, and the sample period is 1980-2001.
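The regressors in Table 7 are industry averages of firm-level characteristics, several of which are trimmed at the 1% level. A minimal sketch of that step follows. It assumes that trimming means dropping observations outside the 1st to 99th percentile range of the pooled cross-section for a given year; this interpretation, the input layout, and the function names are assumptions rather than details stated in the source.

```python
from collections import defaultdict
from statistics import mean, quantiles


def trim_bounds(values):
    """1st and 99th percentile cut points of the pooled values."""
    cuts = quantiles(values, n=100)  # 99 cut points
    return cuts[0], cuts[-1]


def industry_averages(firms):
    """firms: list of (industry, value) pairs for one year and one
    characteristic (a hypothetical layout).

    Drops observations outside the pooled 1st-99th percentile range,
    then averages the remaining values within each industry.
    """
    lo, hi = trim_bounds([v for _, v in firms])
    groups = defaultdict(list)
    for industry, value in firms:
        if lo <= value <= hi:
            groups[industry].append(value)
    return {ind: mean(vals) for ind, vals in groups.items()}
```

A one-year cross-section of these industry averages would then serve as the regressors in one Fama-MacBeth cross-sectional regression of industry concentration.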

11 Our results remain qualitatively the same when we use FFR-Compustat for three-digit SIC industries and approximate FFR-Census for three-digit SIC industries using a methodology similar to that discussed in Section 1.3 of the article. Further, if HI-Compustat and HI-Census are defined at the four-digit SIC level instead of the three-digit SIC level used in the first two columns of Table 6, the results remain qualitatively the same. We also examine whether our conclusions are sensitive to defining industry at the six-digit NAICS level. Since the U.S. Census measures for six-digit NAICS industries are available for 1997 and 2002 and we can apply these data to the period from 1995 to 2001, we use this sample period in our analysis. We continue to find that future stock returns are significantly negatively associated with HI-Compustat and FFR-Compustat but are not significantly associated with HI-Census and FFR-Census.

12 We also examine the sensitivity of the results to applying a given year's U.S. Census industry concentration data to the surrounding one or two years. We estimate the models in columns 2 and 4 of Table 6 only for the years to which the U.S. Census data belong and find that the coefficients on HI-Census and FFR-Census remain insignificant.

13 In footnote 6 of their paper, Hou and Robinson (2006) report that, for a smaller sample of observations, their univariate analysis results suggest that U.S. Census-based industry concentration measures are negatively related to future stock returns. However, as we have shown in this article, our finding that these measures are not significantly related to future stock returns is quite robust.


Table 7
The relationship between Compustat- and U.S. Census-based industry concentration measures and industry average research and development expenses and industry averages of other firm characteristics

| | 1 | 2 | 3 | 4 |
|---|---|---|---|---|
| Dependent variable | HI-Compustat | HI-Census | FFR-Compustat | FFR-Census |
| Intercept | 0.782 (12.03) | 0.024 (4.23) | 0.980 (48.54) | 0.087 (6.25) |
| R&D expense/book assets | −1.574 (−4.17) | 0.269 (4.98) | −0.232 (−6.28) | 0.734 (8.16) |
| Ln(Market Capitalization) | −0.021 (−7.03) | 0.016 (15.37) | −0.004 (−10.36) | 0.060 (27.13) |
| Earnings/book assets | 0.607 (2.13) | 0.081 (2.69) | 0.065 (2.99) | 0.124 (1.44) |
| Dividends/book equity | 0.544 (1.80) | 0.069 (1.09) | 0.170 (3.80) | 0.355 (3.20) |
| Ln(B/M) | 0.037 (2.23) | 0.010 (2.82) | 0.004 (1.61) | 0.036 (6.62) |
| Beta | 0.072 (3.16) | 0.011 (3.04) | 0.010 (6.79) | 0.013 (1.99) |
| Leverage | 0.089 (1.75) | 0.013 (1.16) | 0.019 (3.10) | 0.073 (4.27) |
| Average adjusted R-square | 0.057 | 0.134 | 0.005 | 0.137 |
| Average number of observations | 118 | 118 | 269 | 269 |

Models 1 and 2 present results from Fama-MacBeth cross-sectional regressions estimated using annual data from 1980 to 2001. Models 3 and 4 present results from Fama-MacBeth cross-sectional regressions estimated using annual data from 1963 to 2001. Industry concentration ratios and firm characteristics are measured at the end of year t. HI-Census in this table represents the Herfindahl index for three-digit SIC industries calculated using data collected from Census of Manufactures publications. Prior to 1997, the U.S. Census defines industry using four-digit SIC codes, while from 1997 onward industry is defined using six-digit NAICS codes. Over the 1980-1994 period, we use HI-Census values for four-digit SIC industries to calculate HI-Census values for three-digit SIC industries by weighting the HI-Census values of component four-digit SIC industries by the square of their share of the broader three-digit SIC industry. Over the 1995-2001 period, we use HI-Census values for six-digit NAICS industries to calculate HI-Census values for three-digit SIC industries by weighting the HI-Census values of component six-digit NAICS industries by the square of their share of the broader three-digit SIC industry. To determine the component six-digit NAICS industries of a broader three-digit SIC industry, we use NAICS correspondence tables provided by the U.S. Census. HI-Compustat is the sum of the squares of the sales market shares of all firms in a CRSP three-digit SIC industry that have sales data on Compustat. For each year t, HI-Compustat is averaged over a three-year period from year t − 2 to year t. FFR-Census in this table represents the four-firm ratio for four-digit SIC industries calculated using data collected from Census of Manufactures publications. Over the 1995-2001 period, FFR-Census values for six-digit NAICS industries are used to approximate FFR-Census for broader four-digit SIC industries by first determining which component six-digit NAICS industry of a broader four-digit SIC industry has the largest sales as measured by the sales of its top four firms. Next, we divide the sales of the top four firms of this six-digit NAICS industry by the total sales of the firms in all the component six-digit NAICS industries within the broader four-digit SIC industry. FFR-Compustat is the sum of the market shares of the four largest firms in terms of market share in a CRSP four-digit SIC industry. A firm's market share is measured as sales divided by total sales of all CRSP firms in that industry that have sales data on Compustat. For each year t, FFR-Compustat is averaged over a three-year period from year t − 2 to year t. All independent variables are industry averages of firm-level characteristics. Ln(Market Capitalization) is defined as the natural logarithm of the market value of equity in millions at the end of year t. Ln(B/M) is defined as the natural logarithm of book value of equity divided by market value of equity at the end of year t. Beta is the market model beta estimated using the prior thirty-six monthly equally weighted CRSP index returns. Leverage is defined as total debt divided by market value of total assets (i.e., market value of equity plus debt) at the end of year t. R&D expense/book assets, Earnings/book assets, and Dividends/book equity are as of year t and are trimmed at the 1% level. Leverage is also trimmed at the 1% level. Time-series average values of the monthly regression coefficients are reported with time-series t-statistics in parentheses. ***, **, and * represent significance at the 1%, 5%, and 10% levels, respectively, for a two-tailed test.

The first column of Table 7 shows the replication of the Hou and Robinson (2006) finding that HI-Compustat is significantly negatively associated with research and development expenses. The second column presents estimates of the model that replaces the dependent variable HI-Compustat with HI-Census. We find a significant positive association between HI-Census and research and development expenses. If U.S. Census-based industry concentration measures are more appropriate measures of actual industry concentration, then this result indicates that innovation risk is actually higher in more-concentrated industries. This finding is consistent with the claim made in Schumpeter (1942) that there is more innovation in less-competitive industries because firms in such industries can enjoy the economic profits resulting from their innovation instead of having these profits competed away. However, this finding does not support the Hou and Robinson (2006) claim that higher innovation risk in less-concentrated industries raises the overall cost of capital in these industries.

There is another notable difference between the results in columns 1 and 2 of Table 7. Similar to Hou and Robinson (2006), we find a significant negative association between HI-Compustat and firm size. However, HI-Census is significantly positively associated with firm size. Given that more-concentrated industries are expected to have larger firms on average, these results provide further support for the arguments made in Section 2 of the article that U.S. Census-based measures of industry concentration are more meaningful than Compustat-based measures.

Columns 3 and 4 of Table 7 present results of the models that use FFR-Compustat and FFR-Census, respectively, as the dependent variables. These results are for the longer sample period from 1963 to 2001, the sample period used in Hou and Robinson (2006). The relations of FFR-Compustat and FFR-Census with research and development expenses and firm size are qualitatively the same as those reported in columns 1 and 2 of Table 7.

3.2 Lang and Stulz (1992)

Lang and Stulz (1992) examine the intra-industry effects of bankruptcy announcements. They show that in industries with high leverage and low concentration, there is a significant and economically important negative price reaction to a bankruptcy announcement in the industry. They argue that this price reaction reflects the loss experienced by other firms in the industry, because the announcement conveys information about lower future cash flows for these firms. They refer to this effect as a contagion effect. They also find that industries with low leverage and high concentration exhibit significantly positive price reactions to a bankruptcy announcement in the industry. They contend that this price reaction reflects the benefits to competitors that result from the difficulties faced by the bankrupt firm, and they refer to it as a competitive effect.
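The price reactions in this replication are measured, per the Table 8 notes, as cumulative market-model prediction errors of value-weighted industry portfolios from day −5 to day +5 around the announcement. A minimal sketch of that calculation follows: alpha and beta are estimated by OLS over a pre-event estimation window, and the prediction errors are summed over the eleven event days. The length of the estimation window and the choice of market index are not specified in this section, so both are assumptions, as are the function names.

```python
from statistics import mean


def market_model(est_portfolio, est_market):
    """OLS alpha and beta from an estimation window of daily returns."""
    mm, mp = mean(est_market), mean(est_portfolio)
    cov = sum((m - mm) * (p - mp) for m, p in zip(est_market, est_portfolio))
    var = sum((m - mm) ** 2 for m in est_market)
    beta = cov / var
    return mp - beta * mm, beta


def car(event_portfolio, event_market, alpha, beta):
    """Cumulative abnormal return: sum of market-model prediction errors
    over the event window (eleven days for days -5 to +5)."""
    return sum(p - (alpha + beta * m)
               for p, m in zip(event_portfolio, event_market))
```

In this setup, the portfolio return series would be the value-weighted return of the bankrupt firm's four-digit SIC industry peers, and subsample averages of the resulting CARs correspond to the entries in Table 8.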

We investigate whether the Lang and Stulz (1992) results, reported in their Tables 3 and 4, are robust to the use of U.S. Census-based industry concentration measures. They examine fifty-nine bankruptcies from 1970 to 1989. Our sample of bankrupt firms comes from the current Altman-NYU Salomon Center Bankruptcy list, which is an updated version of the data used by Lang and Stulz (1992). We examine eighty-six bankruptcies that took place in the manufacturing sector from 1980 to 2004. As in Lang and Stulz (1992), we

3857



Dow

nloaded from


Table 8Abnormal returns for subsamples of industry portfolios for the eleven days around a bankruptcyannouncement

Industry portfoliocharacteristics

No. of industry portfolioswith industrycharacteristics below/abovethe sample median

Average abnormal returns for the subsampleof industry portfolios with the value of theindustry portfolio characteristics below/abovethe sample median

Below Above

Panel A: Compustat-based industry concentration measures

Leverage 43/43 0.140 2.319(0.101) (2.587)

[2.278]HI-Compustat 43/43 2.891 0.432

(1.920) (0.646)[1.869]

HI-Compustat (subsample ofindustry portfolios withbelow-median leverage)

19/24 2.700 1.887(0.931) (2.038)

[2.736]HI-Compustat (subsample ofindustry portfolios withabove-median leverage)

24/19 3.042 1.406(2.149) (1.466)

[1.457]Panel B: U.S. Census-based industry concentration measures

HI-Census 43/43 1.328 1.131(1.385) (0.844)

[0.154]HI-Census (subsample of

industry portfolios withbelow-median leverage)

18/25 0.670 0.242(0.428) (0.116)

[0.513]HI-Census (subsample of

industry portfolios withabove-median leverage)

25/18 1.802 3.038(1.498) (2.267)

[0.403]

Cumulative abnormal returns (market model errors) are calculated from day 5 to +5 relative to bankruptcyannouncements. Bankruptcy announcements are from the Altman-NYU Salomon Center Bankruptcy list andinclude bankruptcies between January 1980 and December 2004 with liabilities greater than $100 million forfirms in the manufacturing sector (SIC codes between 2000 and 3999) with a primary SIC code that is availablefrom Compustat (eighty-six bankruptcies). An industry portfolio is a value-weighted portfolio of firms withthe same primary four-digit SIC code as the bankrupt firm for which announcement returns are available fromthe CRSP files. Except for HI-Census, industry characteristics are obtained from Compustat for the fiscal yearpreceding the announcement. HI-Census is the Herfindahl index for four-digit SIC industries as reported by theCensus of Manufactures. The 1997 and 2002 U.S. Censuses define industry at the six-digit NAICS level. Overthe 19952004 period, we use HI-Census values for six-digit NAICS industries to calculate HI-Census valuesfor four-digit SIC industries by weighting the HI-Census values of component six-digit NAICS industries bythe square of their share of the broader four-digit SIC industry. To determine what are the component six-digitNAICS industries of a broader four-digit SIC industry, we use NAICS correspondence tables provided by theU.S. Census. HI-Compustat is defined as the sum of the squares of the sales market shares of all firms in aCompustat four-digit SIC industry. Leverage is the debt-to-total assets ratio. The numbers in parentheses arez-statistics and the numbers in square brackets are z-statistics for differences between subsamples. , , and represent significance at the 1%, 5%, and 10% levels, respectively, for a two-tailed test.

define industry using four-digit SIC codes and calculate announcement returnsfrom days 5 to +5 relative to bankruptcy announcements.
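The announcement returns in Table 8 are cumulative market-model residuals over the eleven days from −5 to +5. The sketch below shows that calculation for one industry portfolio. It assumes the caller supplies estimation-window returns for fitting the market model; the estimation window itself is not described in this section, so its length and placement are assumptions.

```python
def market_model_car(est_port, est_mkt, event_port, event_mkt):
    """Cumulative abnormal return (CAR) from a market model (illustrative sketch).

    est_port, est_mkt: portfolio and market returns over an estimation window
        (the window choice is an assumption, not taken from the paper).
    event_port, event_mkt: returns for event days -5..+5 (11 observations).
    """
    n = len(est_port)
    mean_p = sum(est_port) / n
    mean_m = sum(est_mkt) / n
    # OLS slope (beta) and intercept (alpha) of portfolio returns on market returns
    cov = sum((p - mean_p) * (m - mean_m) for p, m in zip(est_port, est_mkt))
    var = sum((m - mean_m) ** 2 for m in est_mkt)
    beta = cov / var
    alpha = mean_p - beta * mean_m
    # Abnormal return = actual return minus the market-model prediction; sum over the window
    return sum(p - (alpha + beta * m) for p, m in zip(event_port, event_mkt))
```

For example, a portfolio that tracks the market exactly during estimation but falls 5% on day 0 produces a CAR close to −0.05, less the intercept accumulated over the window.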

Panel A of Table 8 reports our replication of the Lang and Stulz (1992) univariate results. First, the announcement returns for portfolios of industry peers are significantly more negative in industries with leverage above the sample median. Second, the announcement returns are significantly more positive for industries with HI-Compustat values above the sample median. Third, for industries with high HI-Compustat and low leverage, the announcement returns are significantly positive (1.887%). Finally, for industries with high leverage and low HI-Compustat values, the announcement returns are significantly negative (−3.042%). These results are similar to those reported in Lang and Stulz (1992). Panel B presents the results with HI-Compustat replaced with HI-Census. None of the significant relationships involving industry concentration that are observed in panel A are significant in panel B. Thus, results based on the U.S. Census-based industry concentration measures are not consistent with those reported in Lang and Stulz (1992) and do not support their conclusions.
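The univariate comparisons above split industry portfolios at the sample median of a characteristic and compare average announcement returns across the two halves. A minimal sketch of such a split follows. The test statistics it computes (mean divided by its cross-sectional standard error, and a difference of means scaled by the combined standard errors) are an assumption for illustration; this section does not specify how the z-statistics in Table 8 are computed.

```python
import math
import statistics


def median_split_test(characteristic, car):
    """Compare mean announcement returns below vs. above the sample median.

    Returns group sizes, group means, a t-type statistic for each mean, and a
    statistic for the difference (above minus below). The statistics are an
    illustrative assumption, not necessarily the test used in the paper.
    """
    med = statistics.median(characteristic)
    below = [r for c, r in zip(characteristic, car) if c <= med]
    above = [r for c, r in zip(characteristic, car) if c > med]

    def mean_and_se(xs):
        # Cross-sectional mean and its standard error
        return statistics.mean(xs), statistics.stdev(xs) / math.sqrt(len(xs))

    m_b, se_b = mean_and_se(below)
    m_a, se_a = mean_and_se(above)
    return {
        "n": (len(below), len(above)),
        "mean": (m_b, m_a),
        "z": (m_b / se_b, m_a / se_a),
        "z_diff": (m_a - m_b) / math.sqrt(se_a ** 2 + se_b ** 2),
    }
```

Applied to HI-Compustat and the industry-portfolio CARs, the "n" entry corresponds to the below/above counts reported in Table 8 (43/43 for the full sample).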

Lang and Stulz (1992) use multivariate regression models to control for other factors that might be related to the announcement returns. We reexamine their regression results and report our findings in Table 9. The first four columns in this table present the results from estimating the four multivariate models of Table 4 of Lang and Stulz (1992). In the first three models, the explanatory variables of interest are the three dummy variables representing high debt/high HI-Compustat, low debt/high HI-Compustat, and low debt/low HI-Compustat. Consistent with the Lang and Stulz (1992) results, in all three models the coefficients on the dummy variable representing low debt/high HI-Compustat are positive and significant. They conclude that this result reflects the competitive effect. The coefficients on the other variables in these models are also similar to those in Lang and Stulz (1992). In column 4, the dummy variables are replaced by HI-Compustat and leverage. Once again, consistent with the Lang and Stulz (1992) results, the coefficient on HI-Compustat is positive and significant, and the coefficient on leverage is insignificant. The coefficients on the remaining variables are also consistent with Lang and Stulz (1992).

Columns 5–8 of Table 9 present results from estimating the announcement return models after replacing HI-Compustat with HI-Census. In columns 5–7, the coefficient on the dummy variable representing low debt/high HI-Census is not significant. In the last column, the coefficient on HI-Census is also not significant. In sum, the Lang and Stulz (1992) results and conclusions are not robust to using U.S. Census-based industry concentration measures instead of Compustat-based measures.
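The dummy-variable models in Table 9 assign each industry portfolio to one of four leverage/concentration cells using the median splits of Table 8 and include dummies for three of them, so high debt/low concentration (the contagion cell) is the omitted category. The sketch below builds those dummies and runs a weighted least squares regression with NumPy. The regression weights are taken as an input because the weighting variable is not described in this section; any weights passed in are an assumption, and the sketch omits the other controls.

```python
import numpy as np


def cell_dummies(leverage, hi):
    """Three cell dummies from median splits; omitted cell is high debt/low HI."""
    lev_high = leverage > np.median(leverage)
    hi_high = hi > np.median(hi)
    return np.column_stack([
        lev_high & hi_high,    # 1 if high debt/high HI, 0 otherwise
        ~lev_high & hi_high,   # 1 if low debt/high HI, 0 otherwise
        ~lev_high & ~hi_high,  # 1 if low debt/low HI, 0 otherwise
    ]).astype(float)


def wls(y, X, w):
    """Weighted least squares with an intercept; returns (coefficients, t-statistics)."""
    X = np.column_stack([np.ones(len(y)), X])
    sw = np.sqrt(w)
    Xw, yw = X * sw[:, None], y * sw          # rescale rows by sqrt(weight)
    beta, *_ = np.linalg.lstsq(Xw, yw, rcond=None)
    resid = yw - Xw @ beta
    n, k = Xw.shape
    sigma2 = resid @ resid / (n - k)           # weighted residual variance
    cov = sigma2 * np.linalg.inv(Xw.T @ Xw)
    return beta, beta / np.sqrt(np.diag(cov))
```

In this layout, the coefficient at index 2 corresponds to the low debt/high HI dummy that Lang and Stulz (1992) interpret as the competitive effect.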

The Review of Financial Studies / v 22 n 10 2009

Table 9
Weighted least squares regressions of industry portfolio market model cumulative residuals at bankruptcy announcements on industry characteristics

Models: 1, 2, 3, 4, 5, 6, 7, 8
Intercept: 0.034 (2.19), 0.157 (4.79), 0.156 (4.70), 0.100 (2.68), 0.023 (1.43), 0.137 (4.29), 0.139 (4.30), 0.120 (3.14)
1 if high debt/high HI-Compustat, 0 otherwise: 0.016 (0.68), 0.028 (1.34), 0.029 (1.37)
1 if low debt/high HI-Compustat, 0 otherwise: 0.053 (2.41), 0.061 (3.01), 0.060 (2.95)
1 if low debt/low HI-Compustat, 0 otherwise: 0.006 (0.24), 0.017 (0.79), 0.016 (0.77)
1 if high debt/high HI-Census, 0 otherwise: 0.009 (0.37), 0.019 (0.85), 0.020 (0.89)
1 if low debt/high HI-Census, 0 otherwise: 0.024 (1.08), 0.025 (1.26), 0.024 (1.18)
1 if low debt/low HI-Census, 0 otherwise: 0.016 (0.66), 0.013 (0.58), 0.012 (0.55)
HI-Compustat: 0.053 (2.02)
HI-Census: 0.058 (0.36)
Returns correlation: 0.016 (0.27), 0.023 (0.40)
Leverage: 0.004 (0.05), 0.022 (0.27)
Log of average price: 0.043 (4.25), 0.044 (4.27), 0.042 (3.92), 0.043 (4.14), 0.043 (4.16), 0.040 (3.65)
Distress cumulative return: 0.020 (0.62), 0.016 (0.53), 0.007 (0.21), 0.009 (0.28), 0.005 (0.15), 0.002 (0.07)
Predistress cumulated return: 0.024 (1.03), 0.024 (0.97)
Adjusted R-square: 0.031, 0.199, 0.197, 0.157, 0.022, 0.149, 0.150, 0.116
Number of observations: 86, 86, 86, 86, 86, 86, 86, 86

Cumulative abnormal returns (market model errors) are calculated from day −5 to +5 relative to bankruptcy announcements. Bankruptcy announcements are from the Altman-NYU Salomon Center Bankruptcy list and include bankruptcies between January 1980 and December 2004 with liabilities greater than $100 million for firms in the manufacturing sector (SIC codes between 2000 and 3999) with a primary SIC code that is available from Compustat (86 bankruptcies). An industry portfolio is a value-weighted portfolio of firms with the same primary four-digit SIC code as the bankrupt firm for which announcement returns are available from the CRSP files. The industry characteristics are obtained from Compustat for the fiscal year preceding the announcement, except for HI-Census and the returns correlation variable. HI-Census is the Herfindahl index for four-digit SIC industries as reported by the Census of Manufactures. The 1997 and 2002 U.S. Censuses define industry at the six-digit NAICS level. Over the 1995–2004 period, we use HI-Census values for six-digit NAICS industries to calculate HI-Census values for four-digit SIC industries by weighting the HI-Census values of component six-digit NAICS industries by the square of their share of the broader four-digit SIC industry. To determine which six-digit NAICS industries are components of a broader four-digit SIC industry, we use NAICS correspondence tables provided by the U.S. Census. HI-Compustat is defined as the sum of the squares of the sales market shares of all firms in a Compustat four-digit SIC industry. High/low debt, HI-Compustat, and HI-Census are as defined in Table 8. Returns correlation is the correlation between the industry portfolio and the bankrupt firm returns for the year preceding the announcement. Leverage is the average debt-to-total-assets ratio in a firm's industry. Log of average price is the natural logarithm of the average stock price in a firm's industry. Distress cumulative return is the industry portfolio cumulative return in excess of the market return from five days before the first distress announcement to five days before the bankruptcy announcement. Predistress cumulated return is the industry portfolio cumulative return in excess of the market return from 800 to 50 days before the first distress announcement. As in Lang and Stulz (1992), we identify first distress announcements using the methodology employed by Gilson, John, and Lang (1990). t-statistics are in parentheses. ***, **, and * represent significance at the 1%, 5%, and 10% levels, respectively, for a two-tailed test.

3.3 Harris (1998)

Harris (1998) shows that firms are less likely to disclose separate segment information for operations in more-concentrated industries. She argues that firms behave in this manner to protect the abnormal profits and market shares related to their operations. She uses Compustat-based industry concentration measures in her empirical analyses. We reexamine the sensitivity of her results to using U.S. Census-based industry concentration measures. Specifically, we reestimate the multivariate logit model in her Table 3 on a sample of manufacturing firms. To create the dependent variable in these models, we follow her approach and use the Compustat Multiple SICs Tape, which reports all the SIC codes for a firm in a given year. As in Harris (1998), we compare these SICs to those appearing in the Compustat Industry Segment File, which reports the segments actually disclosed in a firm's annual report. If a three-digit SIC in which a firm has operations is reported as a primary or secondary SIC for one of the firm's business segments, the dependent variable takes a value of 1; otherwise, the dependent variable equals 0. This definition allows for multiple firm-industry observations in a given year. The Compustat Multiple SICs Tape was discontinued in 1998. However, we were able to locate the 1997 Compustat Multiple SICs Tape, and we use it in our analysis. This tape allows us to study the period from 1995 to 1997 rather than the period from 1987 to 1991 studied by Harris (1998). We do not study the years examined in Harris (1998) because we have data on the SIC codes in which a firm has operations only for 1997.
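The dependent variable described above compares, firm-year by firm-year, the three-digit SICs in which a firm operates with the SICs attached to its disclosed segments. A minimal sketch follows. The dictionary inputs are a hypothetical layout standing in for the Multiple SICs Tape and the Industry Segment File, and truncating four-digit codes to three digits is an assumption about how the two sources are matched.

```python
def sic3(code):
    """Truncate a four-digit SIC code (int or str) to its three-digit industry."""
    return str(code).zfill(4)[:3]


def disclosure_indicators(operating, segments):
    """Build firm-industry-year observations of the segment disclosure indicator.

    operating: {(firm, year): SIC codes in which the firm operates}
    segments:  {(firm, year): primary/secondary SIC codes of disclosed segments}
    (both layouts are hypothetical). Returns (firm, year, sic3, y) rows, where
    y = 1 if the three-digit SIC is reported for one of the firm's segments.
    """
    rows = []
    for (firm, year), codes in sorted(operating.items()):
        disclosed = {sic3(c) for c in segments.get((firm, year), ())}
        for industry in sorted({sic3(c) for c in codes}):
            rows.append((firm, year, industry, int(industry in disclosed)))
    return rows
```

Each firm contributes one row per three-digit SIC in which it operates, which is why the sample contains multiple firm-industry observations per year.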

Table 10 reports the results of our logit regressions. The model in the first three columns of the table uses three-digit SIC level HI-Compustat as the measure of industry concentration, although Harris (1998) measures industry concentration as FFR-Compustat at the three-digit SIC level. We use HI-Compustat instead of FFR-Compustat as the explanatory variable because the 1997 Census of Manufactures reports concentration measures at the six-digit NAICS level; these data allow us to calculate HI-Census precisely at the three-digit SIC level, whereas we can compute only approximate FFR-Census values for three-digit SIC industries.14 The other explanatory variables in the model are the same as in Harris (1998).
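One way to see why a Herfindahl index can be aggregated from NAICS components is an identity: if each firm's sales fall within a single component $k$ of a broader SIC industry, then $\sum_i (x_i/X)^2 = \sum_k (X_k/X)^2 \sum_{i \in k} (x_i/X_k)^2$, so weighting each component's HI by the square of its share reproduces the broader industry's HI. The sketch below computes HI-Compustat from firm sales and aggregates component HIs this way. Using fractions rather than the 0 to 10,000 scale, skipping non-positive sales, and the choice of size measure for component shares are assumptions.

```python
from collections import defaultdict


def hi_from_sales(firms):
    """HI-Compustat style index: sum of squared sales shares within each industry.

    firms: list of (industry_code, sales); non-positive sales are skipped (assumption).
    """
    totals = defaultdict(float)
    for industry, sales in firms:
        if sales > 0:
            totals[industry] += sales
    hi = defaultdict(float)
    for industry, sales in firms:
        if sales > 0:
            hi[industry] += (sales / totals[industry]) ** 2
    return dict(hi)


def hi_from_components(components):
    """HI for a broader SIC industry from its NAICS components.

    components: list of (component_hi, component_size); each component HI is
    weighted by the square of its share of the broader industry's total size.
    """
    total = sum(size for _, size in components)
    return sum((size / total) ** 2 * hi for hi, size in components)
```

With two components, one holding two equal firms with sales of 50 and the other five equal firms with sales of 60, both routes give an HI of 0.14375.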

We follow Harris (1998) and report results for each of the sample years separately. The coefficients on HI-Compustat are negative and significant for each of the individual sample years examined, consistent with the Harris (1998) results. However, in models 4–6 of Table 10, in which we replace HI-Compustat with HI-Census, the coefficients on HI-Census are insignificant for each of the sample years. Thus, the Harris (1998) findings are sensitive to the use of the U.S. Census-based industry concentration measures in place of the Compustat-based measures.
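The ROA persistence control in Table 10 is, per the table notes, the slope $B_{2j}$ from regressing a firm's industry-relative ROA on its lagged value, with separate slopes for non-positive and positive lagged values; earnings persistence across a firm's SICs is the maximum minus the minimum of that slope over the firm's industries. The sketch below estimates $B_{2j}$ by ordinary least squares with NumPy. Pooling all firm-year observations of industry $j$ into one regression is an assumption about the estimation sample.

```python
import numpy as np


def roa_persistence(x_lag, x):
    """Slope B2j on positive lagged abnormal ROA (sketch of the Table 10 definition).

    x_lag, x: X_{ij,t-1} and X_{ijt} for observations in one industry j, where
    X is firm ROA minus the industry-mean ROA.
    """
    x_lag = np.asarray(x_lag, dtype=float)
    x = np.asarray(x, dtype=float)
    d_n = (x_lag <= 0).astype(float)  # D_n = 1 if X_{ij,t-1} <= 0
    d_p = (x_lag > 0).astype(float)   # D_p = 1 if X_{ij,t-1} > 0
    design = np.column_stack([np.ones_like(x_lag), d_n * x_lag, d_p * x_lag])
    coef, *_ = np.linalg.lstsq(design, x, rcond=None)
    return coef[2]  # B2j


def persistence_spread(b2_by_industry):
    """Earnings persistence across a firm's SICs: max minus min of B2j."""
    values = list(b2_by_industry.values())
    return max(values) - min(values)
```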

3.4 DeFond and Park (1999)

Finally, we reexamine the DeFond and Park (1999) result that CEO turnover is negatively associated with industry concentration. They argue that in more competitive industries there is greater homogeneity across firms and CEOs are likely to have more peers, making it easier to identify and replace poorly performing CEOs.

14 We reestimate all the models in Table 10 using FFR-Compustat and FFR-Census calculated for three-digit SIC industries and obtain results similar to those reported in this table.

Table 10
Logit analysis of managers' business segment reporting decisions

Model | 1 | 2 | 3 | 4 | 5 | 6
Sample period | 1995 | 1996 | 1997 | 1995 | 1996 | 1997
Intercept | 1.778 (0.000) | 1.900 (0.000) | 1.959 (0.000) | 1.678 (0.000) | 1.827 (0.000) | 1.917 (0.000)
HI-Compustat | 0.536 (0.043) | 0.494 (0.056) | 0.486 (0.064) | | |
HI-Census | | | | 1.215 (0.458) | 0.074 (0.964) | 1.551 (0.338)
ROA persistence | 0.152 (0.015) | 0.106 (0.087) | 0.143 (0.021) | 0.164 (0.009) | 0.115 (0.064) | 0.144 (0.020)
Earnings persistence across the SICs in which a firm operates | 1.112 (0.000) | 1.115 (0.000) | 1.067 (0.000) | 1.194 (0.000) | 1.182 (0.000) | 1.135 (0.000)
Median industry sales scaled by firm sales | 0.841 (0.000) | 0.839 (0.000) | 0.838 (0.000) | 0.836 (0.000) | 0.841 (0.000) | 0.841 (0.000)
Number of SICs in which the firm operates | 0.718 (0.000) | 0.735 (0.000) | 0.769 (0.000) | 0.720 (0.000) | 0.736 (0.000) | 0.770 (0.000)
Likelihood ratio | 1.896 (0.000) | 2.034 (0.000) | 2.121 (0.000) | 1.893 (0.000) | 2.030 (0.000) | 2.119 (0.000)
Number of observations | 5109 | 5384 | 5468 | 5109 | 5384 | 5468

The sample period is 1995–1997 and includes firms in the manufacturing sector (SIC codes between 2000 and 3999) covered by Compustat. The dependent variable takes a value of 1 if during the current year firm i decides to provide a segmental disclosure of its operations for a three-digit SIC industry j in which it operates, and equals 0 otherwise. This allows for multiple firm-industry observations in a given year. HI-Compustat is defined as the sum of the squares of the sales market shares of all firms in a Compustat three-digit SIC industry. HI-Census is the Herfindahl index for three-digit SIC industries as reported by the Census of Manufactures. The 1997 U.S. Census defines industry at the six-digit NAICS level. Over the 1995–1997 period, we use HI-Census values for six-digit NAICS industries to calculate HI-Census values for three-digit SIC industries by weighting the HI-Census values of component six-digit NAICS industries by the square of their share of the broader three-digit SIC industry. To determine which six-digit NAICS industries are components of a broader three-digit SIC industry, we use NAICS correspondence tables provided by the U.S. Census. ROA persistence represents the speed of adjustment for firm-level positive abnormal return on assets in industry j and is measured as in Harris (1998) as the slope coefficient $B_{2j}$ from the regression model $X_{ijt} = B_{0j} + B_{1j}(D_n X_{ij,t-1}) + B_{2j}(D_p X_{ij,t-1}) + e_{ijt}$, where $X_{ijt}$ is the difference between firm i's ROA and the mean ROA for its industry j in year t, $D_n = 1$ if $X_{ij,t-1} \le 0$ and 0 otherwise, and $D_p = 1$ if $X_{ij,t-1} > 0$ and 0 otherwise. Earnings persistence across the SICs in which a firm operates is measured as the maximum value minus the minimum value of ROA persistence for a firm in the three-digit SICs in which the firm has operations during the current year. Median industry sales scaled by firm sales is measured as median sales for single-segment firms in a three-digit SIC industry j divided by firm i's sales. Number of SICs in which the firm operates is measured as the number of three-digit SIC industries in which the firm operates during the current year. Significance levels for Wald chi-square test statistics are in parentheses.

DeFond and Park (1999) use the Lexis/Nexis news database for the period from 1988 to 1992 to construct their CEO turnover sample. Their control sample consists of firms in the Compact Disclosure database without any CEO turnover during their sample period. Their final sample has 301 firm-year observations with CEO turnovers and a control sample of 2429 firm-year observations with no CEO turnovers. For our sample, we consider manufacturing firms in the ExecuComp database over the period from 1994 to 2000 and treat firm-years for which the CEO for year t differs from the CEO for year t − 1 as CEO turnover observations. Like DeFond and Park (1999), we consider all instances of CEO turnover regardless of the reason for the turnover. Our control sample consists of manufacturing firms included in the ExecuComp database that do not experience a CEO turnover during the period from 1994 to 2000. Our final sample consists of 203 firm-year observations with CEO turnovers and a control sample of 2267 firm-year observations with no CEO turnovers. We do not study a longer sample period so as not to impose the condition that control firms have no CEO turnover over this longer period. Also, we study the period from 1994 to 2000 rather than a more recent period so that our sample period is closer in time to the DeFond and Park (1999) sample period.15
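The turnover definition above compares the CEO in year t with the CEO in year t − 1, and the control sample contains firms with no turnover at all in the sample period. A minimal sketch follows, using a hypothetical dictionary keyed by (firm, year) rather than actual ExecuComp field names. Skipping firm-years without a prior-year record mirrors why, as footnote 15 explains, 1994 is the earliest usable year when data begin in 1993.

```python
def ceo_turnover_flags(ceo):
    """Flag CEO turnover by firm-year (illustrative sketch).

    ceo: {(firm_id, year): CEO identifier} (hypothetical input layout).
    Returns {(firm_id, year): 1 if the CEO differs from the year - 1 CEO, else 0};
    firm-years without a prior-year CEO record are skipped.
    """
    flags = {}
    for (firm, year), person in ceo.items():
        prev = ceo.get((firm, year - 1))
        if prev is not None:
            flags[(firm, year)] = int(person != prev)
    return flags


def control_firms(flags):
    """Firms with no CEO turnover in any flagged year of the sample period."""
    firms = {firm for firm, _ in flags}
    switched = {firm for (firm, _), flag in flags.items() if flag == 1}
    return firms - switched
```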

The first three models in Table 11 report the results of replications of the three logit models of panel A of Table 4 of DeFond and Park (1999). The dependent variable takes a value of 1 for firm-year observations with a CEO turnover and a value of 0 for firm-years belonging to the control sample. The main independent variable of interest is the square root of HI-Compustat calculated at the two-digit SIC level. All the models have the same set of control variables, except that the first model does not include analysts' earnings forecast errors, the second model does not include industry-relative earnings, and the third model includes both of these variables. The results in the first three columns of Table 11 show that there is a significantly negative association between CEO turnover and the square root of HI-Compustat, consistent with the DeFond and Park (1999) results. The results for the control variables are also consistent with those in DeFond and Park (1999), except that our coefficients on analysts' earnings forecast errors are not statistically significant.
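The logit models above relate a 0/1 CEO turnover indicator to the square root of an industry Herfindahl index plus controls. The sketch below fits such a model by maximum likelihood with Newton-Raphson iterations in NumPy; in practice a library routine such as the `Logit` class in statsmodels would normally be used. The fixed iteration count and the absence of a convergence check are simplifications, and the simulated variables in the usage check are hypothetical.

```python
import numpy as np


def logit_newton(y, X, iters=25):
    """Maximum-likelihood logit by Newton-Raphson; returns (coefficients, z-statistics).

    y: 0/1 outcomes (e.g., CEO turnover); X: regressors without a constant
    (e.g., square root of HI plus controls). A constant is prepended.
    """
    y = np.asarray(y, dtype=float)
    X = np.column_stack([np.ones(len(y)), np.asarray(X, dtype=float)])
    beta = np.zeros(X.shape[1])
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        grad = X.T @ (y - p)                          # score of the log-likelihood
        hess = X.T @ (X * (p * (1.0 - p))[:, None])   # information matrix
        beta = beta + np.linalg.solve(hess, grad)
    p = 1.0 / (1.0 + np.exp(-X @ beta))
    cov = np.linalg.inv(X.T @ (X * (p * (1.0 - p))[:, None]))
    return beta, beta / np.sqrt(np.diag(cov))
```

The square-root transformation is applied to HI before it enters `X`; the coefficient at index 1 then corresponds to the "square root of HI" rows in Table 11.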

The models in the fourth to the sixth columns of Table 11 replace the square root of HI-Compustat, as an explanatory variable, with the square root of HI-Census calculated at the two-digit SIC level. We do not find significant associations between CEO turnover and the square root of HI-Census in any of the three models. Consequently, the DeFond and Park (1999) findings and their conclusions are not robust to using U.S. Census-based industry concentration measures instead of Compustat-based measures.

15 Given that comprehensive data in the ExecuComp database are available from 1993 onward and that we require information on the identity of the CEO in year t − 1, 1994 is the earliest year in which our sample period can begin.


Table 11
Logit analysis predicting CEO turnover

Model | 1 | 2 | 3 | 4 | 5 | 6
Intercept | 6.695 (7.08) | 6.758 (7.12) | 6.703 (7.09) | 8.155 (10.56) | 8.010 (10.29) | 8.174 (10.60)
Square root of HI-Compustat | 6.013 (2.82) | 5.023 (2.43) | 6.033 (2.82) | | |
Square root of HI-Census | | | | 2.490 (0.94) | 2.168 (0.81) | 2.432 (0.92)
Industry-relative earnings | 2.930 (3.77) | | 2.883 (3.73) | 2.696 (3.43) | | 2.653 (3.39)
Analysts' earnings forecast errors | | 1.835 (1.00) | 1.654 (0.90) | | 1.754 (0.96) | 1.571 (0.86)
Market-adjusted stock returns | 0.403 (1.84) | 0.579 (2.63) | 0.415 (1.88) | 0.436 (1.99) | 0.593 (2.70) | 0.445 (2.03)
Age of CEO | 0.097 (8.47) | 0.093 (8.24) | 0.097 (8.47) | 0.097 (8.51) | 0.094 (8.32) | 0.097 (8.51)
Dummy variable for if CEO age = 63, 64, or 65: 0.597 (2.93), 0.573 (2.81)
