Journal of Development Economics 73 (2004) 155–184
www.elsevier.com/locate/econbase

Development, crime and punishment: accounting for the international differences in crime rates

Rodrigo R. Soares*

Department of Economics, University of Maryland, 3105 Tydings Hall, College Park, MD 20742, USA
Graduate School of Economics, Getúlio Vargas Foundation, Rio de Janeiro, Brazil

Received 1 January 2000; accepted 1 December 2002

Abstract

This paper analyzes the determinants of the heterogeneity in crime rates across countries, focusing on reporting rates and development. The behavior of the reporting rate is studied by comparing data from victimization surveys to official records. Reporting rates are strongly correlated with development: richer countries report a higher fraction of crimes. The positive relation between development and crime found in previous studies is shown to result from this correlation. Once the presence of the reporting error is accounted for, development does not affect crime. Reductions in inequality and increases in growth and education are associated with reductions in crime rates.
© 2003 Elsevier B.V. All rights reserved.

JEL classification: K42; O10; O17; O57; Z13
Keywords: Crime; Development; Reporting rate; Inequality; Victimization

1. Introduction

Crime rates vary enormously across countries, and their variation in this dimension is orders of magnitude larger than their variation through time in any given country. For example, the number of homicides per 100,000 inhabitants, probably the most popular crime statistic, ranges from 17 to 0.6 for countries like, respectively, Mexico and Japan. At

* Department of Economics, University of Maryland, 3105 Tydings Hall, College Park, MD 20742, USA. E-mail address: [email protected] (R.R. Soares).
0304-3878/$ - see front matter © 2003 Elsevier B.V. All rights reserved.
doi:10.1016/j.jdeveco.2002.12.001
where ln(crime) is the natural logarithm of the different measures of crime (as percentages
of the population); ln(gnp) is the natural logarithm of the GNP per capita at constant 1995
US$; ratio is a measure of inequality, based on the ratio between the share of income or
consumption of the 20% richest of the population to the share of income or consumption
of the 20% poorest;⁴ urb is the percentage of the population living in urban areas; educ is
the gross primary enrollment rate; growth is the average GNP per capita growth in the
period; ln(pol) is the natural logarithm of the percentage of policemen in the population;
and chr is a dummy indicating whether the religious majority in the population is
Christian.⁵ Appendix B presents the description and sources of these variables.
The first three right hand side variables together with economic growth are the
development related variables that constitute our main interest. The education and religion
variables are introduced as controls for possible taste shifts that may be correlated with
economic development itself, and may affect both crime and reporting behavior. The
police variable is a natural control for the crime prevention measures taken by the different
countries.
Three specifications of this equation are run for each type of crime. We begin by
including all seven variables, and then consecutively exclude the police and religion
indicators, and the growth and education variables. Table 4 presents the results. The first
three columns are related to the official data and the last three to the victim survey data.
Robust standard errors are used in all cases.
Some clearly identifiable differences arise when we compare the results from the two
different data sets.⁶ In the official data regressions, the effect of income (ln(gnp)) is always
positive and statistically significant. For eight out of nine cases, the effect of inequality
(ratio) is positive, but only the results for contact crimes are statistically significant. The
effect of urbanization (urb) has different signs in the different specifications, never being
significant, and the same happens for education (educ) and growth (growth).

4 Atkinson and Brandolini (2001) raise doubts regarding the international comparability of the income
distribution data collected by Deininger and Squire (1996). It is possible that our inequality variable—based on
the share of income or consumption of different groups of the population—suffers from the same kind of
problems they discuss. This is an issue that cannot be dealt with in this paper, but should be kept in mind when
analyzing the results. Measurement error on the inequality variable alone would bias its coefficient towards zero
and the coefficients on the other independent variables in unpredictable ways.

5 The coefficients have the following interpretation: for ln(gnp) and ln(pol), they are simply elasticities; for
ratio, urb, educ and growth, the relative change on the dependent variable given a one unit change in the
independent variable (respectively, a one time increase in the gnp of the 20% richest of the population in relation
to the gnp of the 20% poorest, 1% more of the population living in an urban area, 1% more primary enrollment or
1% more economic growth); for the religion dummy, the relative increase in crime if the country has a majority of
that religion. It is important to keep in mind that these are percentage and relative changes on the rates of crimes,
not absolute changes in its level.

6 There is a well-known problem of endogeneity of the police variable here (see, for example, Levitt, 1997).
As we did not find a good instrument for it, we chose to present the equations with and without the ln(pol)
variable included on the right-hand side. As can be seen from the tables, there is almost no change on the
qualitative results (in terms of significance and sign of the coefficients) as police is excluded from the regressions.
Besides, the presence of the police variable in the UNCS data set is very irregular, so that its inclusion in the
regression hugely reduces the number of observations. For these reasons, we ignore the coefficients on the police
variable in the following discussion, since we do not have any particular interest on them. Just for the record, it
shows up as positive and borderline significant in the regression for official data on thefts, and in both regressions
for burglaries.
For the victim survey data, in six out of nine cases the effect of income is negative (in
three cases it is borderline significant, although only one case is significant at the 5%
level). The effect of inequality is positive, again, in eight out of nine cases, and it is
significant or borderline significant in all the cases. Urbanization has a positive effect in
seven cases, being close to significant for all the burglaries regressions and for the shortest
specification for both thefts and contact crimes. All the other variables are generally not
significant and change signs in the different specifications.
Given this general description, it seems fair to say that the two data sets describe very
different pictures regarding the relationship between crime and development. The official
records suggest that crime rates increase with income per capita, and seem to be positively
affected by inequality, although the evidence regarding the latter is not very strong. On the
other hand, the victim survey indicates that, if anything, crime rates seem to decrease with
income per capita, although a more precise statement would be that these two variables are
not very strongly related. Moreover, inequality has a strong correlation with crime rates in
this case and there is some weak evidence that urbanization may also have a positive
effect.
If one believes that there is a reporting problem in the official data, and that this
problem is less severe in the victim surveys, these different results are actually telling
something about the nature of the reporting error, and the way in which it correlates with
the independent variables. The cross-section regressions indicate that the reporting error is
not random, and introduces systematic biases on the estimates obtained from official data.
The extremely different conclusions obtained from the two data sets, particularly in respect
to income, support the hypothetical relation between underreporting and economic
development, and indicate that this relation is serious enough to call into question the
results of the studies discussed in Section 3. In Section 4.3, we analyze explicitly the
determinants and the characteristics of the reporting rate.
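The mechanism described in this paragraph can be illustrated with a small simulation. The following Python sketch is only a stylized example: the data-generating process, the parameter values and the function names are assumptions made for illustration, not estimates from the paper. True log crime is generated independently of income, while the log reporting rate rises with log income; regressing the resulting 'official' series on income then yields a positive coefficient even though the true effect is zero.

```python
import random


def ols_slope(x, y):
    """Slope of a simple OLS regression of y on x (with an intercept)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    return s_xy / s_xx


def simulate_reporting_bias(n=200, gamma=0.8, seed=0):
    """Return (slope using true crime, slope using official crime).

    Assumed data-generating process (hypothetical, not estimated):
    true log crime does not depend on income, while the log reporting
    rate rises with log income at rate `gamma`.
    """
    rng = random.Random(seed)
    ln_gnp = [rng.uniform(6.0, 11.0) for _ in range(n)]
    ln_true = [1.0 + rng.gauss(0.0, 0.3) for _ in range(n)]
    ln_report = [-10.0 + gamma * x + rng.gauss(0.0, 0.3) for x in ln_gnp]
    # Official log crime = true log crime + log reporting rate.
    ln_official = [t + r for t, r in zip(ln_true, ln_report)]
    return ols_slope(ln_gnp, ln_true), ols_slope(ln_gnp, ln_official)


slope_true, slope_official = simulate_reporting_bias()
print(f"slope with true crime:     {slope_true:.3f}")
print(f"slope with official crime: {slope_official:.3f}")
```

In this stylized setting the coefficient obtained from the official series is close to the assumed `gamma`, that is, it picks up the income gradient of reporting rather than any effect of income on crime.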
4.3. The determinants of the reporting rate
Evidence from Section 4.2 stresses the importance of the underreporting of crimes in
official data, and strongly suggests that it may be affected by variables related to
development.⁷ If we assume that the victim survey data represent the ‘real’ crime rate
or, at least, that their deviations from the ‘real’ rate are not correlated with the exogenous
variables, we can use the two different data sets to recover a cross-section of the reporting
error. This cross-section can then be used to analyze the relation between the reporting
error and the development-related variables.
7 The analysis of the differences between data from official records and from surveys is a recurrent subject in
applied criminology research. References in this area include Kitsuse and Cicourel (1963), Skogan (1976), Cohen
and Land (1984), Biderman and Lynch (1991), Figlio (1994), O’Brien (1996), Levitt (1998) and many others.
Although the topics covered in this literature are very diverse, the discussion is almost always centered on
national data (where the problem is most likely less serious), and nobody addresses the same problem that we are
trying to address here.
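As a concrete illustration of how one observation of the reporting error can be recovered from the two data sets, the following Python sketch computes the reporting rate and its logarithm for a single country. The function names and the numbers in the usage lines are hypothetical; the sketch assumes that both crime rates are expressed in the same units (crimes as a percentage of the population) and that the victim survey rate is taken as the 'real' rate.

```python
import math


def reporting_rate(official_rate, survey_rate):
    """Fraction of crimes registered in official records, taking the
    victim survey rate as the 'real' crime rate."""
    if official_rate <= 0 or survey_rate <= 0:
        raise ValueError("both rates must be positive")
    return official_rate / survey_rate


def log_reporting_error(official_rate, survey_rate):
    """Reporting error in logs: ln(official) - ln(survey). This equals
    the log of the reporting rate and is negative under underreporting."""
    return math.log(reporting_rate(official_rate, survey_rate))


# Hypothetical country: 1.2% of the population according to official
# records, 6.0% according to the victim survey.
rate = reporting_rate(1.2, 6.0)
error = log_reporting_error(1.2, 6.0)
print(f"reporting rate: {rate:.2f}, log reporting error: {error:.3f}")
```

Working in logs keeps the error additive, so that the official log crime rate is the 'real' log crime rate plus the log reporting error.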
We do this by constructing a measure of the reporting rate (fraction of crimes actually
registered by the official records) and running the following regression:
measured, such as institutional development and degree of law enforcement. To uncover
the exact mechanism behind this relation is an important topic for future research, but the
simple fact that this correlation exists is enough to bias the coefficients on development
related variables estimated from traditional crime regressions.
Despite this problem, if one wants to explore the time dimension of the behavior of
crime rates, hope will still rest on official data sets, like the UNCS, since the International
Crime Victim Survey is very recent. For this reason, in Section 5, we suggest a way of
correcting the official records with information obtained from the victim survey cross-
section, such that the reporting error is taken into account. The idea is to understand under
what conditions, observing only one cross-section of reporting rates, we would be able to
estimate the ‘underreporting structure’ and, with that in hand, eliminate the bad variation
from the official data. If these conditions are not very strict, they will allow the use of the
panel data set from the UNCS, controlling for contamination from the reporting error. In
the last two parts of the section, we apply the strategy to the UNCS data and discuss the
results.
5. An alternative use of the official data
5.1. Econometric approach
Suppose that crime rates are determined according to the following equation:

$$Y^{*} = X\theta + \varepsilon, \tag{3}$$

where $Y^{*}$ stands for the logarithm of the crime rate of a specific type of crime, $X$ is a vector
of country’s characteristics and $\varepsilon$ is an error term for which $\mathrm{Cov}(\varepsilon, X) = 0$.

The only data observed on a panel basis is the official data, which is the ‘true’ data plus
a ‘reporting error’:

$$Y = Y^{*} + \nu. \tag{4}$$

The reporting error, as the evidence presented on the preceding sections suggests, is
assumed to be correlated with the country’s characteristics, such that $\mathrm{Cov}(\nu, X) \neq 0$. For the
usual reasons, if we regress $Y$ on $X$, we get a biased estimator of $\theta$, for which
$E(\hat{\theta} \mid X) = \theta + (X'X)^{-1}X'E(\nu \mid X) \neq \theta$.

If we could obtain an estimator $\hat{\nu}$ of $\nu$ such that $E(\hat{\nu} \mid X) = E(\nu \mid X)$, we could build the
series $(Y - \hat{\nu})$, and regress it on $X$ to obtain an unbiased estimate of $\theta$. The only hope in
this direction lies with the cross-section observations available from the victim survey data
(from the ICVS data set). The comparison of this data with the UNCS data (based on
official records) allows us to build a vector $\nu_t$ of cross-section observations of the reporting
error at a given point in time. If, additionally, the joint distribution of $\nu$ and $X$ is invariant
across countries and time, this single cross-section will allow us to obtain all the relevant
information regarding the correlation between $\nu$ and $X$. Maintaining this assumption, and
supposing that $\nu$ and $X$ are jointly normally distributed, we have that

$$E(\nu \mid X) = X\gamma, \tag{5}$$

where $\gamma$ is the vector of coefficients of the linear regression of $\nu$ on $X$ (where $X$ includes the
unit vector). In this case, the projection of the cross-section vector $\nu_t$ on the corresponding
matrix $X_t$ will, given our invariance assumption, give an unbiased estimate of $\gamma$. We can
then go on to construct $\hat{\nu} = X\hat{\gamma}$ for all the periods and countries covered by the official data,
with $E(\hat{\nu} \mid X) = X E(\hat{\gamma} \mid X) = X E(\hat{\gamma} \mid X_t) = X\gamma = E(\nu \mid X)$.

With this estimate of $\nu$ in hand, the official data $Y$ can be corrected and an unbiased
estimate of $\theta$ can be obtained from the regression of $(Y - \hat{\nu})$ on $X$. The Appendix
derivation proves that this procedure produces an unbiased estimator of $\theta$ and, additionally,
derives an unbiased estimator of its covariance matrix. In Section 5.2, we apply this
strategy to our data set.
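The unbiasedness claim can be checked in a few lines. The sketch below writes θ for the coefficient vector of Eq. (3), ε for its error term, ν for the reporting error of Eq. (4) and ν̂ for its estimate; the symbol θ̃ for the estimator obtained from the corrected data is introduced here only as a label, and the last step assumes E(ε|X) = 0, which is slightly stronger than the zero-covariance condition stated in the text. The complete argument, including the covariance matrix, is the one given in the Appendix.

```latex
\begin{align*}
\tilde{\theta} &= (X'X)^{-1}X'(Y - \hat{\nu}) \\
  &= (X'X)^{-1}X'\bigl(X\theta + \varepsilon + \nu - \hat{\nu}\bigr) \\
  &= \theta + (X'X)^{-1}X'\varepsilon + (X'X)^{-1}X'(\nu - \hat{\nu}), \\[4pt]
E(\tilde{\theta} \mid X)
  &= \theta + (X'X)^{-1}X'\bigl[E(\nu \mid X) - E(\hat{\nu} \mid X)\bigr] = \theta .
\end{align*}
```

The bias term of the uncorrected estimator, $(X'X)^{-1}X'E(\nu \mid X)$, is thus cancelled exactly when the estimated reporting error has the same conditional mean as the true one.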
5.2. Estimation
We apply the approach described in Section 5.1 to the UNCS data set, using the
cross-section from the ICVS to construct the vector $\nu_t$. The matrix $X$ here has the same variables
used in the right hand side of the cross-section equations (see Eq. (1)).
Due to data limitations, the income inequality variable is country specific, in the sense
that it changes from country to country, but remains constant for a given country through
time. Deininger and Squire (1996) have noticed, after extensively documenting the
methodology and availability of international data on inequality, that ‘‘changes in the
Gini coefficient of inequality tend to be small’’ compared to changes in other economic
variables (Deininger and Squire, 1996, p. 587). This reduces the concern in relation to this
limitation of the data. The religion dummy, for obvious reasons, is also constant through
time, while all other variables change with time and country.
The characteristics matrix $X$ can, thus, be divided into two subsets, $V$ and $F$, where the
typical vector of $V$ is the time and country variant $v_{it}$, and the typical vector of $F$ is the
country variant and time fixed $f_i$. Estimates of pooled regressions and within and between
decompositions are presented.
In relation to $\hat{\gamma}$ (the estimated $\gamma$) and the correction of $Y$ discussed in Section 5.1, one
main concern guides our approach. We want to eliminate the correlation between $X$ and $\nu$
from the official data, but we want $\hat{\gamma}$ to be estimated with some precision, to avoid actual
differences in crime rates to be also eliminated from the data. This constitutes a problem
since the cross-section that we have available for $\nu$ is a small sample, and some of the
variables included in $X$ are highly correlated with each other. For this reason, we decide to
restrict the $X$’s included in Eq. (5) only to those that show up significantly in the reporting
rate regressions, and so, based on the evidence from Table 5, we end up using only the
ln(gnp) to correct the official data.⁹ The estimated $\gamma$ is then used to correct the observed $Y$
in the way described in Section 5.1.¹⁰
9 Inclusion of other variables, such as urb and educ, does not change the main conclusions. These results are
available from the author upon request.

10 This approach limits the use of the correction procedure proposed as a tool for prediction. As income level
is probably capturing other variables correlated with development, and the relation between income and these
variables may not be stable through time, to use the estimated relation to forecast reporting rates in the long run
may be misleading. This would correspond, in the econometric model discussed, to the relation between $\nu$ and the
observed $X$ not being invariant through time.
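The correction step can be sketched in a few lines of Python for the case in which ln(gnp) is the only variable used to model the reporting error. This is a minimal illustration under stated assumptions, not the paper's implementation: the function names, argument names and the numbers in the usage example are hypothetical, and all crime variables are assumed to be in logs.

```python
def ols_fit(x, y):
    """Intercept and slope of a simple OLS regression of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    slope = (sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
             / sum((a - mean_x) ** 2 for a in x))
    return mean_y - slope * mean_x, slope


def correct_official(panel_ln_gnp, panel_ln_official,
                     cs_ln_gnp, cs_ln_official, cs_ln_survey):
    """Subtract the predicted reporting error from the official panel.

    Step 1: build the cross-section of reporting errors
            (log official rate minus log victim survey rate).
    Step 2: regress it on ln(gnp) to estimate the reporting structure.
    Step 3: predict the error for every panel observation and subtract it.
    """
    errors = [o - s for o, s in zip(cs_ln_official, cs_ln_survey)]
    intercept, slope = ols_fit(cs_ln_gnp, errors)
    return [y - (intercept + slope * x)
            for x, y in zip(panel_ln_gnp, panel_ln_official)]


# Hypothetical cross-section: the survey shows the same log crime rate
# (1.0) everywhere, while reporting rises with income.
cs_ln_gnp = [7.0, 8.0, 9.0, 10.0]
cs_ln_survey = [1.0, 1.0, 1.0, 1.0]
cs_ln_official = [-0.5, 0.0, 0.5, 1.0]

# Hypothetical panel observations outside the cross-section.
corrected = correct_official([6.0, 11.0], [-1.0, 1.5],
                             cs_ln_gnp, cs_ln_official, cs_ln_survey)
print(corrected)
```

In this constructed example the official series differs across the two panel observations only because of reporting, so the corrected values coincide. Standard errors from a regression on the corrected series are not the usual OLS ones, since the correction itself is estimated.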
Because of the correction applied to the data, the standard error estimates usually
obtained from OLS procedures will generally be biased. We calculate standard errors of
the estimated coefficients according to the procedure outlined in Appendix A. Also, since the
UNCS data is irregularly distributed across countries and time, it is difficult to treat each
year as one observation. To implement the panel estimation and to increase the cross-
country comparability of the data, we form three periods with the UNCS data: 1975–1983,
1984–1988 and 1989–1994. Averages for the sub-periods are calculated for each country
and this data set with three points in time is used in the estimation. In the case of
burglaries, the panel data analysis is further limited by the shorter availability of this
variable in the UNCS data set (from 1986 on, so that only the last two periods are
available).
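The construction of the three-period panel can be sketched as follows. The period boundaries are the ones stated above; the record layout (country, year, value), the function names and the handling of missing values are assumptions made for illustration.

```python
# Period boundaries as stated in the text (inclusive).
PERIODS = [(1975, 1983), (1984, 1988), (1989, 1994)]


def period_index(year):
    """Index of the period containing `year`, or None if outside all periods."""
    for k, (start, end) in enumerate(PERIODS):
        if start <= year <= end:
            return k
    return None


def period_averages(records):
    """Average irregular annual data into one value per country and period.

    records: iterable of (country, year, value); a value of None marks a
    missing observation and is skipped.
    Returns {(country, period_index): mean of the available years}.
    """
    sums = {}
    for country, year, value in records:
        k = period_index(year)
        if k is None or value is None:
            continue
        total, count = sums.get((country, k), (0.0, 0))
        sums[(country, k)] = (total + value, count + 1)
    return {key: total / count for key, (total, count) in sums.items()}


# Hypothetical annual observations with gaps.
records = [
    ("A", 1976, 2.0), ("A", 1980, 4.0), ("A", 1985, 5.0),
    ("A", 1986, None), ("B", 1990, 1.0), ("B", 1974, 9.0),
]
print(period_averages(records))
```

Averaging only the years that are available lets countries with irregular coverage contribute one observation per period instead of being dropped.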
Table 6 presents the results of the estimation for, respectively, thefts, burglaries and
contact crimes for the pooled data. For the purpose of comparison, we also present the
coefficients for ln(gnp) and ratio estimated before, from the official data cross-sections.
Table 7 presents the within and between effects of the independent variables.
5.3. Analysis of the results
Table 6 shows that inequality is the variable most consistently related to crime rates
in the pooled regressions. The positive effect of inequality is significant for the theft
and contact crime regressions, and borderline significant for one of the burglary
regressions. Education and growth, on the other hand, seem to have a negative effect
on crime. Education is negatively related to both thefts and contact crimes, while
economic growth is negatively related to thefts only. Finally, per capita income,
urbanization, police presence and religion do not show up as being significantly related
to crime.
Again, inequality is the variable most closely linked to crime rates. And if we compare
the effects of income and inequality in Table 6 with the ones obtained from the official
data cross-sections, we see that the correction procedure and the use of the panel reduce
the estimated effect of income and increase the one of inequality by considerable
magnitudes. The evidence presented in Section 4.3 indicates that this is precisely the
kind of adjustment that we should expect.
It is worth mentioning that, since our education variable is enrollment rate, it is possible
that the results related to education actually reflect the effect of taking children and
teenagers out of the streets on smaller types of thefts (pick-pocketing, for example), street
fights and so on, instead of being a direct effect of education by itself.
The results on growth in terms of the two different types of crimes are also interesting,
for they go in the direction that should be expected. Since theft is a more ‘economic’
crime, increases in the activity level should open better vacancies in the legal sector, and
move the marginal criminals from the illegal into the legal sector. Contact crimes are less
‘economic’ and, therefore, this effect should not be so strong.

Table 6
Panel regressions for corrected data

                    Thefts                Burglaries            Contact crimes
                    1          2          1          2          1          2
ln(gnp)             0.2422     0.0240     -0.1040    -0.0860    0.0970     -0.0328
                    (0.1605)   (0.1188)   (0.3086)   (0.1650)   (0.1561)   (0.1218)
Ratio               0.0693     0.0491     0.0419     0.0754     0.1325     0.1313
                    (0.0264)   (0.0244)   (0.0777)   (0.0494)   (0.0255)   (0.0248)
Urb                 -0.0070    0.0095     -0.0187    0.0006     -0.0075    0.0039
                    (0.0107)   (0.0084)   (0.0207)   (0.0137)   (0.0103)   (0.0085)
Educ                -0.0321               -0.0017               -0.0321
                    (0.0139)              (0.0370)              (0.0137)
Growth              -0.0550               -0.0522               0.0130
                    (0.0308)              (0.0475)              (0.0297)
ln(pol)             0.1652                0.0489                0.1527
                    (0.1157)              (0.2239)              (0.1084)
Chr                 -0.3651               0.7565                -0.4150
                    (0.3982)              (0.6645)              (0.3835)
_cons               4.7633     1.8361     2.9387     1.7398     4.3848     0.9368
                    (1.5241)   (0.8226)   (2.7134)   (1.0298)   (1.4945)   (0.8579)
No. of obs.         70         98         41         56         69         99
No. of countries    37         42         28         34         37         42
R²                  0.21       0.08       0.14       0.06       0.37       0.24

Uncorrected cross-section coefficients
ln(gnp)             1.06       0.80       0.78       0.92       0.79       0.60
Ratio               0.01       0.04       -0.08      0.04       0.12       0.16

Obs.: Numbers below the coefficients are standard errors. Data refer to averages for the periods 1975–1983,
1984–1988 and 1989–1994 (or last year available). Dependent variable is the log of the number of thefts,
burglaries or contact crimes as percentage of the total population, adjusted for reporting error. Independent
variables are ln of the GNP per capita; ratio between income or consumption per capita of the 20% richest and of
the 20% poorest; percentage of population living in urban areas; ln of the number of policemen as a percentage of
the population; gross primary enrollment rate; average growth rate of the GNP per capita in the period; and a
dummy indicating whether at least 60% of the population is Christian. The variables ‘ratio’ and ‘chr’ are constant
through time. Uncorrected cross-section coefficients are taken from Table 4.
The panel estimation also allows us to analyze whether these results come from
changes in these variables within a country over time or across countries. With this
purpose, we present in Table 7 the within and between estimates, each one for two
different specifications. As our inequality variable is constant through time for each
country, we know that its effects must come from the between variation, and this is what
Table 7 reports. In relation to growth, the opposite is true. The negative effect of growth on
crime rates comes mainly from within country changes. Increases in growth in a given
country tend to reduce both thefts and contact crimes, but systematic differences in growth
rates across countries do not seem to be associated with systematic differences in crime
rates. The same is true for education: its effect in reducing crime rates is associated
exclusively with within country variations.

Table 7
Between and within estimates for corrected data

                    Thefts                Burglaries            Contact crimes
                    1          2          1          2          1          2
Between
ln(gnp)             0.2430     -0.1194    -0.0651    -0.2487    0.2423     -0.1104
                    (0.2289)   (0.1679)   (0.3773)   (0.2163)   (0.2104)   (0.1620)
Ratio               0.0843     0.0433     0.0681     0.0681     0.1891     0.1366
                    (0.0399)   (0.0289)   (0.0733)   (0.0489)   (0.0370)   (0.0284)
Urb                 -0.0208    0.0021     -0.0284    -0.0101    -0.0192    -0.0009
                    (0.0149)   (0.0116)   (0.0226)   (0.0156)   (0.0138)   (0.0114)
Educ                0.0074                0.0180                -0.0208
                    (0.0241)              (0.0466)              (0.0224)
Growth              0.0132                -0.0498               0.0591
                    (0.0622)              (0.1069)              (0.0577)
ln(pol)             0.4514                0.1425                0.2909
                    (0.2460)              (0.5598)              (0.2283)
Chr                 -0.1858               0.4827                -0.2488
                    (0.4643)              (0.6765)              (0.4309)
Within
ln(gnp)             -0.1667    -0.8834    0.5131     -0.2018    -0.4487    -0.8460
                    (0.4849)   (0.4581)   (2.9227)   (1.6899)   (0.5145)   (0.5778)
Urb                 -0.0337    0.0719     0.2373     0.2544     0.0191     0.0717
                    (0.0325)   (0.0294)   (0.3130)   (0.1802)   (0.0331)   (0.0359)
Educ                -0.0521    -0.0363    -0.0340    -0.0131    -0.0249    -0.0195
                    (0.0153)   (0.0129)   (0.0538)   (0.0379)   (0.0170)   (0.0169)
Growth              -0.1056    -0.0658    -0.0030    -0.0046    -0.0412    -0.0233
                    (0.0228)   (0.0196)   (0.0997)   (0.0617)   (0.0223)   (0.0245)
ln(pol)             0.2709                -0.2209               0.2220
                    (0.0811)              (0.3373)              (0.0683)

Obs.: Numbers below the coefficients are standard errors. Data refer to averages for the periods 1975–1983,
1984–1988 and 1989–1994 (or last year available). Dependent variable is the log of the number of thefts,
burglaries or contact crimes as percentage of the total population, adjusted for reporting error. Independent
variables are ln of the GNP per capita; ratio between income or consumption per capita of the 20% richest and of
the 20% poorest; percentage of population living in urban areas; ln of the number of policemen as a percentage of
the population; gross primary enrollment rate; average growth rate of the GNP per capita in the period; and a
dummy indicating whether at least 60% of the population is Christian. The variables ‘ratio’ and ‘chr’ are constant
through time.
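The distinction between the two sources of variation can be made concrete with a single-regressor sketch in Python. The between slope uses only country means, and the within slope uses only deviations from country means. The function names and the data are hypothetical, and the sketch leaves aside the weighting and the multiple regressors used in the actual estimates.

```python
from collections import defaultdict


def ols_slope(x, y):
    """Slope of a simple OLS regression of y on x (with an intercept)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    return s_xy / s_xx


def between_within_slopes(rows):
    """rows: iterable of (country, x, y) observations.

    Returns (between_slope, within_slope) for a single regressor.
    """
    by_country = defaultdict(list)
    for country, x, y in rows:
        by_country[country].append((x, y))

    mean_x, mean_y, dev_x, dev_y = [], [], [], []
    for obs in by_country.values():
        mx = sum(x for x, _ in obs) / len(obs)
        my = sum(y for _, y in obs) / len(obs)
        mean_x.append(mx)                      # between: country means
        mean_y.append(my)
        dev_x.extend(x - mx for x, _ in obs)   # within: deviations from means
        dev_y.extend(y - my for _, y in obs)

    return ols_slope(mean_x, mean_y), ols_slope(dev_x, dev_y)


# Hypothetical data: y falls with x inside each country, but the country
# with the higher average x also has the higher average y.
rows = [("A", 1.0, 10.0), ("A", 3.0, 8.0), ("B", 5.0, 21.0), ("B", 7.0, 19.0)]
between, within = between_within_slopes(rows)
print(between, within)  # prints 2.75 -1.0
```

A variable that is constant through time for each country, such as the inequality ratio here, has zero within deviations, so its effect can only be identified from the between variation.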
This decomposition of between and within effects stresses the importance of analyzing
the time dimension of the crime phenomenon, since education and growth did not appear
to be significant in the cross-section regressions. The correction procedure suggested
earlier retains its relevance, and allows these relations to be uncovered, taking into
account the presence of the reporting error. This would be otherwise impossible, either
with the cross-section of victimization data or with the panel of official records.
Results similar to the ones reported here were obtained by Fajnzylber et al. (1998,
2000), for panel regressions with homicide data. This coincidence of results supports the
common view that homicide data is likely to be less contaminated by the reporting
problem. In this direction, it is also interesting to note that the consensus that is being
formed around the effect of inequality on crime across countries does not have a
counterpart in the within country studies (see, for example, Allen, 1996; Kelly,
2000). The incompatibility between these different studies remains an open puzzle.
Finally, the quantitative implications of the estimated model are also interesting. The
numbers in Table 6 imply that reducing inequality from the level of a country like
Colombia to levels comparable to Argentina, Australia, or the United Kingdom would reduce
thefts by 50%, and contact crimes by 85%. If a country increased its primary enrollment
rate by 10 percentage points, more or less equivalent to moving from Bolivian to American
or Bulgarian standards, both thefts and contact crimes would be reduced by approximately 30%.
In addition, one more percentage point of growth would mean a 6% reduction in theft rates.
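The arithmetic behind the education and growth figures can be reproduced directly from the pooled coefficients. The short Python sketch below is an illustration under stated assumptions: the coefficients are read from Table 6 and taken to belong to its first specification, the function name is hypothetical, and the "exact" variant (exponentiating the log change) is an alternative reading added here for comparison, not a calculation reported in the paper.

```python
import math


def relative_change(beta, delta_x, exact=False):
    """Relative change in the crime rate implied by a semi-log coefficient.

    exact=False: linear reading, beta * delta_x.
    exact=True:  exp(beta * delta_x) - 1, which differs for large changes.
    """
    z = beta * delta_x
    return math.exp(z) - 1.0 if exact else z


# Coefficients read from Table 6 (assumed to be specification 1).
EDUC_THEFTS = -0.0321
EDUC_CONTACT = -0.0321
GROWTH_THEFTS = -0.0550

cases = [
    ("enrollment +10 points, thefts", EDUC_THEFTS, 10.0),
    ("enrollment +10 points, contact crimes", EDUC_CONTACT, 10.0),
    ("growth +1 point, thefts", GROWTH_THEFTS, 1.0),
]
for label, beta, delta in cases:
    linear = relative_change(beta, delta)
    exact = relative_change(beta, delta, exact=True)
    print(f"{label}: linear {linear:.1%}, exact {exact:.1%}")
```

Under the linear reading, the enrollment change corresponds to a reduction of about 32% and the growth change to about 5.5%, in line with the approximate figures quoted in the text.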
The meaningful use of the official panel, once the reporting problem is taken into account,
seems to be possible and relevant. It sheds light on the different responses of the different
types of crimes, and on the effects of education and growth on crime rates. It also confirms
the importance of inequality in explaining the differences in crime rates across countries.
6. Concluding remarks
This paper contributes to the understanding of the heterogeneity in crime rates across
countries. It focuses on two aspects as possible causes for these differences: reporting rates
and economic development.
The explicit analysis of the behavior of the reporting rate is completely novel and
unprecedented in the economic literature on international comparisons of crime rates. The
results from Section 4 show that reporting rates tend to be strongly correlated with
development (income per capita), so that richer countries report a higher fraction of
committed crimes. The evidence from that section also shows that the results from
previous studies, which systematically found positive effects of development on crime, are
not accurate, precisely because of the correlation between reporting rates and develop-
ment. The idea that development is criminogenic is false, and is driven basically by the
correlation between development and reporting rates.
Despite this problem, data based on official records have an important feature that is
missing from the victimization surveys: they have enough observations to allow the
exploration of the time dimension of the crime phenomenon. With this in mind, we argue
that the use of the UNCS data is still potentially useful, and, in Section 5, we propose and
apply an econometric approach, based on information obtained from the victim survey,
that accounts for the presence of reporting error in the official records. The results from the
panel data analysis on the corrected data show that inequality tends to have a positive
effect on thefts and contact crimes, and it is the single factor most closely and consistently
related to crime. Development (income per capita), by itself, does not have any significant
effect on crime, although increases in the economy’s growth rate reduce the number of
thefts. Education is also a factor that has negative effects on numbers of thefts and contact
crimes.
This paper makes two main contributions to the crime and development literature. First, it
explicitly studies the cross-country properties of the reporting error contained in police
crime data, and the results in this direction are, at least, surprising. As mentioned before,
the magnitude of this problem was previously unknown. Second, the paper suggests a
way of using both official records and victimization data to extract as much information
as possible from available crime statistics. This may turn out to be a generally useful
methodology. Even for within country analysis, cases where a long panel of official
records and a cross-section or short panel of victimization data are simultaneously
available are not rare. Although the correlation between the economic variables and the
reporting rate is likely to be less severe for different regions of the same country, a
systematic investigation of the question must also be carried out in this case before the
results based on official records can be taken at face value. In our particular case, the
correction procedure turned out to be useful because it uncovered the relation between
growth, education, and crime, which could not be seen in the cross-section results. Also, it
confirmed the importance of inequality in determining the differences in crime rates
across countries.
Finally, two main results of our analysis—that were not settled in the previous
literature—seem to be beyond any dispute: the careless use of official data in international
studies may lead to grossly incorrect conclusions, and income inequality is an important
variable in explaining the differences in crime rates across countries.
Acknowledgements
I would like to thank Steven Levitt and Pedro Carneiro for valuable suggestions and
discussions. I also benefited from important comments from Luís Henrique Braido, Casey
Mulligan, Juan Santalo, two anonymous referees and seminar participants at the University
of Chicago and the V Annual Meeting of the Latin American and Caribbean Economic
Association (LACEA Rio 2000). I thank Anna del Frate and the United Nations
Interregional Crime and Justice Research Institute for the access to the International Crime
Victim Survey. Financial support from the Conselho Nacional de Pesquisa e
Desenvolvimento Tecnológico (CNPq)-Brazil is gratefully acknowledged. The usual disclaimer
applies.
Appendix A. Estimation and standard error correction
The problem proposed in Section 5 is one of measurement error, with the error being
correlated with the independent variables. In this case, the situation can be described as
follows:
Real crime data (unobservable): $Y^* = X\theta + \varepsilon$, with $\mathrm{Cov}(X,\varepsilon) = 0$;
Official crime data (observable): $Y = Y^* + \nu$, with $\mathrm{Cov}(X,\nu) \neq 0$.
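As usual, the OLS estimator of $\theta$ based on the official data will be biased. To see this,
regress $Y$ on $X$ to obtain:
\[
\begin{aligned}
\hat{\theta} &= (X'X)^{-1}X'Y = (X'X)^{-1}X'Y^* + (X'X)^{-1}X'\nu \\
&= \theta + (X'X)^{-1}X'\varepsilon + (X'X)^{-1}X'\nu,
\end{aligned}
\]
with $E(\hat{\theta} \mid X) = \theta + (X'X)^{-1}X'E(\nu \mid X) \neq \theta$.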
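The bias can be illustrated with a small simulation. The sketch below is not taken from the paper: it assumes a single regressor with no intercept, standard normal draws, and hypothetical parameter values (`theta = 1.0` for the coefficient in the real-crime equation and `gamma = 0.5` for the part of the reporting error that is linear in the regressor). Under these assumptions, OLS on the official data should center near `theta + gamma` rather than `theta`.

```python
import random


def ols_slope(x, y):
    """OLS slope for one regressor and no intercept: (x'x)^{-1} x'y."""
    sxx = sum(xi * xi for xi in x)
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    return sxy / sxx


def simulate_naive_ols(theta=1.0, gamma=0.5, n=20000, seed=0):
    """Return OLS slopes on the (unobservable) real data and on official data.

    All parameter values are hypothetical and chosen only for illustration.
    """
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in range(n)]
    # Real crime: Y* = X*theta + eps, with eps independent of X.
    y_star = [theta * xi + rng.gauss(0.0, 1.0) for xi in x]
    # Reporting error correlated with X: nu = X*gamma + u.
    nu = [gamma * xi + rng.gauss(0.0, 1.0) for xi in x]
    # Official crime: Y = Y* + nu.
    y = [ys + v for ys, v in zip(y_star, nu)]
    return ols_slope(x, y_star), ols_slope(x, y)


slope_true_data, slope_official = simulate_naive_ols()
print(f"OLS on real data:     {slope_true_data:.3f}")
print(f"OLS on official data: {slope_official:.3f}")
```

In this setup the gap between the two slopes is an estimate of `gamma`, which is the term $(X'X)^{-1}X'E(\nu \mid X)$ in the expression above.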
Consider explicitly the structure of the official data. We have a panel composed of $T$
periods, such that we can naturally divide $X$ and $u$ accordingly:
\[
X = \begin{bmatrix} X_0 \\ \vdots \\ X_t \\ \vdots \\ X_T \end{bmatrix}
\quad \text{and} \quad
u = \begin{bmatrix} u_0 \\ \vdots \\ u_t \\ \vdots \\ u_T \end{bmatrix}.
\]
The availability of the cross-section of victimization data, together with the
assumptions made in the text, corresponds to assuming that we actually observe one cross-section
of the error term, $\nu_t$.
We assume that $\nu$ and $X$ are jointly normally distributed, such that one can write
$\nu = X\gamma + u$, with $\mathrm{Cov}(X,u) = 0$, and the standard regression model applies to the relation between
$\nu$ and $X$: $\hat{\gamma} = (X'X)^{-1}X'\nu = \gamma + (X'X)^{-1}X'u$, with $E(\hat{\gamma} \mid X) = \gamma$. The error terms $\varepsilon$ and $u$ are
assumed to be uncorrelated white noises. Finally, suppose that the joint distribution of $X$
and $u$ is time invariant, such that $\mathrm{Cov}(X_t,u_t) = 0$ for every $t$.
With $\nu_t$, we can estimate $\gamma$ via $\hat{\gamma}_t = (X_t'X_t)^{-1}X_t'\nu_t = \gamma + (X_t'X_t)^{-1}X_t'u_t$. Note that $\hat{\gamma}_t$ is an
unbiased estimator of $\gamma$: $E(\hat{\gamma}_t \mid X_t) = \gamma + (X_t'X_t)^{-1}X_t'E(u_t \mid X_t) = \gamma$.
Substituting these equations into a single one gives $Y = Y^* + \nu = Y^* + X\gamma + u = X\theta + X\gamma + u + \varepsilon$. Note that, if we knew $\gamma$, the classical regression model would apply to
$Y - X\gamma = X\theta + u + \varepsilon$. As we do not know $\gamma$, we can try to use its estimated value.
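Define $\hat{\hat{\theta}}$ as the coefficient of the regression of $\tilde{Y} = Y - X\hat{\gamma}_t$ on $X$. We have that
\[
\begin{aligned}
\hat{\hat{\theta}} &= (X'X)^{-1}X'(Y - X\hat{\gamma}_t) = (X'X)^{-1}X'Y - \hat{\gamma}_t \\
&= \theta + (X'X)^{-1}X'\varepsilon + (X'X)^{-1}X'\nu - \hat{\gamma}_t \\
&= \theta + (X'X)^{-1}X'(u + \varepsilon) - (X_t'X_t)^{-1}X_t'u_t,
\end{aligned}
\]
with
\[
E(\hat{\hat{\theta}} \mid X) = \theta + (X'X)^{-1}X'E(u + \varepsilon \mid X) - (X_t'X_t)^{-1}X_t'E(u_t \mid X) = \theta.
\]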
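The two-step procedure just described can be sketched as follows. This is a minimal illustration under assumptions that are not in the paper: one regressor with no intercept, a balanced panel, standard normal errors, and hypothetical values for `theta`, `gamma`, the number of periods, and the period `t_obs` in which the reporting error is observed. The function names are also illustrative.

```python
import random


def ols_slope(x, y):
    """OLS slope for one regressor and no intercept: (x'x)^{-1} x'y."""
    sxx = sum(xi * xi for xi in x)
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    return sxy / sxx


def corrected_slope(x, y, x_t, nu_t):
    """Two-step estimate: remove the estimated reporting error, then run OLS."""
    # Step 1: regress the observed reporting error on X for period t only.
    gamma_hat_t = ols_slope(x_t, nu_t)
    # Step 2: regress Y - X * gamma_hat_t on X using the whole panel.
    y_tilde = [yi - gamma_hat_t * xi for xi, yi in zip(x, y)]
    return ols_slope(x, y_tilde), gamma_hat_t


def simulate_panel(theta=1.0, gamma=0.5, periods=5, n_t=2000, t_obs=2, seed=1):
    """Simulate official data for all periods and the error for one period."""
    rng = random.Random(seed)
    x, y, x_t, nu_t = [], [], [], []
    for t in range(periods):
        for _ in range(n_t):
            xi = rng.gauss(0.0, 1.0)
            nu_i = gamma * xi + rng.gauss(0.0, 1.0)       # reporting error
            yi = theta * xi + rng.gauss(0.0, 1.0) + nu_i  # official data
            x.append(xi)
            y.append(yi)
            if t == t_obs:  # victimization data exist for this period only
                x_t.append(xi)
                nu_t.append(nu_i)
    return x, y, x_t, nu_t


x, y, x_t, nu_t = simulate_panel()
naive = ols_slope(x, y)
corrected, gamma_hat_t = corrected_slope(x, y, x_t, nu_t)
print(f"naive OLS on official data: {naive:.3f}")
print(f"estimated gamma (period t): {gamma_hat_t:.3f}")
print(f"corrected estimate:         {corrected:.3f}")
```

With a single regressor, the corrected estimate equals the naive slope minus the first-step estimate, which mirrors the identity $(X'X)^{-1}X'(Y - X\hat{\gamma}_t) = (X'X)^{-1}X'Y - \hat{\gamma}_t$ used in the derivation.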
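Therefore, $\hat{\hat{\theta}}$ is an unbiased estimator of $\theta$. But standard errors usually estimated by OLS
procedures will generally be biased, because of the presence of the estimated $\hat{\gamma}_t$. Define $\sigma_\varepsilon^2$
and $\sigma_u^2$ as, respectively, the variances of $\varepsilon$ and $u$, and $n$ as the total number of observations.
The covariance matrix of $\hat{\hat{\theta}}$ usually estimated by an OLS procedure would be
\[
\hat{V}(\hat{\hat{\theta}}) = \frac{\hat{e}'\hat{e}}{n-k}(X'X)^{-1},
\]
where $k$ is the number of regressors and
\[
\begin{aligned}
\hat{e} &= Y - X\hat{\gamma}_t - X\hat{\hat{\theta}} \\
&= X\theta + (u + \varepsilon) - X(X_t'X_t)^{-1}X_t'u_t - X\theta - X(X'X)^{-1}X'(u + \varepsilon) + X(X_t'X_t)^{-1}X_t'u_t \\
&= (I - X(X'X)^{-1}X')(u + \varepsilon).
\end{aligned}
\]
With $u$ and $\varepsilon$ uncorrelated, the usual calculations lead to $E[\hat{V}(\hat{\hat{\theta}}) \mid X] = (\sigma_\varepsilon^2 + \sigma_u^2)(X'X)^{-1}$.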
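But from the expression above for $\hat{\hat{\theta}}$,
\[
\begin{aligned}
V(\hat{\hat{\theta}} \mid X) &= (\sigma_\varepsilon^2 + \sigma_u^2)(X'X)^{-1} + \sigma_u^2(X_t'X_t)^{-1} - 2(X_t'X_t)^{-1}X_t'\,\mathrm{Cov}(u_t,u)\,X(X'X)^{-1} \\
&= (\sigma_\varepsilon^2 + \sigma_u^2)(X'X)^{-1} + \sigma_u^2(X_t'X_t)^{-1} - 2\sigma_u^2(X_t'X_t)^{-1}X_t'H_tX(X'X)^{-1},
\end{aligned}
\]
with
\[
H_t = \begin{bmatrix} 0_{n_t \times n_1} & \cdots & 0_{n_t \times n_{t-1}} & I_{n_t \times n_t} & 0_{n_t \times n_{t+1}} & \cdots & 0_{n_t \times n_T} \end{bmatrix},
\]
where $n_t$ denotes the number of observations available for period $t$ (with a balanced panel,
$n_t = n/T$ for every $t$). Note that $H_t$ is the operator that extracts $X_t$ from $X$, so that $H_tX = X_t$.
So we can write
\[
\begin{aligned}
V(\hat{\hat{\theta}} \mid X) &= (\sigma_\varepsilon^2 + \sigma_u^2)(X'X)^{-1} + \sigma_u^2(X_t'X_t)^{-1} - 2\sigma_u^2(X'X)^{-1} \\
&= (\sigma_\varepsilon^2 - \sigma_u^2)(X'X)^{-1} + \sigma_u^2(X_t'X_t)^{-1}.
\end{aligned}
\]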
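As a rough numerical check of this variance expression, the Monte Carlo sketch below holds the regressors fixed, redraws the errors many times, and compares the empirical variance of the corrected estimator with both the expression above and the value targeted by the usual OLS formula. Everything specific here is an assumption made for illustration and does not come from the paper: a single regressor with no intercept, a balanced panel of four periods, a deterministic regressor pattern, normal errors, and the hypothetical values of `sigma_eps` and `sigma_u`.

```python
import random


def simulate_variance(reps=3000, periods=4, n_t=50, t_obs=0, theta=1.0,
                      gamma=0.5, sigma_eps=1.0, sigma_u=0.8, seed=2):
    """Compare the empirical variance of the corrected estimator with formulas."""
    rng = random.Random(seed)
    # Regressors are held fixed across replications, since the variance is
    # conditional on X. The pattern below is arbitrary.
    block = [(i % 10 - 4.5) / 3.0 for i in range(n_t)]
    x = [block for _ in range(periods)]
    sxx_t = sum(v * v for v in x[t_obs])    # X_t'X_t
    sxx = sum(v * v for b in x for v in b)  # X'X

    draws = []
    for _ in range(reps):
        sxy = 0.0      # X'Y over the whole panel
        sx_nu_t = 0.0  # X_t'nu_t for the period with victimization data
        for t, x_block in enumerate(x):
            for xi in x_block:
                nu = gamma * xi + rng.gauss(0.0, sigma_u)
                y = theta * xi + rng.gauss(0.0, sigma_eps) + nu
                sxy += xi * y
                if t == t_obs:
                    sx_nu_t += xi * nu
        # Corrected estimator: (X'X)^{-1}X'Y minus the first-step estimate.
        draws.append(sxy / sxx - sx_nu_t / sxx_t)

    mean = sum(draws) / reps
    empirical = sum((d - mean) ** 2 for d in draws) / (reps - 1)
    corrected = (sigma_eps ** 2 - sigma_u ** 2) / sxx + sigma_u ** 2 / sxx_t
    naive = (sigma_eps ** 2 + sigma_u ** 2) / sxx
    return empirical, corrected, naive


empirical_var, formula_var, naive_var = simulate_variance()
print(f"empirical variance:        {empirical_var:.5f}")
print(f"corrected formula:         {formula_var:.5f}")
print(f"target of the OLS formula: {naive_var:.5f}")
```

In this configuration the usual OLS formula understates the variance, because the first-step estimate relies on a single period; the direction of the bias depends on how $X_t'X_t$ compares with $X'X$.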
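Therefore, the usual $\hat{V}(\hat{\hat{\theta}})$ is a biased estimator of $V(\hat{\hat{\theta}})$. But we can construct an
unbiased estimator of $V(\hat{\hat{\theta}})$ with the information available. Define $\hat{u}_t$ as the vector of
estimated errors from the regression of $\nu_t$ on $X_t$: $\hat{u}_t = \nu_t - X_t\hat{\gamma}_t$. This regression respects all
traditional assumptions, so that $s_u^2 = \hat{u}_t'\hat{u}_t/(n_t - k)$ is an unbiased estimator of $\sigma_u^2$:
$E(\hat{u}_t'\hat{u}_t/(n_t - k) \mid X) = \sigma_u^2$. Define $s_\varepsilon^2 = \hat{e}'\hat{e}/(n - k) - s_u^2$. Note that
$E(s_\varepsilon^2 \mid X) = E(\hat{e}'\hat{e}/(n - k) \mid X) - E(s_u^2 \mid X) = \sigma_\varepsilon^2 + \sigma_u^2 - \sigma_u^2 = \sigma_\varepsilon^2$.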