Testing the Mincer Model Hypotheses for Brazil∗
Rodrigo Leandro de Moura
EPGE/FGV-RJ†
Abstract
Many estimates of rates of return to education have been produced based on Mincer’s
model. But some of the hypotheses (linearity and separability) under which the ("mincer") schooling
coefficient can be interpreted as a rate of return are tested and rejected. When relaxing these
hypotheses, we estimate the internal rates of return (Becker, 1975) and find biases of up to 14
percentage points relative to the “mincer coefficient”. Thus, the magnitude of these returns
is much lower than in papers based on Mincer’s model. In the estimates we incorporate the
sample design of PNAD and correct for sample selection bias.
Key-words: returns to schooling, sample selection, sample design, local non-parametric linear
regression.
JEL Code: I20, J24, C14, C42
1 Introduction
The decision to accumulate human capital (education, on-the-job training, health, etc.) depends
upon the correct assessment of the returns. One way of measuring the returns is through the use
of the internal rate of return (IRR), a central concept in the theory of human capital, developed in
the analysis of the individual’s decision to invest in human capital. This measure was developed in
Becker (1975, hereinafter referred to as Becker) and Schultz (1963). In their analysis, the individual
invests in human capital through a comparison of the flows of benefits and costs, from which he takes
∗Article accepted for publication in the Brazilian Economic Review (Revista Brasileira de Economia), 62(1):1-47,
2008. I wish to thank Carlos Eugênio da Costa, of the EPGE/FGV, for various comments and guidance in the
preparation of this article; Petra Todd, of the University of Pennsylvania, for making available the R codes that were
important for some of the tests in this article; Luis Henrique Braido and Luis Renato Lima, of the EPGE/FGV,
and all of the other participants in the Research Seminars at the EPGE for their various suggestions and criticisms,
Breno Néri, a doctoral candidate at New York University for his help with a routine; Elaine Toldo Pazello, of the
FEA-RP/USP, for her suggestions; Maurício Lila, Djalma Pessoa and Pedro Nascimento Silva, of the IBGE, for their
help regarding PNAD and the Census and all of the participants of the XXVIII Meeting of the Brazilian Econometric
Society (SBE), where a preliminary version of this paper was presented. Finally, I wish to express my appreciation
for the comments of an anonymous referee. Any remaining errors are the sole responsibility of the author.
†Ph.D. in economics from the EPGE/FGV-RJ. Lecturer of economics at FGV-RJ. E-mail: rodrigolean-
a discount rate that makes them equal. Becker shows that a risk-neutral agent who is maximizing
his wealth will tend to concentrate his investments at an early age, because: (i) with the passage of
time the individual has a shorter period of time to recover the return on his investment in human
capital and (ii) the opportunity costs increase with the increase in the level of human capital. In this
paper we will focus on only one component of human capital: education. Individuals with higher
levels of education tend to have higher incomes. This is logical, since a greater accumulation
of education tends to improve skills, knowledge and health, all factors that increase
worker productivity, and in a perfectly competitive market earnings tend to equal productivity1.
Therefore the accurate calculation of the IRR depends on the estimates of individuals’ earnings
profiles over their life cycle.
Nevertheless, in the empirical literature, various estimates of rates of return have been
reported, based on the seminal models of Mincer (1958 and 1974, hereinafter Mincer I and Mincer
II, respectively), which derive the following salary equation:
\[ \ln Y(s, x) = \alpha + \beta s + \gamma x + \delta x^2, \tag{1} \]
where Y(s, x) represents income adjusted for hours of work, s represents years of study and x
represents experience. The coefficient β is known as the mincerian coefficient (or return) on education.
Heckman (2005) points out that in the United States, there are several apparent empirical puzzles,
such as: high mincerian returns to education vis-à-vis other investments; and given this, a slow
response in enrollment by recent cohorts of individuals. But according to Heckman, Lochner and
Todd2 (2006, hereinafter HLT), few of these estimates represent true rates of return. Many of the
assumptions of the Mincer model that turn the mincerian coefficient into an internal rate of return
(IRR) are valid only under very restrictive conditions. Thus, the study of returns to education
must invariably start with Mincer’s original models where these assumptions must be tested.
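To make concrete how the mincerian coefficient in equation (1) is usually obtained, the following is a minimal sketch in Python that simulates data from equation (1) and estimates it by ordinary least squares. The parameter values, the sample size and the helper functions `solve_linear_system` and `ols` are assumptions made purely for illustration; they are not taken from the data or routines used in this paper.

```python
import random


def solve_linear_system(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix [a | b], copied so the inputs are not modified.
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def ols(rows, y):
    """Ordinary least squares through the normal equations (X'X) b = X'y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)


# Hypothetical "true" parameters, used only to simulate data.
ALPHA, BETA, GAMMA, DELTA = 1.0, 0.10, 0.04, -0.0006

rng = random.Random(0)
rows, log_y = [], []
for _ in range(2000):
    s = rng.randint(0, 15)   # years of study
    x = rng.randint(0, 40)   # years of experience
    noise = rng.gauss(0.0, 0.3)
    rows.append([1.0, float(s), float(x), float(x * x)])
    log_y.append(ALPHA + BETA * s + GAMMA * x + DELTA * x * x + noise)

alpha_hat, beta_hat, gamma_hat, delta_hat = ols(rows, log_y)
print(f"estimated mincerian coefficient: {beta_hat:.4f}")
```

In this simulated setting the estimate of β is a rate of return only because the data were generated under linearity and parallelism by construction.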
In Brazil various studies consider the mincerian coefficient as a rate of return, but none of
these studies performed any tests on the assumptions. Meanwhile, various studies have already
performed tests of linearity (Hungerford and Solon, 1987; Jaeger and Page, 1996; and Heckman et
al., 1996b), and more recently parallelism (HLT) in the United States, rejecting these assumptions,
which are crucial for the interpretation of the coefficient as a return to education. The IRRs
can therefore be seen as an opportunity cost for investing in education in comparison with other
alternatives. And, contrary to the “mincerian” returns, the IRR takes into account costs (direct
and indirect). Therefore, only when the assumptions of the Mincer model are satisfied, and under
some additional restrictions, can we say that the IRR is equal to the mincerian coefficient. Some
of these assumptions are: that the working lifetime is the same for all individuals independent
of their educational level; during the period of education individuals do not work; the only costs
incurred are the opportunity costs, in other words, the income from the labor market foregone
1 Obviously there are other (non-monetary) benefits derived from the learning process, but according to Becker,
the results point to the secondary importance of these other benefits.
2 An earlier version of this paper was circulated under the title: Fifty Years of Mincer Earnings Regressions.
during the period of schooling; that uncertainty does not exist; agents are risk neutral; there are no
imperfections in the credit market; linearity in schooling and separability between schooling and
experience (parallelism). Therefore, based on similar data and the same structure of the Mincer
model, we tested for the last two assumptions and rejected both hypotheses. In this case, then,
the mincerian return might be better understood as a rate of growth in market wages due to the
increase at the margin in years of study, or even as the marginal cost of education.
Then we estimated the IRRs, using the mincerian coefficients as references, and we showed that
the bias is relatively high when we relax the assumptions of linearity and parallelism. We obtained
biases that reached a little more than 14 percentage points, when comparing the mincerian
return (17.29%) to the IRR (3.03%) for Masters and Ph.D. degrees relative to an
undergraduate degree. Various studies in Brazil do not take into account these assumptions and,
consequently, their estimates are inaccurate, and the discrepancy is relatively large,
which could lead to distorted or poorly understood conclusions. In addition, we relaxed others of
the above-mentioned assumptions as well.
We used two econometric techniques: parametric and nonparametric regressions. The test for
linearity used the first and the test for parallelism used the second. The calculation of the IRRs
involves the use of both instruments. In all of the estimates, we used PNAD (Brazilian Household
Sample Survey) and Census data. With regard to the PNAD data,
we could not assume an independent and identically distributed random sample because the survey
sample design is complex. Therefore we incorporated the sample design of the PNAD3, which could
be considered a positive addition to the empirical literature for Brazil4. In addition, we corrected
for the sample selection bias that arises because some individuals choose not to work
when the market wage is lower than their reservation wage. This modification could also be considered
a positive addition, given that even recent studies, such as HLT, do not incorporate this feature,
which changes the magnitude of the IRRs. Thus, we compare estimates made without correction,
estimates incorporating the sample design, and estimates from the two-stage procedure developed
by Heckman (1979)5.
This paper follows the following structure: section 2 presents a selective review of the literature;
section 3 shows, through a simple model, the relationship between the IRR and the mincerian
coefficient; section 4 presents the methodology and results for the linearity and parallelism tests;
section 5 describes the methodology, results and discussion of the IRRs and section 6 offers some
conclusions.
3 This point will be discussed in greater detail in section 4.2.
4 With regard to the census, the incorporation of the sample design was not possible. The variables that permit
the incorporation of the sample design for the census are considered “classified data” by the IBGE and for this reason they
are not released.
5 The correction of the sample selection bias was done only for the parametric models. For the non-parametric
models it was not possible, because of the great complexity and size of the procedure. This adjustment was proposed
by Das, Newey and Vella (2003) and could be incorporated in future research.
2 Review of the Literature
The selective review of the literature presents international and Brazilian evidence from
articles in which the Mincer model and the concept of the IRR are applied.
International Evidence - Mincer In a recent review, Card (1999) points out that studies
that relate education and earnings are almost always heavily based on the Mincer models. But
the functional form of Mincer’s model has raised various issues. Card shows that
one way of estimating earnings is through the use of nonparametric techniques, such as a
general function of years of education and age. Within the parametric approach,
Murphy and Welch (1990) show that a linear term in years of education and a third or fourth
order polynomial in experience lead to a significant improvement in fit. An important
point highlighted by these two studies is that the standard parametric model has problems in fitting the
precise form of the earnings-age (experience) profiles for US data, because it tends to bias the
estimated rate of growth in earnings of workers with a given level of education relative to the value
observed in the sample. This occurs because the model is misspecified. Another problem that
emerges from these models is a loss of parsimony when cubic and quartic terms
are added to the specification, leading to increased multicollinearity in the estimates.
These problems can be overcome using nonparametric estimation techniques.
Meanwhile, Card emphasizes the high degree of explanatory power of the mincerian model.
According to Park (1994), in the United States, the linear term for education is well suited to the
data. But there is evidence contrary to the linear model pointed out by Hungerford and Solon
(1987), Belman and Heywood (1991), Jaeger and Page (1996) and Heckman (1996), that make
estimates based on the mincerian model with such additional non-linearity components as, for
example, dummy variables for the years when the course of study was concluded to capture the
“sheepskin effect”6. An F test for the nonlinear terms strongly rejects the linear model.
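The logic of such an F test can be sketched as follows. This is a simplified illustration in Python, assuming a model with schooling only (no experience terms) and simulated data with a hypothetical jump in log wages at 11 years of study; it is not the specification or the data of the studies cited above. The restricted model is linear in schooling and the unrestricted model has one free mean per year of schooling (a full set of dummies).

```python
import random


def linearity_f_stat(schooling, log_wage):
    """F statistic for a linear schooling term against a full set of schooling dummies."""
    n = len(schooling)
    # Restricted model: log wage = a + b * schooling (closed-form simple regression).
    mean_s = sum(schooling) / n
    mean_w = sum(log_wage) / n
    sxx = sum((s - mean_s) ** 2 for s in schooling)
    sxy = sum((s - mean_s) * (w - mean_w) for s, w in zip(schooling, log_wage))
    b = sxy / sxx
    a = mean_w - b * mean_s
    ssr_restricted = sum((w - a - b * s) ** 2 for s, w in zip(schooling, log_wage))
    # Unrestricted model: one mean per schooling level (saturated dummies).
    groups = {}
    for s, w in zip(schooling, log_wage):
        groups.setdefault(s, []).append(w)
    ssr_unrestricted = 0.0
    for values in groups.values():
        group_mean = sum(values) / len(values)
        ssr_unrestricted += sum((w - group_mean) ** 2 for w in values)
    k = len(groups)   # parameters in the unrestricted model
    q = k - 2         # restrictions imposed by linearity
    f_stat = ((ssr_restricted - ssr_unrestricted) / q) / (ssr_unrestricted / (n - k))
    return f_stat, q, n - k


# Simulated data: a linear return of 0.08 per year plus a hypothetical
# "sheepskin" jump of 0.3 log points from 11 years of study onwards.
rng = random.Random(1)
schooling, log_wage = [], []
for s in range(16):
    for _ in range(100):
        jump = 0.3 if s >= 11 else 0.0
        schooling.append(s)
        log_wage.append(1.0 + 0.08 * s + jump + rng.gauss(0.0, 0.3))

f_stat, df_num, df_den = linearity_f_stat(schooling, log_wage)
print(f"F({df_num}, {df_den}) = {f_stat:.2f}")
```

A large F statistic relative to the critical value of the F distribution with the reported degrees of freedom indicates rejection of linearity in this simulated example.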
Psacharopoulos (2004, 1994, 1985) reviews estimates of the rate of return based on the mincerian
model for various countries, obtaining returns to education in the Latin America/Caribbean
and sub-Saharan Africa regions (low and medium income countries) that are higher than the average
return worldwide. In addition, during the last 12 years the average mincerian returns worldwide
have declined by 0.6%, while the average level of education has increased7.
Brazilian experience - Mincer With regard to Brazil, various studies have estimated and
considered the mincerian coefficient to be the rate of return. Most of these studies relax the
6 The “sheepskin effect” captures the effects of greater returns because of having obtained a degree. This “sheepskin”
can be interpreted as a signal of productivity for which the market contracts the worker.
7 The reference period depends on the country, varying from 1970 in Morocco to 1998 for Singapore and the
Philippines. Some of these estimates were obtained by the author from other articles. Psacharopoulos (2004) points
out that these comparisons are not exact due to differences in methodology and in the sample size.
assumption of linearity, estimating a spline function for years of study, as for example in Leal and
Werlang (1991). Blom et al. (2001) is a recent study that estimates current mincerian returns
for Brazil, using the Mincer II model, but relaxing the linearity assumption by using a spline
function with knots at the years when the educational cycles are concluded. The authors, using
conditional mean and quantile regressions, find that there is a wide dispersion of the returns among
the different quantiles at all levels, with the exception of the tertiary level. They suggest that this
dispersion could be due to uncontrolled factors (quality of the school, social capital and unobserved
abilities) related to the returns, and that a model permitting interaction between
education and other terms such as experience would relax the parallelism assumption.
Sachsida, Loureiro and Mendonça (2004), using the PNAD for 1996 and various accumulated
years (1992 - 1999), calculate the mincerian return using Mincer II, but correcting for some sources
of bias, such as: sample selection, endogeneity of the education variable and the unobserved ability
of the individual. Soares and Gonzaga (1999), using the 1988 PNAD, test for the existence of duality
in the Brazilian labor market, in the sense of the existence of different salary structures associated
with “good” occupations (linked, for example, to greater returns to education, among other factors)
and “poor” occupations. These two studies, as well as that of Loureiro and Carneiro (2001), are some
of the few that relax the linearity and parallelism assumptions, but they impose a functional form on the
equation. The results of these studies point to the significance of the interaction term between
time in the labor force and years of education, therefore rejecting the assumption of parallelism.
Other studies where the principal objective is not to estimate rates of return, use the Mincer
model in their analyses. Fernandes and Menezes (2000) examine the evolution of inequality of
earnings from work, utilizing a mincerian regression (Mincer II) to which some controls are added,
and relax the assumption of linearity for the years of study. The mincerian return on education, in
this study, is an important explanatory factor in the reduction of inequality, principally between
1990 and 19918.
International Evidence - IRR Focused specifically on the IRR, Becker resolves one important
question relating to earnings-costs-rates of return: the difficulty in isolating the effect of the earnings
8 Given the enormous range of studies on returns to education based on the mincerian model, it was not possible
to describe them in detail. But other recent studies that use the Mincer model should be mentioned, such as:
(i) Resende and Wyllie (2006) and Loureiro and Carneiro (2001) that correct for the problem of sample selection
bias by using a linear function for education, where the first controls for quality of education using the PPV between
1996 and 1997, and the second uses PNAD for 1998, and concludes that there are salary differences between rural
and urban laborers and discrimination by race and gender.
(ii) Ueda and Hoffman (2002) estimate the returns to education using least-squares and instrumental variables and
using linear and nonlinear specifications, including socioeconomic variables. They use the PNAD for 1996.
(iii) Silva and Kassouf (2000) examine the degree of segmentation and the labor market, correcting also for the
problem of sample selection, but using a multinomial logit model for the estimation of the selection equation, in order
to differentiate the unemployed and employed workers in the formal and informal sectors. They use the PNAD for
1995.
derived from a change in returns or a change in the amount invested in education. In the context of
a static model, in which investment is confined to a single time period, and the returns to all of the
remaining periods, Becker argues that the cost and the rate of return are easily determined taking
only the income stream. To accomplish this, he compares the flow of earnings from two different
levels of education, one with investment in the first period and the other that does not require a
lump sum investment. The cost of investment in education would be the net income forgone. This
is the context we use in our model for estimating the IRR9.
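The comparison of two earnings flows described above can be illustrated with a short Python sketch that finds, by bisection, the discount rate equalizing the present values of the two streams. All figures (a 40-year horizon, four years of schooling, annual earnings of 100 and 150, tuition of 20 per year) are hypothetical, and the function names are ours; the sketch only shows how adding direct costs to the forgone earnings lowers the IRR.

```python
def npv_difference(rate, stream_a, stream_b):
    """Present value of stream_a minus present value of stream_b at a given rate."""
    return sum((a - b) / (1.0 + rate) ** t
               for t, (a, b) in enumerate(zip(stream_a, stream_b)))


def internal_rate_of_return(stream_a, stream_b, lo=0.0, hi=1.0, tol=1e-12):
    """Rate that equalizes the present values of the two streams (bisection)."""
    f_lo = npv_difference(lo, stream_a, stream_b)
    f_hi = npv_difference(hi, stream_a, stream_b)
    if f_lo * f_hi > 0:
        raise ValueError("no sign change of the NPV difference on [lo, hi]")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        f_mid = npv_difference(mid, stream_a, stream_b)
        if f_lo * f_mid <= 0:
            hi = mid
        else:
            lo, f_lo = mid, f_mid
    return 0.5 * (lo + hi)


# Hypothetical annual income streams over a 40-year horizon.
YEARS_SCHOOL, HORIZON = 4, 40
no_school = [100.0] * HORIZON
# Schooling with opportunity costs only: no income while studying.
with_school = [0.0] * YEARS_SCHOOL + [150.0] * (HORIZON - YEARS_SCHOOL)
# Schooling with opportunity costs plus direct costs (tuition of 20 per year).
with_school_and_tuition = [-20.0] * YEARS_SCHOOL + [150.0] * (HORIZON - YEARS_SCHOOL)

irr_opportunity_only = internal_rate_of_return(with_school, no_school)
irr_with_direct_costs = internal_rate_of_return(with_school_and_tuition, no_school)
print(f"IRR, opportunity costs only: {irr_opportunity_only:.4f}")
print(f"IRR, with direct costs:      {irr_with_direct_costs:.4f}")
```

Bisection is used here only because the net flow changes sign once (costs first, benefits later), so the present value difference crosses zero at a single rate.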
Schultz (1964) had already pointed out that costs should be considered in the analysis of
investment in education. These costs extend beyond monthly tuition expenses, annual fees and
others, since forgone salaries make up a significant part of the cost. From the perspective of the
economy as a whole, the costs incurred by the schools (maintenance of infrastructure, depreciation and
services) are relevant, while in the context of the individual decision, the direct and indirect costs
of the student are more important. Of these latter costs, Schultz emphasizes the cost of the time
the students spend in school, which is estimated using the income foregone while students
are in school. Since there is no perfect counterfactual from which we can extract the income
stream for the case in which the same individual attends, or does not attend, school, we have to take as a
reference point individuals with similar characteristics, but who are in the workforce.
Psacharopoulos (2004) also estimates the IRR, both private and social, for various countries,
and points out that the Latin America/Caribbean and sub-Saharan Africa regions are the ones
with the highest returns for all levels of education. Psacharopoulos (1994) points out that the
IRR method is the most appropriate but also says that this methodology has been replaced by the
Mincer II methodology because of the lack of a database with a sufficiently large number of observations
in a given age-educational level cell for the construction of well-behaved earnings-age profiles
(concave and non-intersecting). But these arguments have been weakened given the wide range of
databases that currently exist.
Brazilian Evidence - IRR With regard to Brazil, Langoni (1974) was one of the first to estimate
the IRR for Brazil, based on Schultz (1964) and Becker. He calculated the direct costs (the cost
of infrastructure of the school and its depreciation, teacher salaries and student expenses) and
indirect costs (income foregone by students because they have left the labor market, and capital of
the school, measured by the interest sacrificed by the teaching institution). Note that the author
included costs of the school, which is of interest for the social rate of return, but not for the private
rate of return, which does not include these components10. To calculate the IRR, it is necessary
to measure income profiles by experience (age). Langoni does this by using sample means and not
9 Future research could include dynamic analysis. The problem here is the lack of a database in Brazil that
follows individuals through their time in school and, at least, part of their working life and would therefore be
able to measure the investment decision by the agent.
10 Langoni (1974) apparently does not incorporate taxes on the side of benefits, which would be more correct in
estimating the social rate of return.
through regression11, using cross-sectional data. The IRRs for 1960 and 1969 vary from 48.1% to
32% for illiterates; from 23.8% to 19.5% for middle school as opposed to primary school; and from
14.8% to 21.3% for high school as opposed to middle school; and from 4.9% to 12.1% for college
compared to high school. Barbosa and Pessoa (2006) update Langoni’s work using his methodology.
We show the equivalence between the IRR and the mincerian coefficient below.
3 The equivalence between the IRR and the "mincerian coefficient"
Mincer I and Mincer II arrive at the same model, but for different reasons. Mincer I uses the
principle of compensating differentials to explain why individuals with different levels of education
receive different income streams over their life cycles. Mincer II assumes that these individuals
can invest in human capital after education (through, for example, on-the-job training) to acquire
or improve their skills, expand their information set regarding their occupation and increase their
potential income. Mincer II obtains the same exact equation (1), while Mincer I has the same
functional form, but without the term for experience.
Therefore, in this section we show that, under certain assumptions, the mincerian coefficient
β in equation (1) is equal to the IRR for education. Let Y(x, s) equal the annual income
of an individual with x years of work experience and s years of education, and l equal the total
time working. The direct and indirect assumptions in the Mincer model needed to demonstrate this
equivalency12 are: (i) a risk-neutral agent that maximizes the present value of expected income
over the lifecycle, (ii) l′(s) = 1 (in other words, the time in the workforce is equal for
all individuals independent of their level of education), (iii) that the only costs incurred are the
opportunity costs, in other words, the income forgone from the labor market during the period
of schooling, (iv) that uncertainty does not exist, (v) that individuals enter the labor force
one period after concluding their studies, (vi) that individuals do not work while they are
studying, (vii) that there are no imperfections in the credit market, (viii) that after obtaining a job
individuals do not return to school, and that the functional form for income is (ix) (log) linear
in education and (x) (in levels) multiplicatively separable between education and experience.
This last assumption does not permit interaction between education and experience and may be
better visualized by rewriting equation (1) to obtain the following production function for
11 The problem with sample means is that they are less efficient, in the sense of having greater variance, than the
least squares estimator. Thus, the graph of the income-experience (age) profile, by level of education, obtained from
the regressions tends to be extremely smooth, while that from the sample means does not show any smoothness and may
present spurious relationships between the variables. We can overcome this problem using nonparametric regressions,
which do not impose a functional form but still regress one variable on another, imposing a relationship
between the endogenous and exogenous variables, while the sample means do not.
12 This equivalency is also shown in HLT and Willis (1986), who derive it for the case of continuous time.
human capital:
\[ Y(x, s) = \lambda(s)\,\theta(x), \]
where $\lambda(s) = \lambda(0) e^{\beta s}$ and $\theta(x) = e^{\gamma x + \delta x^2}$. Thus, we have
\[ \frac{\partial^2 \ln Y(x, s)}{\partial s\, \partial x} = 0, \]
in other words, log income profiles in experience are parallel across the various levels of education
(the assumption of parallelism).
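A small numerical illustration of parallelism, written in Python with hypothetical parameter values: under separability, the gap in log income between two levels of education is the same at every level of experience, while a schooling-experience interaction term (here called `kappa`, our own notation and not part of the Mincer model) makes the gap vary with experience.

```python
def log_earnings(x, s, beta=0.10, gamma=0.04, delta=-0.0006, kappa=0.0):
    """Log income at experience x and schooling s.

    kappa is a hypothetical schooling-experience interaction; kappa = 0
    corresponds to the separable (parallel) Mincer specification.
    """
    return beta * s + gamma * x + delta * x * x + kappa * s * x


def gap_by_experience(s_high, s_low, kappa):
    """Log income gap between two schooling levels at experience 0, 10, ..., 40."""
    return [log_earnings(x, s_high, kappa=kappa) - log_earnings(x, s_low, kappa=kappa)
            for x in range(0, 41, 10)]


parallel_gaps = gap_by_experience(11, 8, kappa=0.0)
non_parallel_gaps = gap_by_experience(11, 8, kappa=0.002)
print("gaps under parallelism:   ", [round(g, 3) for g in parallel_gaps])
print("gaps with an interaction: ", [round(g, 3) for g in non_parallel_gaps])
```

Testing parallelism amounts to asking whether the first pattern, a constant gap across experience, is compatible with the data.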
Willis (1986) also points out an additional assumption for the economy as a whole: that the
economy and the population are in (long-run) steady-state equilibrium, with no changes in
aggregate productivity and a constant rate of growth in population, such that the present value of
income for the lifecycle is one for a representative individual. Thus, the individual maximizes the
present value of his income flow by choosing the discrete quantity of years of education:
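\[ \max_{s \ge 0} \sum_{x=0}^{l} \frac{Y(x, s)}{(1 + r)^{s + x}}, \]
assuming parallelism and linearity in the years of study, i.e., $Y(x, s) = \lambda(0) e^{\beta s} \theta(x)$ with $\theta(x) < \infty$,
where we have as a first-order condition:
\[ \sum_{x=0}^{l} \frac{Y(x, s+1)}{(1 + r)^{1 + x}} - \sum_{x=0}^{l} \frac{Y(x, s)}{(1 + r)^{x}} = 0, \]
\[ \left[ \frac{\lambda(0) e^{\beta (s+1)}}{1 + r} - \lambda(0) e^{\beta s} \right] \sum_{x=0}^{l} \frac{\theta(x)}{(1 + r)^{x}} = 0. \]
For $r \neq (-1, 0)$ we have:
\[ \frac{\lambda(0) e^{\beta (s+1)}}{1 + r} - \lambda(0) e^{\beta s} = 0, \]
\[ e^{\beta} - 1 = r, \]
so that $\beta = \ln(1 + r) \approx r$ for small $r$.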
Thus, for the equivalency to be valid, the principal assumptions are linearity and parallelism. These
assumptions will be tested. Anticipating the results, we see that they are rejected, which is the reason
we relax them, along with assumptions (ii), (iii) and (vi).
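The equivalence can be checked numerically. The sketch below, in Python, builds earnings from the Mincer form with hypothetical parameter values, solves the first-order condition for r by bisection at several levels of schooling, and compares the result with e^β − 1 (which is close to β when β is small). The parameter values and function names are assumptions made for illustration.

```python
import math

# Hypothetical Mincer parameters and length of working life.
ALPHA, BETA, GAMMA, DELTA = 1.0, 0.10, 0.04, -0.0006
WORKING_LIFE = 40


def earnings(x, s):
    """Income under linearity in schooling and separability from experience."""
    return math.exp(ALPHA + BETA * s + GAMMA * x + DELTA * x * x)


def first_order_condition(r, s):
    """Discounted income with s + 1 years of study minus that with s years."""
    more = sum(earnings(x, s + 1) / (1.0 + r) ** (1 + x) for x in range(WORKING_LIFE + 1))
    less = sum(earnings(x, s) / (1.0 + r) ** x for x in range(WORKING_LIFE + 1))
    return more - less


def solve_irr(s, lo=0.0, hi=1.0, tol=1e-12):
    """Rate r at which the first-order condition holds, found by bisection."""
    f_lo = first_order_condition(lo, s)
    if f_lo * first_order_condition(hi, s) > 0:
        raise ValueError("no sign change on [lo, hi]")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        f_mid = first_order_condition(mid, s)
        if f_lo * f_mid <= 0:
            hi = mid
        else:
            lo, f_lo = mid, f_mid
    return 0.5 * (lo + hi)


irr_by_schooling = {s: solve_irr(s) for s in (0, 4, 8, 11)}
for s, r in irr_by_schooling.items():
    print(f"s = {s:2d}: IRR = {r:.6f}, exp(beta) - 1 = {math.exp(BETA) - 1.0:.6f}")
```

The IRR is the same at every level of schooling in this sketch precisely because the simulated earnings function is linear in s and separable in x; once either property fails, the IRR generally differs across levels and from the schooling coefficient.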
4 Testing The Mincer Hypotheses
In the following subsection we discuss some of the limitations of our approach of using
cross-sectional data. In the next subsection we present the data and some preliminary statistics.
Then we show the methodology for testing linearity and its results, and in the last part of the
section we present the methodology for testing parallelism and its results.
4.1 Discussion
As pointed out by HLT, the use of cross-sectional data leads us to an assumption that could be
relatively strong: that individuals base their investment decisions on an ex ante analysis of the
experience-earnings profile of older individuals of active working age. This is one version of the
assumption of rational expectations in which individuals forecast their expected income based on
the earnings profile of older individuals (Heckman, 2005). Thus, it does not take into consideration
that these individuals might anticipate future changes in the price of education, for example. It
should be pointed out, however, that assuming that individuals base their human capital investment
decisions on the experience of older individuals is reasonable because there is no perfect counterfactual
from which the income flow can be extracted for the case in which the same individuals attend and do not
attend school. Thus, we must take individuals with similar characteristics as a reference.
In addition, the use of cross-sectional data, according to Card (1999), is valid if it reflects
differences in real productivity that are not due to differences in the inherent ability of the individual
or that might be correlated with education through the differences in income flows. This problem,
endogeneity, has been intensively examined in the literature; and recently, Sachsida, Loureiro and
Mendonça (2004) have estimated mincerian returns for Brazil, correcting for various sources of bias.
It should be pointed out that the bias caused by ability and other unobserved factors does
not exceed 10% of the value of the mincerian coefficient for the US (Card, 2001). For Brazil, Binelli
et al. (2006) show that this bias, which originates in unobserved heterogeneity, is relatively small
for the respective returns.
4.2 Data and Descriptive Statistics
PNAD, because it deals with a “complex” sample survey, needs special care. For that reason, we
will briefly discuss the literature that explains and incorporates the sample design of the survey
and warn of the consequences of the failure to take this into consideration.
Sample Design The costs of conducting a sample survey based on a simple sample design are
very high. For this reason, according to Chromy and Abeyasekera (2005), complex sample designs
are used to control these costs. According to Yansaneh (2005), a complex sample design
involves stratification, multistage sampling (clustering) and different probabilities
of selection. With regard to clustering, the units selected in the first stage generally
are called primary sampling units (PSUs). The PSUs can be divided into urban and rural areas,
and in some countries, they are divided by geographic or administrative areas. The units
selected within each PSU are called second stage units (SSUs), and within these a third stage
(TSUs), and so on, successively. Generally speaking, the second stage units are households or
families, and the third stage units are the individuals. Stratification is generally applied to a stage of the
sample, in which the units (first, second, or third stage) are partitioned into subgroups that
are mutually exclusive. These units are generally selected with probabilities proportional to their
size (for example, the number of families or individuals belonging to a PSU), and therefore may
be unequal in each stage. According to Pessoa and Silva (1998) and the IBGE (2004), the
PNAD complex sample design uses a sample stratified by State (Federation Unit, FU), and smaller regions within
the States. The selection of municipalities within each stratum is done with unequal probabilities,
proportional to size, where some municipalities are included in the sample with a probability equal
to 1 (called self-representing municipalities). The second stage units are census sectors and,
similarly, the selection of these sectors within each municipality is done with probabilities proportional
to the number of households in each sector according to the most recent Census data available. In
the final stage households are selected in each of these sectors, with equal probabilities. All of the
individuals living in each household of the sample are surveyed.
However, studies in general do not take account of these factors, starting from basic assumptions
that are valid only when all of the data are obtained through simple random sampling
with replacement or, equivalently, when the observations are independent and identically
distributed (iid). In general, data obtained in sample surveys such as the PNAD do not permit
these assumptions to be used (Silva, Pessoa and Lila, 2002).
Various economic studies do not consider the complex sample design when estimating variances,
constructing confidence intervals and testing hypotheses, generating, according to Lumley
(2004), biased estimates, which strictly speaking invalidate the usual hypothesis tests.
Thus, their results are inaccurate, and the change may be not merely quantitative but also
qualitative, when it alters the (non-)significance of the estimated parameters. Therefore, this
study also makes a contribution to this question, by incorporating the sample design of the PNAD.
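The difference between an iid variance and a design-based variance can be illustrated with a minimal sketch. It is not the author's code: it assumes the standard "ultimate cluster" linearization (PSUs treated as drawn with replacement within strata), and the data, weights and function names are hypothetical.

```python
from collections import defaultdict

def weighted_mean(y, w):
    """Weighted estimate of the population mean."""
    return sum(wi * yi for wi, yi in zip(w, y)) / sum(w)

def design_variance_of_mean(y, w, stratum, psu):
    """Linearization variance of the weighted mean under stratified
    cluster sampling (ultimate-cluster approximation, an assumption)."""
    total_w = sum(w)
    ybar = weighted_mean(y, w)
    # Sum the linearized values z_i = w_i (y_i - ybar) / sum(w) within each PSU.
    psu_totals = defaultdict(lambda: defaultdict(float))
    for yi, wi, h, c in zip(y, w, stratum, psu):
        psu_totals[h][c] += wi * (yi - ybar) / total_w
    variance = 0.0
    for totals in psu_totals.values():
        n_h = len(totals)
        if n_h < 2:
            continue  # single-PSU strata are skipped in this sketch
        mean_h = sum(totals.values()) / n_h
        variance += n_h / (n_h - 1) * sum((t - mean_h) ** 2 for t in totals.values())
    return variance

def naive_variance_of_mean(y):
    """Variance of the unweighted mean under the iid assumption."""
    n = len(y)
    ybar = sum(y) / n
    return sum((yi - ybar) ** 2 for yi in y) / (n - 1) / n

# Hypothetical data: two strata, two PSUs each, identical incomes within a PSU.
y = [1, 1, 1, 3, 3, 3, 5, 5, 5, 7, 7, 7]
w = [1.0] * 12
stratum = ["A"] * 6 + ["B"] * 6
psu = [1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4]

design_var = design_variance_of_mean(y, w, stratum, psu)
naive_var = naive_variance_of_mean(y)
```

In this toy example the observations within a PSU are perfectly correlated, so the iid formula understates the variance of the mean; with real survey data the direction and size of the gap depend on the design.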
Sample and Descriptive Statistics In all of the tests we performed, we use the data from the
PNAD from 1992 through 2004[13], and the Census data for 1970, 1980, 1991 and 2000.
Thus, we were able to perform the tests of linearity without having to correct the estimates and
comparisons when incorporating the PNAD sample design. The same procedure was used in the
calculation of the IRRs that involve nonparametric and parametric specifications.
The subpopulation used in the linearity test was: white male individuals between 24 and 56
years of age, who were not in school, with a work week of more than 36 hours and less than 44
hours, with positive income[14] equal to or less than 100 times the minimum wage in reais, excluding
workers in the public and agricultural sectors, workers producing for their own consumption,
workers in construction for their own use and unpaid workers.
The exclusion of agricultural workers and public employees is due to the fact that their salary
structure differs from that of the market. The removal of those attending school is for the purpose
of comparison with the mincerian model, which assumes that the individual enters the labor market
one period
[13] Except for the years 1994 and 2000, when the survey was not held. In addition, in 2004, the IBGE included rural
areas in the North, which previously had not been incorporated. So, for the purposes of comparison with other years,
we did not use the data from the rural area in the North in 2004.
[14] When we corrected the bias in the sample selection, we included those who were not working.
after completing school.
In addition, we see in the fifth column of Tables 1-2 in the appendix that only around
10% of the workers study, but this percentage has been increasing over the years. Among those
who work and study, the majority are men, and among the men the majority are white. But these
groups have declined in relative terms in recent years. The age limits can be seen in the ninth
and twelfth columns of the tables. The large majority of workers are 24 or more years of age, and
this share has increased during the last two decades at the expense of other age groups. In
addition to being the largest group, this age group has an average income of around R$1000 in
real terms[15], approximately 100% and 400% more than the second and fourth age groups,
respectively. Thus, excluding the smaller groups from the tests and from the calculation of the
IRRs leaves out only a relatively small part of the opportunity costs of the income foregone[16].
With regard to the limit of 56 years, observe that within the universe of retirees, the vast
majority are more than 56 years old. The average age of this group is around 63-66 years.
However, this measurement overestimates the actual age of workers when they retire, owing to
the lack of a variable that correctly measures this point.
The restriction to full-time hours is due to the fact that, according to Freeman (1986),
the human capital investment model proposed by Becker states that, every year, the individual
decides whether to attend school and invest in education or to seek full-time work in the labor force.
Other major studies, such as Murphy and Welch (1990, 1992), which are based on the mincerian
model to estimate the income profile of individuals, also limit the sample to full-time workers.
With regard to the exclusion of women, we can cite two reasons: (i) women enter the
labor market later, at an average age of between 14 and 15 years, while men enter the labor force
one year earlier; (ii) Cameron and Heckman (2001, cited in Sachsida, Loureiro and Mendonça, 2004),
in a study of the sources of ethnic and racial disparity in school enrollments, consider only men,
because their decisions regarding education are less complicated by fertility considerations.
4.3 Test for Linearity
In the test for linearity, three different specifications were used. In general form, we estimated:
$$\ln Y = \alpha + \beta_1 x + \beta_2 x^2 + \beta_3 s + \text{specification}_k + e, \quad k = 1, 2, 3, \tag{2}$$
where Y is the hourly wage[17]. In specification 1, we used a spline function:
$$\text{specification}_1 = \sum_{j=1}^{15} \beta_j S_j, \tag{3}$$
[15] Income in this article was adjusted by the INPC (National Consumer Price Index) to prices for November 2004.
[16] This point is discussed again in section 5.2, where we discuss the assumptions used in the calculation of the IRRs.
[17] More precisely: Y = (income from principal employment) / (number of hours worked × 4). Note that for the
Census of 1980/1970 the number of hours worked is available only by groups. Thus the values assumed follow the
table below. With the exception of the upper and lower bounds, the values refer to the average for each hourly group.
where Sj, j = 2, ..., 15, is a dummy equal to one if the individual has S ≥ j years of education.
These dummies capture the returns to education, permitting discontinuities and changes in slope
after each year of study completed. In specification 2, we use a cubic function (equation 4), but
we also include some variables that capture discontinuities[18], defined similarly to
equation (3). Finally, we estimate specification 3 more broadly, which permits us to obtain
estimates for each grade completed. Thus, we no longer use years of education, but rather whether
the individual completed a given grade. We also opted for this specification because some
individuals receive diplomas (Middle School, High School, etc.)[19] with more or fewer years of
study than the majority of students. For this, we replace S in equation (3) with the variable
Degree[20]. Specification 3 is thus described as:
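$$\text{specification}_3 = \sum_{j=1}^{4} \beta_{3+j} EF_j + \sum_{j=5}^{8} \beta_{3+j} EF_j + \sum_{j=1}^{3} \beta_{11+j} EM_j + \sum_{j=1}^{4} \beta_{14+j} SUP_j + \beta_{19} MD. \tag{5}$$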
The variables EFj, EMj, SUPj and MD are dummies equal to one if the individual has Degree ≥ j.
This specification has the same technical features as specification 1[21].
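The construction of these dummies can be sketched as follows. This is a minimal illustration, not the author's code: it assumes the dummies are cumulative (equal to one when the schooling variable is at least j), and the function name and the example individual are hypothetical.

```python
def threshold_dummies(value, thresholds):
    """Return {j: 1 if value >= j else 0} for every threshold j."""
    return {j: int(value >= j) for j in thresholds}

# Hypothetical individual with S = 8 years of education and the
# thresholds j = 2, ..., 15 described in the text.
years_dummies = threshold_dummies(8, range(2, 16))
```

With cumulative dummies, the coefficient on Sj measures the change in log wages associated with completing year j, given that year j - 1 was completed, which is what allows the slope to differ after each year.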
Thus, we performed an F test on the coefficients of the specifications described above in order
to test the null hypothesis of the linear model against the alternative hypothesis of
nonlinearity in the returns.
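As an illustration of the test, the sketch below computes the textbook F statistic for nested OLS models from the restricted (linear) and unrestricted (nonlinear) residual sums of squares. It is an assumption that this classical form applies: it relies on iid errors, and the residual sums of squares, sample size and parameter counts used here are made up.

```python
def f_statistic(rss_restricted, rss_unrestricted, n_restrictions, n_obs, n_params_unrestricted):
    """Classical F statistic for H0: the extra (nonlinear) coefficients are zero."""
    numerator = (rss_restricted - rss_unrestricted) / n_restrictions
    denominator = rss_unrestricted / (n_obs - n_params_unrestricted)
    return numerator / denominator

# Hypothetical numbers: 14 schooling dummies added to a model that ends up
# with 18 parameters, estimated on 1000 observations.
f_value = f_statistic(
    rss_restricted=120.0,
    rss_unrestricted=100.0,
    n_restrictions=14,
    n_obs=1000,
    n_params_unrestricted=18,
)
```

A large value relative to the F(14, 982) distribution would lead to rejection of linearity; under a complex sample design a design-based Wald test would replace this classical formula.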
It is important to note here the differences between the specification of the years of schooling
(specification of equation 5) and the specification of the years of education (specification of
equation 3 and 4). The years of education variable, in the PNAD, is derived from the grade level
variable (i.e., years of schooling variable). Therefore, someone who studied through the fourth
Because of the division by groups, the sample for 1980/1970 includes only those individuals working from 40 to 48/49 hours.

Census 1980
Work-hours group          Value considered
less than 15 hours        15
from 15 to 29 hours       22
from 30 to 39 hours       34.5
from 40 to 48 hours       44
49 hours or more          49

Census 1970
Work-hours group          Value considered
less than 15 hours        15
from 15 to 39 hours       27
from 40 to 49 hours       44.5
50 hours or more          50
[18] Specification 2 is based on Hungerford and Solon (1987), which captures the effect of the diploma in the years of
completion of each school level (primary [S4], middle [S8], high school [S11] and college [S15]).
[19] The following abbreviations are used in this article:
NEDUC: no education; PRE: preschool; EFj: j-th grade of Primary School; EMj: j-th grade of High School;
SUPj: j-th grade of College; MD: Master's/Doctorate. The grades of Primary School and Middle School together
are referred to as Elementary School in Brazil.
[20] Degree is given the following values: 0 for never studied, 1 for preschool or literacy, 2 for the first grade of primary
school, 3 for the second grade of primary school and so on, up to 17 for those who have studied at the Master's or Doctorate level.
[21] The relevant dummies were excluded to avoid perfect linear dependency in the regression matrices.
grade of primary school necessarily has four years of education; in other words, this variable does
not directly capture delays or students who repeat grades. Meanwhile, someone who has
never attended school and someone who has only attended preschool are both given the value of 0
years of education, so that these two groups are not differentiated in the specification of years of
education, while the specification of grades (years of schooling) allows them to
be differentiated. We also note that six years is assumed to be the age at the start of education
and this is taken as a reference base. Another factor is that the PNAD variable has a maximum
category of 15 or more years of education. Thus, there could be differences for someone who
attended both college and graduate school courses. In addition, the specification for grade level
takes into consideration whether or not the individual finished his course. With regard to the
Census, we have the same structure, but the variable years of education is more finely divided,
having as its maximum value 17 or more years of study, with the exception of the years 1970 and
1980, where this variable does not appear; for these years we therefore estimated only the
grade-level specification.
4.3.1 Sample Selection
In Subsection 4.2 the sample to be used was specified conditional on the explanatory variables, in
other words, filtered by race, gender, age, etc. Sample selection bias may nevertheless
arise through the dependent variable, if only individuals with a positive salary are
considered. The problem of sample selection emerges from the fact that we are unable to observe
the hourly wage offer for individuals who are not working, in other words, when the wage
offered is less than the reservation wage of the individual. Thus, some individuals decide not to
work, but, as already mentioned in subsection 4.1, we assume that their wage offer is also taken
into consideration by those who are deciding how many years of education to accumulate, because
these “excluded” individuals are at an age at which they are active in the labor
market. The failure to incorporate these individuals will bias educational returns.
To correct this bias, we use the two-stage estimation procedure developed by Heckman (1979,
henceforth heckit), in which we estimate, in the first stage, a probit using the entire sample,
with a dummy indicating whether the individual is employed as the dependent variable. This is the
so-called selection equation. From it we obtain the inverse Mills ratio, and we then estimate the
wage equation including this ratio as a regressor. The “t” test for the coefficient on the inverse
Mills ratio is a valid test of the null hypothesis of no selection bias[22].
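The regressor added in the second stage can be sketched as follows. This is a minimal illustration rather than the author's code: the fitted probit index z (the linear predictor from the selection equation) is assumed to be given, and the function names are hypothetical.

```python
import math

def normal_pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def normal_cdf(z):
    """Standard normal distribution function, via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def inverse_mills_ratio(z):
    """lambda(z) = phi(z) / Phi(z), evaluated at the fitted probit index z."""
    return normal_pdf(z) / normal_cdf(z)

# Individuals with a low predicted probability of employment (low z)
# receive a larger correction term than those with a high one.
lambda_low = inverse_mills_ratio(-1.0)
lambda_mid = inverse_mills_ratio(0.0)
lambda_high = inverse_mills_ratio(1.0)
```

The ratio is computed for each employed individual and enters the wage equation as an additional covariate; its estimated coefficient is the one subjected to the “t” test.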
For the selection equation, we use, in addition to the covariates from the wage equation, the
number of children, a dummy for marriage, income not originating from work, a dummy indicating
whether the individual belongs to a union, and dummies for the states of residence[23].
[22] The null hypothesis of the nonexistence of sample selection bias is rejected.
[23] The number of children is calculated directly only for women. Thus, to calculate this variable for men, we
identify the children present in the family where the father is the head of the family or the person of reference. For
the marriage dummy, we proceed in the same fashion, assigning the value of one to persons who are head of
the family, since in the PNAD this variable does not exist. A variable for unions was not included in the regressions
4.3.2 Results
Tables 3-6 in the appendix show the results of the test for linearity for the specifications as
defined. Under all of the estimated specifications, the null hypothesis that the coefficients on
the nonlinear terms are zero is rejected. In addition, for all of the specifications, the value of
the statistic shows a trend that leads us to conclude that the assumption of linearity in the
Mincer model has become less and less valid, leading to misspecification in the models that use
it. With regard to the Census, the F statistic is high for one of the first specifications, while in
the most recent it oscillates at a high level.
Comparing the adjustments made in the models, note that the inclusion of the sample design
and the correction of sample selection bias using heckit reduce the F test statistic, but not
to the point where the null hypothesis of linearity in education is no longer rejected.
Therefore, we strongly reject the hypothesis of linearity for Brazil, which by itself invalidates
the consideration of the mincerian coefficient as a return on education.
4.4 Test for Parallelism
We begin by estimating income as a function of experience for various levels of education.
To obtain these estimates, we estimate the following equation:
$$y = f(x) + u,$$
such that $E[u|x] = 0$ and $E[u^2|x] < \infty$, which implies that $E[y|x] = f(x)$. Therefore, an estimate
of f(x) provides an estimate of the mean of y conditional on x. To estimate f(x), one option is
the global parametric approach, which imposes a functional form on f(x)[24]. Thus, we can impose
$f(x) = ax + bx^2 + cx^3$, or a higher-order polynomial. The disadvantage of this method is that
the higher the order of the polynomial, the greater are the problems inherent in multicollinearity,
whereby the estimates lose precision and parsimony. In addition, these techniques are sensitive to
“outliers”, given that the estimate at each point depends on the entire sample. But one
of the larger problems with parametric methods is the imposition of a functional form on the
model to be estimated, which could create problems of misspecification. Thus we adopt a local
approach, using the local linear nonparametric regression method. The idea of this method
is to minimize, in a neighborhood around each point of our grid (x0), the sum of squared
residuals, weighted by the shape and width of a sequence of kernels $\{K((x_i - x_0)/h_n)\}_{i=1}^{n}$
(Härdle, 1990).
Thus, for an i.i.d. random sample $\{x_i\}_{i=1}^{n}$, we have:
$$(m(x_0), b(x_0)) = \arg\min_{m,b} \sum_{i=1}^{n} \{y_i - m - b(x_i - x_0)\}^2 K_i,$$
with Census data, because it is not covered there. And for the Census of 1970, the income from non-labor sources was not
included because only the income from the principal occupation is available.
[24] These methods would be, for example, a global polynomial approximation and splines, with the latter already
having been used in the linearity tests.
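in which $K_i = K((x_i - x_0)/h_n)$ is a quartic kernel[25] and $h_n$ is a bandwidth such that $h_n \to 0$ as $n \to \infty$. From
the first-order conditions we obtain:
$$m(x_0) = \sum_{i=1}^{n} y_i W_i(x_0), \tag{6}$$
where
$$W_i(x_0) = \frac{K_i \left[\sum_{j=1}^{n} (x_j - x_0)^2 K_j\right] - (x_i - x_0) K_i \left[\sum_{j=1}^{n} (x_j - x_0) K_j\right]}{\sum_{j=1}^{n} K_j \left[\sum_{j=1}^{n} (x_j - x_0)^2 K_j\right] - \left[\sum_{j=1}^{n} (x_j - x_0) K_j\right]^2}.$$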
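The estimator in equation (6) can be sketched directly from the weights. The code below is a minimal illustration, not the author's routine: the function names and the artificial data are hypothetical, and only the quartic kernel and the bandwidth of 5 are taken from the text.

```python
def quartic_kernel(t):
    """Quartic (biweight) kernel: (15/16)(t^2 - 1)^2 for |t| < 1, else 0."""
    return (15.0 / 16.0) * (t * t - 1.0) ** 2 if abs(t) < 1.0 else 0.0

def local_linear_weights(x, x0, h):
    """Weights W_i(x0) of the local linear estimator."""
    k = [quartic_kernel((xi - x0) / h) for xi in x]
    s0 = sum(k)
    s1 = sum(ki * (xi - x0) for ki, xi in zip(k, x))
    s2 = sum(ki * (xi - x0) ** 2 for ki, xi in zip(k, x))
    denom = s0 * s2 - s1 ** 2
    if denom == 0.0:
        raise ValueError("need at least two distinct x values inside the bandwidth")
    return [ki * (s2 - (xi - x0) * s1) / denom for ki, xi in zip(k, x)]

def local_linear_fit(x, y, x0, h):
    """m(x0) = sum_i y_i W_i(x0)."""
    return sum(wi * yi for wi, yi in zip(local_linear_weights(x, x0, h), y))

# Artificial check: an exactly linear profile is reproduced without bias.
experience = list(range(41))
log_wage = [2.0 + 3.0 * xi for xi in experience]
weights_at_10 = local_linear_weights(experience, 10, 5)
fit_at_10 = local_linear_fit(experience, log_wage, 10, 5)
```

Because the weights sum to one and annihilate the linear term, a local linear fit has no first-order bias, including near the boundaries of the experience range, which is one common reason for preferring it to a local constant fit.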
Thus, m and b are estimators for $f(x_0)$ and $f'(x_0)$, respectively[26]. The null hypothesis of the
test of whether the log experience-earnings profiles are parallel across different years of
education is:
$$H_0: \begin{cases} [E(y_i|x_{10}, s = s_1) - E(y_i|x_{10}, s = s_2)] - [E(y_i|x_{20}, s = s_1) - E(y_i|x_{20}, s = s_2)] = 0 \\ [E(y_i|x_{20}, s = s_1) - E(y_i|x_{20}, s = s_2)] - [E(y_i|x_{30}, s = s_1) - E(y_i|x_{30}, s = s_2)] = 0, \end{cases}$$
where $x_i$ corresponds to i years of experience, for i = 10, 20, 30. Thus, the idea of the test is
simple: to verify whether the difference in the average salary conditional on the level of education
s2 relative to s1 is the same at different levels of experience[27],[28]. According to Heckman et al.
(1998), to test this hypothesis at L different values of x, the values of xi are selected so that they
are separated by at least two times the bandwidth (2hn), such that the estimates are independent and
thus the statistic is asymptotically distributed as χ2(L). Since we use hn = 5, we select
values of xi spaced 10 years apart[29]. Therefore, with $m_{x_i,s_l}$ denoting the estimate of
$E(y_i|x_i, s = s_l)$, the statistic for the test of parallelism under the null hypothesis defined
above is, according to Heckman et al. (1998):
$$\Delta' \Phi^{-1} \Delta \xrightarrow{d} \chi^2(L), \quad L = 3, \tag{7}$$
[25] The quartic kernel is defined as:
$$K(t) = \begin{cases} (15/16)(t^2 - 1)^2 & \text{if } |t| < 1 \\ 0 & \text{otherwise.} \end{cases}$$
[26] Intuitively we are using a local polynomial approximation, through a Taylor expansion of order p, with p = 1,
around x0. In the general case we would minimize:
$$\sum_{i=1}^{n} \{y_i - m - a_1(x_i - x_0) - a_2(x_i - x_0)^2 - \ldots - a_p(x_i - x_0)^p\}^2 K_i,$$
where $a_2$ is an estimator for $f''(x_0)/2$. In the case of p = 0, m would be the well-known Nadaraya-Watson
estimator.
[27] In addition to this joint test, we also tested separately for only one difference, in other words for the null
hypothesis:
$$[E(y_i|x_j, s = s_1) - E(y_i|x_j, s = s_2)] - [E(y_i|x_l, s = s_1) - E(y_i|x_l, s = s_2)] = 0,$$
for $l \neq j$, with (l, j) equal to (10, 20) and (20, 30).
[28] These values for i (10, 20, 30) are valid for comparisons of levels of education of 15 years or more (SUP or
more), 11 years (EM3) and 8 years (EF8). But for those that involve 4 (EF4) and 0 (PRE and NEDUC)
years of education, the grouping of years of experience does not cover 10 years, and therefore the values assumed are:
{20, 30, 40}.
[29] We made estimates for bandwidths varying from 2 to 10 and there was little change in the smoothing of the
income flow profiles. Thus, we chose, using a subjective criterion, an intermediate bandwidth similar to that of HLT.
and $\Phi = M \cdot \mathrm{diag}\left(Var(m_{x_{10},s_2}), Var(m_{x_{10},s_1}), Var(m_{x_{20},s_2}), Var(m_{x_{20},s_1}), Var(m_{x_{30},s_2}), Var(m_{x_{30},s_1})\right) \cdot M'$.
To calculate the variances we use the estimator proposed by Heckman et al. (1996a):
$$Var(m_{x_0,s_l}) = \sum_{i=1}^{n} W_i(x_0, s_l)^2 \varepsilon_i^2,$$
where $\varepsilon_i$ is the residual of the regression.
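The quadratic form in (7) can be sketched for the two restrictions of the joint null hypothesis. This is an illustration under stated assumptions, not the author's code: the contrast matrix M is not defined in the surviving text, so it is assumed here to map the six conditional means into the two differences-in-differences of the null hypothesis; the estimates are assumed independent, and all means and variances are hypothetical.

```python
def parallelism_statistic(means, variances):
    """Delta' Phi^{-1} Delta for the two contrasts of the joint null.

    means, variances: dicts keyed by (experience, schooling), with
    experience in (10, 20, 30) and schooling in ("s1", "s2").
    """
    def gap(x):
        return means[(x, "s1")] - means[(x, "s2")]

    def gap_var(x):
        # Independent estimates: the variance of a difference is the sum.
        return variances[(x, "s1")] + variances[(x, "s2")]

    d1 = gap(10) - gap(20)
    d2 = gap(20) - gap(30)
    # Phi = M diag(V) M' for the assumed 2 x 6 contrast matrix M.
    p11 = gap_var(10) + gap_var(20)
    p22 = gap_var(20) + gap_var(30)
    p12 = -gap_var(20)  # both contrasts share the estimates at x = 20
    det = p11 * p22 - p12 * p12
    # Closed-form 2 x 2 inverse applied to (d1, d2).
    return (p22 * d1 * d1 - 2.0 * p12 * d1 * d2 + p11 * d2 * d2) / det

cells = [(x, s) for x in (10, 20, 30) for s in ("s1", "s2")]
variances = {cell: 0.01 for cell in cells}

# Hypothetical parallel profiles: the schooling gap is 0.5 at every experience level.
parallel = {(10, "s1"): 2.0, (20, "s1"): 2.25, (30, "s1"): 2.5,
            (10, "s2"): 1.5, (20, "s2"): 1.75, (30, "s2"): 2.0}
# Hypothetical non-parallel profiles: the gap widens to 1.0 at 30 years.
diverging = dict(parallel)
diverging[(30, "s1")] = 3.0

stat_parallel = parallelism_statistic(parallel, variances)
stat_diverging = parallelism_statistic(diverging, variances)
```

The statistic is zero when the schooling gap is constant across experience levels and grows as the gaps diverge; it would then be compared with the chi-squared critical value stated in the text.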
4.4.1 Results
The graphs of the experience-income profiles were obtained using a nonparametric estimator for
various levels of education[30]. Taking Panel 1 from the Census, which shows relatively stable
measures, as a reference, we see that income flows tend to be a steeper and more concave function
as the level of education increases. This point is consistent with the literature (Becker, 1975;
Willis, 1986; Psacharopoulos, 1994), in which individuals tend not only to earn more with higher
levels of education, but also to show greater rates of growth, which tend to decline more rapidly
over the working life cycle, for higher levels of education. An initial inspection of these graphs
points against parallelism, given that some salary profiles tend to approach one another. In the
case of the PNAD (Panels 2 and 4)[31] we also see profiles approaching one another for some levels,
as well as in Panel 3, where a crossover of the profiles occurs, behavior similar to that observed
by HLT.
Tables 7-10 show the tests of the statistic (7) for the joint hypotheses, as well as for only two
pairs of different experience levels. In relation to the PNAD, we note that for the majority of
years, for the two specifications, the joint null hypothesis of parallelism is rejected for some
pairs of different years of education (schooling). It is important to point out that in the results
for the PNAD there is a large variation in the salary differential for a given level of experience
from one year to the next. One of the reasons for this variation is the lack of the large number of
observations per cell that nonparametric estimation methods require, which occurred because of the
need to apply filters to the PNAD sample. Thus, the conditional means estimated at these points
tend to oscillate more from one year to the next. This oscillation can also be noted in the graphs
of the profiles. Therefore we performed the test for the Census data as well, which, given our
sample restrictions, included a larger number of observations per cell and, therefore, less
oscillation in the
[30] The graphs shown are for only two years. Those for the other years are available from the author on request.
[31] Due to the application of filters to the sample, we see that some education-experience cells have a very low
number of observations of individuals who did not graduate from Primary/Middle/High School or College or
Master's/Doctorate, principally at the end of the life cycle. This is a recurring problem of these methods, and can be seen
also in the studies by Murphy and Welch (1990, 1992) and HLT, which use the CPS (Current Population Survey)
and the US Census, respectively. For this reason, the graphs from the PNAD show only the salary profiles for completed
years of schooling (i.e., completed Primary/Middle/High School or College or Master's/Doctorate), which have a
greater number of observations per cell.
estimates. For all the estimates, with the exception of middle school with regard to primary school
in 1991 and 2000, parallelism is rejected. Thus in the section that follows, we compute the IRRs
to measure the bias of these estimates in relation to the mincerian coefficient.
5 IRR
For the calculation of the IRR we use:
$$\sum_{x=0}^{l} \frac{Y(x, s+h)}{(1+r)^{h+x}} - \sum_{x=0}^{l} \frac{Y(x, s)}{(1+r)^{x}} = 0, \tag{8}$$
where Y(.) are the fitted values of the parametric (spline and Taylor expansion[32]) and
the nonparametric regressions. For the specification of the years of education, h is simply the
difference between the two levels of education. In other words, when we compare the present
value of earnings for those with 8 and 4 years of education, this “h” would be equal to 8 - 4 = 4
years. Nevertheless, when we use the specification for schooling, we should take into consideration
the expected average time to finish each level of schooling or degree of education completed. Thus,
when we compare the present value of earnings for those who complete middle school (EF8) with
those who complete primary school (EF4), h will be the expected average time to complete
middle school less that for primary school. But because this variable is not available, we use as a
proxy the average age of individuals who attend a given grade level[33]. We also add 0.25 to the
estimate for h, to reduce the measurement error that tends to underestimate the average age
for completing a given educational level, because the PNAD and the Census are taken around
August-September, in other words, at a point where one quarter of the school year remains to be
completed[34]. Below we present the results, after which we will discuss other hypotheses raised in
the calculation of the IRRs.
[32] Relaxing the parallelism assumption, we estimate the IRRs using the nonparametric specification already
presented and a parametric specification, i.e., a second-degree Taylor expansion, as defined below:
$$\ln Y = \alpha + \beta_1 x + \beta_2 x^2 + \beta_3 S + \beta_4 S^2 + \beta_5 (S \cdot x).$$
This estimate was done to compare with the nonparametric estimates and verify the potential discrepancy.
[33] Thus, for example, for the year 1992, the average age of individuals attending middle school was 16.33
years, and that of individuals attending primary school was 11.67 years. The difference between these two averages is
4.66, greater than 4 years, which would be the time needed to complete a four-year program without repeating. This
occurs because the proxy for the grade-level specification takes into account students who repeat the school year.
[34] The INEP does provide an estimate of the average time for completion of school, but this was
available only for 1995 through 2001. Thus, to have more homogeneous estimates for comparative purposes, we
constructed this measure based on the age of the individual, which differs somewhat from the INEP average.
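Solving equation (8) for r can be sketched with a simple bisection. This is a minimal illustration, not the author's routine: the flat earnings profiles, the bracket [0, 1] and the function names are hypothetical, and in practice Y would come from the fitted regressions.

```python
def npv_difference(r, earnings_high, earnings_low, h):
    """Left-hand side of equation (8): earnings with s + h years of schooling are
    discounted h periods further than earnings with s years of schooling."""
    high = sum(y / (1.0 + r) ** (h + x) for x, y in enumerate(earnings_high))
    low = sum(y / (1.0 + r) ** x for x, y in enumerate(earnings_low))
    return high - low

def internal_rate_of_return(earnings_high, earnings_low, h, lo=0.0, hi=1.0, tol=1e-12):
    """Bisection on r; requires a sign change of equation (8) on [lo, hi]."""
    f_lo = npv_difference(lo, earnings_high, earnings_low, h)
    f_hi = npv_difference(hi, earnings_high, earnings_low, h)
    if f_lo * f_hi > 0.0:
        raise ValueError("no sign change on [lo, hi]; widen the bracket")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        f_mid = npv_difference(mid, earnings_high, earnings_low, h)
        if f_lo * f_mid <= 0.0:
            hi = mid
        else:
            lo, f_lo = mid, f_mid
    return 0.5 * (lo + hi)

# Hypothetical flat profiles over l = 40 years of experience: four more years
# of schooling (h = 4) raise earnings from 100 to 150 in every period.
earnings_low = [100.0] * 41
earnings_high = [150.0] * 41
irr = internal_rate_of_return(earnings_high, earnings_low, h=4)
```

With flat profiles the root has a closed form, (1 + r)^4 = 1.5, which gives r of about 10.7% and provides a check on the solver. A larger h for the same earnings gain lowers the IRR, which is the mechanism behind the lower returns under the grade-level specification.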
5.1 Results
Tables 11-12 show the estimates for the IRRs. The first two lines for each year (Mincer I and II)
refer to the mincerian coefficient[35] from the first two original models. They are taken as points
of reference, in order to measure the bias in relation to the estimates of the IRRs. The other lines
refer to the estimates of discount rates, first relaxing the linearity assumption (the spline
function; IRR-nonlinear, third line) and then parallelism. For the latter, parametric (Taylor
expansion; IRR-nonparallel parametric, fourth line) and nonparametric (IRR-nonparallel n-param.,
fifth line) models were estimated. These estimates are divided into three sets of columns for the
PNAD: regressions estimated without any correction at all; including the sample design; and
incorporating the sample design and correcting the sample selection bias using heckit. And for the
Census: without correction and estimated using heckit.
We note, both for the Census and for the PNADs, that the bias[36] tends to be positive
for all levels of education, specifications and types of corrections, except when higher
levels of education are compared: there, the negative bias shown for the returns to higher
levels of education becomes positive when we switch to the grade-level specification,
for all types of corrections. The latter specification tends to measure the returns to higher
levels of education more accurately, while the years-of-education specification could be combining
returns to undergraduate study, an undergraduate degree and graduate studies. In this regard, note
that returns are relatively higher for the highest level of education in the specification of years
of education (S15-S11 in the PNAD and the Census). In terms of magnitude, when we incorporate
only the sample design of the PNAD, the bias reaches more than 12 percentage
points (p.p.) when comparing the nonparametric IRRs and Mincer II for EF4-PRE, for the
year 2003 (16.42% - 3.98%). When we incorporate the sample design and correct the problem of
sample selection (heckit), we also obtain biases that reach more than 12 p.p. when comparing
the nonlinear IRRs and Mincer II for EF4-PRE for the year 2003 (15.71% - 3.53%). The biases for
the higher levels of education are smaller, but nevertheless significant, and can reach a magnitude
of almost 7 p.p., for example, for the year 1993, for this same specification for SUP-EM3. For
the Census the biases are also high and can reach more than 14 p.p.: in 2000 the return
falls from 17.29% (mincerian) to 3.03% (nonlinear IRR) for the MD-SUP group in
the model using heckit.
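The arithmetic behind these comparisons can be sketched as follows. The conversion of the mincerian coefficient follows the adjustment described in the footnotes to this subsection, and the bias is taken as the mincerian return minus the IRR; the function names and the coefficient of 0.10 are hypothetical, and the pairs of percentages are those quoted in the paragraph above, read as (mincerian return, IRR).

```python
import math

def mincerian_return(beta_s):
    """Convert the schooling coefficient of a log-wage regression into a rate: exp(beta) - 1."""
    return math.exp(beta_s) - 1.0

def bias_in_pp(mincerian_pct, irr_pct):
    """Bias in percentage points: mincerian return minus the IRR."""
    return mincerian_pct - irr_pct

# A hypothetical coefficient of 0.10 corresponds to a return of about 10.5%.
example_return = mincerian_return(0.10)

# Pairs quoted in the text.
bias_design_only = bias_in_pp(16.42, 3.98)   # PNAD, EF4-PRE, 2003, sample design
bias_heckit = bias_in_pp(15.71, 3.53)        # PNAD, EF4-PRE, 2003, design + heckit
bias_census = bias_in_pp(17.29, 3.03)        # Census, MD-SUP, 2000, heckit
```

All three differences exceed 12 p.p., and the Census one exceeds 14 p.p., in line with the magnitudes reported in the text.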
It is worth noting that there is little difference between the nonlinear IRRs (third line) and the
nonparametric IRRs (last line). When the sample design is incorporated, this bias reaches a maximum
of 2.08 p.p. in the comparison of S4-S0 (9.18% - 7.11%) for 2001, and 1.08 p.p. (7.07% - 5.98%)
when comparing EF8-EF4 in 2003.
[35] The mincerian coefficient was adjusted for continuous time, as: $e^{\beta_s} - 1$ = mincerian return.
[36] The bias discussed in this subsection is always measured in relation to the Mincer II model, the most widely
used in the literature, unless another IRR is mentioned as a reference base. Thus, a positive bias is understood as
a positive difference between the mincerian return and a given IRR.
With regard to the Census, the bias when comparing the highest levels of education
(S17+-S15) in the year 2000 is almost 2.4 p.p., and for the different levels of schooling, with the
exception of 1970, the bias does not reach 1.5 p.p. in absolute terms. This leads us to believe
that, despite our rejection of parallelism, the spline function is a good approximation when
estimating IRRs. Nevertheless, when the last line is compared with the fourth line (IRR-nonparallel
parametric), a large bias is observed. Thus, when parallelism is relaxed, it is best to opt for a
nonparametric approach, which does not incur the biases caused by misspecification of the
parametric model.
We also point out two IRRs, for two levels of education, that appear to be relatively low. The IRR
for MD-SUP (Census) is low, which goes against the common-sense reasoning that graduate
coursework raises returns by a substantial amount. But what makes these returns low is the
average length of time it takes to complete the course, varying between 5 and up to 10 years. The
other IRR is the one that compares preschool education with no instruction (PRE-NEDUC,
PNAD). It suggests that the income differential between individuals who never went to school and
those who only attended preschool is insignificant[37]. It should be pointed out that this should
not be taken as a parameter for public policy, given the vast evidence that investment in preschool
education increases skills and the time that students spend in school, reduces grade repetition,
and consequently increases the productivity of individuals in the labor market (Heckman and
Carneiro, 2003). Barbosa and Pessoa (2006) estimate IRRs for preschool education, developing an
interesting methodology, which indicates that preschool education increases the probability
of remaining in school and the income stream of those who do. Thus, they obtain rates of
the order of 17% that have remained stable over the last 10 years.
In a horizontal analysis of the table, we can infer the gains from incorporating the sample design
and using heckit. With regard to the specification of the years of education, note that for the
nonlinear IRRs it is possible to have a negative bias of almost -1.68 p.p. (12.58% - 14.26%) when
comparing S4 and S0 in 1993, and a positive bias of 1.84 p.p. (13.73% - 11.9%) for S11-S8 in 1996,
if the sample design is not incorporated. In general, the IRR tends to be underestimated for lower
levels of education and overestimated for higher levels of education. With regard to the
correction of sample selection bias, the gains from estimating using heckit occur at all levels of
education, reaching a positive bias of almost 2.56 p.p. (8.78% - 6.21%, in 2002) when comparing S4-S0
for the PNAD and almost 14.4 p.p. (26.26% - 12.82%, in 2000) for the Census when comparing S17+-S15.
In addition, in the majority of cases, the bias is positive, which implies that the returns are even
lower when correctly estimated. Thus it is highly recommended, when estimating a model, to include
[37] In general this return is quite low, and can even be negative, which is to be expected because the market does
not tend to pay much more for someone who has attended preschool than for someone who has no education at
all. As for the decision by individuals to complete only preschool, one possible explanation could be
the quality of the parents (because it is parents who invest in the education of their children when they are young), which
could be proxied by the educational level of the parents, which would be low for this group. This is an interesting
issue that could serve as the basis for future research.
the sample design of the PNAD and correct for the bias in the sample selection for the PNAD and
the Census.
Thus, these factors point clearly to the conclusion that mincerian returns are biased, differing
widely from the true returns to education measured by the IRR. This bias has particular
consequences, since the returns are overestimated and therefore do not adequately explain the
changes in demand for education. For Brazil, returns are much lower when correctly measured. This
occurs principally due to the specification for the years of schooling (primary, middle, high school
and college), which has not been considered in the literature and, as can be seen in table 11, provides
returns that are much lower than the years of education approach, because it considers the average
time expected for conclusion; in other words, the term h is greater for the first approach than for
the latter.
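To illustrate why a larger h lowers the return, the following is a minimal sketch under simplifying
assumptions that are ours and not part of the estimation in this article: flat earnings profiles, the
same length of working life at both schooling levels, no direct costs, and continuous discounting.
Under these assumptions, equating the present values of the two earnings streams gives
r = ln(Y(s+h)/Y(s))/h. The function name and the earnings figures are hypothetical.

```python
import math


def flat_profile_irr(y_low, y_high, h):
    """Illustrative IRR of h extra years of schooling.

    Assumes flat earnings (y_low without the extra schooling, y_high with it),
    equal working-life length, no direct costs and continuous discounting.
    """
    return math.log(y_high / y_low) / h


# Hypothetical earnings: the same gain, completed in 4 nominal years
# or in 5 years once the average time to conclusion is considered.
nominal = flat_profile_irr(1000.0, 1500.0, 4.0)
actual = flat_profile_irr(1000.0, 1500.0, 5.0)

print(f"IRR with h = 4: {nominal:.2%}")
print(f"IRR with h = 5: {actual:.2%}")
```

With the same earnings gain, taking five years instead of four to complete the level reduces the
illustrative return from about 10.1% to about 8.1%.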
We would further point out that the IRRs for the years of education approach are close to
those obtained by Barbosa and Pessoa (2006), with the exception of high school and college, where
the differences are of a greater magnitude. In addition, they show some similarities with studies based
on the Mincer model such as Blom et al. (2001), because these authors obtained returns for years
of schooling that were much lower in relation to middle school and high school. Logically, given the
evidence shown in our study, those estimates are biased. Our returns (IRRs) are higher for primary and middle
school, and lower for high school and college. The same result occurs in the studies by Fernandes
and Menezes (2000), Leal and Werlang (1991) and Ueda and Hoffman (2002).
This difference could be the result of the use of different methods and samples among the
articles. Given the very large range of estimated returns, the question emerges: what is the IRR
for each level of schooling? With regard to the specifications, the specification for schooling is more
accurate because it incorporates the average time spent, the importance of which has already
been pointed out. However, for comparison with the work of others, we also present a specification
for years of education. With regard to the corrections, we take as a reference the models with the
largest number of corrections, namely: (i) the nonparametric and (ii) the parametric heckit. The
former has no problems of functional form and the latter is corrected for the bias of the sample
selection. Thus, we show two models with corrections in different dimensions, and, therefore,
measuring the difference between them is not a trivial matter38. Nevertheless, we stress that
the reference estimates are not particularly sensitive to the choice between the two
models. Therefore, we take as references the nonparametric IRR and the parametric (Heckit)
IRR for an analysis of the changes over time. Various studies, as we have already indicated (Blom
et al., 2001), show a decline in mincerian returns for Brazil, with the exception of the college level,
vis-à-vis an increase in the enrollment at all educational levels. In Panel 5 we see more easily the
38 We would need a model that simultaneously corrects the problems in both dimensions and incorporates the sample
design for the survey. The article by Das, Newey and Vella (2003), as mentioned in Note 5, suggests a nonparametric
heckit, but that does not permit the incorporation of the sample design. In other words, this is still an open question in
the literature. Later developments along this line of research may, in the future, permit an analysis of the gains from
the correction of the problems cited.
changes over time.
With regard to the Census (graphs 5.3 and 5.4), the levels of education EF4-NEDUC, EM3-EF8
and MD-SUP increase or remain stable in 1991 when compared to 1970 and/or 1980 and decline
in the last decade. EF8-EF4 declined over the decades and SUP-EM3 increased during the last
decade. With regard to the PNAD data (graphs 5.1 and 5.2), which cover recent changes, we note
for both references a behavior that is close to that of the Census over the last decade.
5.2 Discussion
Some points should be made with regard to the hypotheses and the estimates considered in the
calculation of the IRR. One assumption made by Mincer is that individuals first get an education
and later enter the labor force. Two points should be raised: (i) according to table 1, few individuals
work while they are going to school, but this percentage has increased over time, which has increased
the quality of night school courses in Brazil; and (ii) Mincer assumes that the direct costs of
education are compensated for while studying, or that these are negligible. With regard to the
latter element of the cost of education, Becker points out that the investments in education are
concentrated in the early ages because: (i) over time the individual has a smaller period of time
to recover his investment in human capital and (ii) the opportunity cost increases as the level of
human capital rises. Thus, the cost of time is an important source of the total cost calculation of
the IRR. Becker argues that this cost is often overlooked in the literature and that it should be
treated in the same way as direct costs. Schultz extends the costs beyond monthly expenditures,
annuities and other items, with the salary forgone making up a significant part of the total cost.
In addition, Schultz raises the question of leaving out the salaries of student workers, for whom
the opportunity cost tends to be overestimated. Thus, given the evidence noted above
of the increase in student workers, we analyze an additional specification, based on the
nonparametric and nonlinear specifications, that includes those who attend school.
We also incorporate part-time workers (more
than 20 hours per week) because this group often works only part time in order to be able to
study. In the appendix, the line "additional 1 IRR" is compared with the nonparametric estimate,
always incorporating the sample design or sample weight (tables 13 and 15), and also with the
nonlinear estimates with the sample design and the correction of the bias of the sample selection
(tables 14 and 16). There is not a large difference between the rates for all the specifications. This
is because the percentage of working students is still small, although growing.
Another hypothesis is l′(s) = 1, assuming that the length of time on the job might not depend
on years of education, and could vary between individuals with the same level of education. Thus,
we relax this hypothesis as well. It should be noted that the length of time working adopted here
is 32 years, because of the limits on the age of the individual39, there being no difference between
39 In other words, since x = i−s−6, then ∆sx = 56−24 = 32 years, for a fixed level s of education. Because we take
the minimum age to be 24 years, at this age there are already people with degrees from the highest level
this hypothesis and one where l′(s) = 0. Thus, on line additional 2 we maintain the hypothesis
that l′(s) = 1, but the age range includes individuals from 10-65 years of age, with a
working life of the first 40 years; and on line additional 3 we allow individuals to work up to 65
years of age, when they retire (l′(s) = 0)40. We also note a large difference when we use a wider age
range, which shows that the costs borne at an early age are significant; their exclusion tends
to bias downward the returns for higher levels of education and bias upward the IRRs for S4-S0
and PRE-NEDUC/EF4-PRE for PNAD. For the Census the same logic applies. Now, comparing
additional 2 with additional 3, we note that the differences in the estimates are small. This is due
to the fact that the earnings at the end of the life cycle are more heavily discounted, therefore
having little effect on the present value of earnings and therefore little impact on the IRR (HLT).
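The discounting argument can be checked with a short calculation. The sketch below assumes, purely
for illustration, a flat earnings stream of one unit per year and a discount rate of 10%; neither
figure comes from the data in this article. It computes the share of the present value of a 50-year
working life that is contributed by the last 10 years.

```python
def present_value(earnings, r):
    """Present value of a yearly earnings stream; the first year is undiscounted."""
    return sum(y / (1.0 + r) ** i for i, y in enumerate(earnings))


r = 0.10                  # hypothetical discount rate
short_life = [1.0] * 40   # working life limited to the first 40 years
long_life = [1.0] * 50    # working life extended by 10 more years

# Share of the longer stream's present value that comes from years 41 to 50.
late_share = 1.0 - present_value(short_life, r) / present_value(long_life, r)
print(f"Share of present value from the last 10 years: {late_share:.2%}")
```

Under these assumptions the last 10 years account for less than 2% of the present value, which is
consistent with the small differences between the additional 2 and additional 3 estimates.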
Finally, we include private direct costs (tuition) in the calculation of the IRR41. These costs
were obtained through the Family Budget Survey (POF) for 1995/199642, from which we estimate
an average by level of education, adjusted for a standard 40-hour workweek, for the purposes
of comparison with income standardized for the individual’s hours of work. Since the private expenses
on education are available only from the POF for this year, we performed a simulation of these
expenses for other years for the purposes of comparison. We estimated total average income for
families that had members in preschool, then we did the same for families that had members in
primary school and so on in succession. We then calculated the percentage of expenditures on
monthly tuition costs in relation to average income, obtained from the POF, and assumed this to be
constant for all years. Based on this percentage, we estimated spending using the average income
from PNAD in the same fashion. Table 17 shows these estimates. Thus, according to tables 13 and
14, for the year 1996, based on the original expenses from the POF (line additional 4 IRR (POF)),
we see much lower rates, reaching a drop of more than 3 p.p. (S15-S11) in relation to specification
three. In the same fashion, for the simulated expenditures, we observed relatively large declines for
all years, principally for the higher educational levels.
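The simulation of tuition expenses for years other than 1995/1996 can be sketched as follows. All
figures and the function name are hypothetical; the only element taken from the text is the
procedure: compute the share of average family income spent on monthly tuition in the POF, hold it
constant, and apply it to the average income from PNAD in each year.

```python
def simulate_tuition(pof_tuition, pof_income, pnad_income_by_year):
    """Apply the POF tuition share of income to PNAD average income in each year."""
    share = pof_tuition / pof_income  # assumed constant over all years
    return {year: share * income for year, income in pnad_income_by_year.items()}


# Hypothetical monthly values for one educational level.
simulated = simulate_tuition(
    pof_tuition=50.0,
    pof_income=1000.0,
    pnad_income_by_year={1993: 800.0, 1996: 1000.0, 2002: 1200.0},
)

for year, cost in sorted(simulated.items()):
    print(year, round(cost, 2))
```

In practice the calculation would be repeated for each educational level, since the income of
families with members in preschool, primary school and so on is estimated separately.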
We should stress that the additional 2-4 IRRs (tables 13-15) are more sensitive to the choice between
the nonparametric and parametric models, principally in relation to the returns to middle school education
of education.
40 We want to stress that the additional 2 and 3 IRRs include a modification to the additional 1 IRR, and that the
additional 4 IRR, which will be presented below, includes the changes to the additional 3 IRR. In other words, we are
relaxing these assumptions gradually and successively. The note to table 13 reinforces this point.
41 More precisely, we estimate the IRR using:
\[
\sum_{i=0}^{l} \frac{Y(i, s+h)}{(1+r)^{h+i}} - \sum_{i=0}^{l} \frac{Y(i, s)}{(1+r)^{i}} - \sum_{i=0}^{h} \frac{c}{(1+r)^{i}} = 0,
\]
where c is the average direct cost of monthly tuition.
42 It should be pointed out that INEP provides data on public spending on education on its webpage on the
Internet. These expenditures could be included in the analysis of the IRRs in a macroeconomic context, which
compares investment in education with other investments, such as capital, as has already been done by Langoni
(1974).
compared to primary education (EF8-EF4 and S8-S4, PNAD)43.
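The IRR condition with direct costs given in note 41 has no closed-form solution in general, but it
can be solved numerically. The following is a minimal sketch using bisection, not the routine used
in this article; the function names, the flat earnings profiles and the cost figure are assumptions
made for illustration. Here y_high[i] and y_low[i] stand for Y(i, s+h) and Y(i, s).

```python
def npv(r, y_high, y_low, h, c):
    """Left-hand side of the IRR condition in note 41 at discount rate r."""
    gain = sum(y / (1.0 + r) ** (h + i) for i, y in enumerate(y_high))
    forgone = sum(y / (1.0 + r) ** i for i, y in enumerate(y_low))
    tuition = sum(c / (1.0 + r) ** i for i in range(h + 1))
    return gain - forgone - tuition


def irr(y_high, y_low, h, c=0.0, lo=0.0, hi=1.0, tol=1e-10):
    """Find a rate r in [lo, hi] at which npv is zero, by bisection."""
    f_lo = npv(lo, y_high, y_low, h, c)
    f_hi = npv(hi, y_high, y_low, h, c)
    if f_lo * f_hi > 0:
        raise ValueError("npv does not change sign on [lo, hi]")
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        f_mid = npv(mid, y_high, y_low, h, c)
        if f_lo * f_mid <= 0:
            hi = mid               # the root lies in [lo, mid]
        else:
            lo, f_lo = mid, f_mid  # the root lies in [mid, hi]
    return (lo + hi) / 2.0


# Hypothetical flat profiles over l = 32 years (i = 0, ..., 32) and h = 4.
y_low = [1000.0] * 33
y_high = [1500.0] * 33

irr_no_cost = irr(y_high, y_low, h=4)
irr_with_cost = irr(y_high, y_low, h=4, c=100.0)
print(f"IRR without direct costs: {irr_no_cost:.2%}")
print(f"IRR with direct costs:    {irr_with_cost:.2%}")
```

Bisection is chosen here only for transparency; it requires that the condition change sign on the
chosen interval. In this example, adding a positive tuition cost lowers the rate that solves the
condition, in line with the direction of the results reported for the additional 4 IRR.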
Analyzing the data over the years of the Census, we note in graph 6.4, Panel 6, a downward
trend for all levels during the most recent decade, except for an increase in undergraduate and graduate
education relative to the immediately preceding high school level during the last decade. Estimating using heckit
(graph 6.3), we note a slight difference with respect to EF4-NEDUC, which increased, and MD-
SUP, which declined in the last decade. With regard to the PNAD, we note that the nonlinear IRR
(graph 6.1) shows behavior similar to that in the previous section. The nonparametric IRR
(graph 6.2) also shows the same behavior, with the exception of EF8-EF4, which shows a decline up
to 1999 and an increase beginning in that year, and the high school level, which remained relatively
stable during this decade.
6 Conclusions
Since the publication of the seminal work by Mincer in 1958, later extended in a 1974 version, various
empirical articles have used the mincerian regression to estimate the “rate of return” for education.
However, some of the assumptions behind the original model that allow the coefficient for
years of education to be understood as a rate of return (linearity and parallelism) have been
rejected in this article. Thus for Brazil, we corroborate the evidence presented in international studies
for the USA. Therefore, we relax various assumptions in order to measure the bias that stems from
the poor estimation of the rate of return. We note that the bias tends to be positive for all levels
of education. Thus, the IRRs tend to be smaller when the assumptions are relaxed; among these,
linearity, parallelism and the inclusion of private costs tend to generate the largest impact on the
estimation of the IRRs. Another significant change in the IRRs came from the expansion of the age group
from 24-56 years to 10-65 years. This shows that the costs incurred at an early age are significant.
In relation to the specifications of years of education and years of schooling, the latter adds
an important additional aspect to the estimation of IRRs in relation to the former: it
considers the average time individuals take to finish each level of schooling. This carries with it a
significant reduction in the IRRs, principally for higher levels of education.
The incorporation of the sample design from the PNAD is an additional benefit for the empirical
literature in Brazil, one that had not been considered in the estimation of various economic models.
This correction not only confirms the tests that had been done, but also has a considerable influence
on the correct measurement of the returns. The correction for the sample selection bias also can be
considered an additional gain, given that even recent studies, such as HLT, do not estimate using
the two-stage estimation procedure developed by Heckman (1979), which alters the magnitude of
the IRRs as well.
Finally, the majority of the IRRs estimated tend to corroborate the evidence in the literature
43 The return to middle school diverges from other studies, as mentioned in the previous section. Taking the
additional 2-4 IRRs as a reference, we note that this return is greater for the nonparametric than for the nonlinear.
that returns to education have been declining in recent years, with the exception of higher-level
education (university and postgraduate education), which shows growth in the last decade, but
at a lower level of magnitude than those obtained in various recent studies. The correct estimation
of rates of return makes it possible for future research to prepare a detailed analysis of the reasons
for the increase in the IRR for undergraduate and graduate education, given the evidence of a
substantial increase in the rates of enrollment during the recent decade. It is also a key indicator
to serve as a guide to public policy and in the evaluation of social programs.
References
[1] Barbosa Filho, F. H. and S. Pessoa (2006). Retornos da educação no Brasil. mimeo.
[2] Becker, G. S. (1975). Human Capital: A Theoretical and Empirical Analysis, with Special
Reference to Education. New York: Columbia University Press, 2nd edition.
[3] Belman, D. and J. Heywood (1991). Sheepskin effects in the return to education: An Examination
of Women and Minorities. Review of Economics and Statistics, 73(4): 720-724.
[4] Binelli, C., C. Meghir and N. Menezes Filho (2006). Education and Wages in Brazil. mimeo.
Presented at the XXVIII Encontro Brasileiro de Econometria.
[5] Blom, A., L. Holm-Nielsen and D. Verner (2001). Education, Earnings, and Inequality in Brazil,
1982-98. Peabody Journal of Education, 76(3&4): 180-221.
[6] Cameron, S. V. and J. J. Heckman (2001). The dynamics of educational attainment for black,
hispanic and white males. Journal of Political Economy, 109(3): 455-99.
[7] Card, D. (1999). The causal effect of education on earnings. In: Ashenfelter, O. and D. Card
These hypotheses are exemplified for the first panel above. They apply to the other panels, altering the values of x and s. The greyish areas refer to p-values of joint hypotheses that are greater than 0.05.
This note also applies to Tables 8-10.
Table 8. Differences in the conditional mean of the log of earnings in two groups of years of
education by level of experience and P-Values of the tests for parallelism for three null hypotheses