Page 1: 220 F

Forecasting the Term Structure using Nelson-Siegel

Factors and Combined Forecasts

Econ 220F, Prof. Gordon Dahl, Spring 2007

Michael D. Bauer

June 12, 2007

Abstract

This paper attempts to replicate the good performance of the DL-approach to term structure forecasting, documented in Diebold and Li (2006), in a newer dataset. It finds that in the original specification, which uses AR(1) processes to forecast the Nelson-Siegel factors, this approach does not perform well. A better alternative is to model the factors as martingales and thus to predict future yields with today's fitted yields. Furthermore, the persistence of the individual yields and of their pricing errors allows us to reduce the forecast error variance by shrinking the DL-forecasts towards the current yields. Forecast combinations that incorporate DL-forecasts and other yield forecasts perform best among the considered competitors.


1 Introduction

Accurate forecasts of the term structure of interest rates are crucial in bond portfolio

management, derivative pricing, and risk management. Affine term structure models,

which are heavily used in practice to price options and other derivatives on fixed income

instruments, perform poorly in out-of-sample forecasting (Duffee, 2002). An important

recent contribution in this field is Diebold and Li (2006), who adapt the Nelson-Siegel

(1987) framework to the purpose of forecasting the entire term structure. They exploit

the parsimony of this framework by forecasting just three latent factors (the Nelson-

Siegel factors, henceforth NS-factors), which are modeled as univariate AR(1) processes

and interpreted as level, slope and curvature of the yield curve. From the forecasts of these

factors, the entire term structure at future points in time can be generated. This approach

is now commonly known as the Diebold-Li (DL) approach to forecasting the yield curve,

and it outperforms all considered competitors over a forecast horizon of 12 months in US

data in the forecast exercise of these authors.

In the present paper, we attempt to replicate these findings, using first data over an

identical sample window, and then a bigger sample including data up until December

2006. We extract the factors and analyze their dynamic properties. Then we compare

fitted and empirical yield curves, and find that pricing errors are persistent. The main

task is the forecasting exercise: We compare the original DL-approach and some variants to

several competitor models. In particular, we vary the specification of the factor dynamics,

considering Random Walks as well as AR(1) processes. A decomposition of the forecast

error variance into the contribution of the factor prediction errors and the pricing errors

provides insight into how different DL-specifications compare relative to each other, and

how one could improve upon the DL-method. Furthermore we include combined forecasts

in the analysis, using both equal weights and performance weights, that each combine the

forecasts of all competing individual forecast models.

Our results are disappointing if one believes in the superior performance of the DL-

approach in its original specification: While for the sample including the same time periods

as DL, the DL-approach does outperform some competitors at selected maturities and

forecast horizons, the outperformance is less pronounced than in the original paper. Since

summary statistics and yield curves on selected dates agree quite closely between us and

DL, it is surprising to find such different results. Using the full sample, the performance of

the original DL-approach is even worse. Specifying AR(1) processes to forecast the factors,

one can in no case beat the random walk significantly in terms of predictive accuracy. We

find that no-change forecasts for the NS-factors generally improve the forecast. The


evidence suggests that one should not try to forecast the factors, since the additional

estimation uncertainty contaminates the forecasts. Today’s fitted yield is usually the best

DL-forecast.

Another conclusion is one that is reached frequently in the literature: The methods that perform by

far the best are the combined forecasts. The diversification gains from including several

competing forecasts into a combined forecast are considerable, and make this approach to

yield curve forecasting our method of choice.

The paper proceeds as follows: Section 2 explains the zero curve and its construction

and presents summary statistics for the zero curves for the US that we bootstrapped from

the CRSP bond price data. In section 3 we analyze the NS-factors as they are extracted

from the US data, compare them to empirical factors (level, slope and curvature as they

are usually calculated), and assess the fit that the NS-factors provide to empirical yield

curves. In section 4 we describe and discuss the DL-approach to forecasting, present its

competitors and the method to compare predictive accuracy. Then the results of the

forecast exercise are presented and discussed. Section 5 concludes.

2 The Zero Curve

With the CRSP Monthly Treasury Database, we have an excellent data source at hand for

constructing the US term structure of interest rates. It is very well documented, updated

regularly, and checked for consistency. We defer the details of the data processing to

appendix A. In the following subsection we explain why it is necessary to bootstrap spot

rates from the data and outline how we performed this task. Then we present summary

statistics for the US yield curve.

2.1 Bootstrapping the Zero Curve from Bond Prices

The yields we are interested in are the spot rates: The net return that is obtained on

a τ -period investment. The cross-section (across different maturities) of spot rates at a

particular point in time is called the term structure of interest rates, or yield/spot rate/zero

curve. For τ -period discount bonds, the yield to maturity (YTM) is equal to the spot rate

for that period, yt(τ). For a coupon bond, the YTM is the discount rate that makes

the present value of future coupon and principal payments equal to the cash price of the

issue. This yield is not equal to a spot rate, since coupon payments cannot generally be

reinvested at the same rate. Therefore the YTM across different maturities is not equal to

the zero curve. This latter has to be constructed from observed bond prices, and we do so


using the methodology of Fama and Bliss (1987): Simply put, “the discount rate function

is extended each step by computing the forward rate necessary to price successively longer

maturity bonds given the discount rate function fitted to the previously included issues”

(Bliss, 1996, p.10). Since there is a one-to-one mapping between zero curve and discount

rate function, this achieves the goal of constructing the zero curve. We provide details

about the procedure in the appendix. After obtaining spot rates for all maturities at a given point in time, we linearly interpolate these rates to 17 fixed maturities, as DL did.
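The two steps above can be sketched in code. The following is a simplified illustration that treats the bonds as discount (zero-coupon) issues; the actual Fama-Bliss procedure additionally handles coupon payments, and all function names here are our own:

```python
import numpy as np

def bootstrap_zero_curve(maturities, prices):
    """Simplified Fama-Bliss-style bootstrap on discount bond prices.

    The discount function is extended maturity by maturity with the constant
    forward rate that exactly prices the next bond (coupon handling omitted).
    maturities : increasing maturities in years
    prices     : prices per unit face value
    Returns continuously compounded spot rates."""
    maturities = np.asarray(maturities, float)
    prices = np.asarray(prices, float)
    spots = np.empty_like(prices)
    log_disc, tau_prev = 0.0, 0.0        # log discount factor so far
    for i, (tau, p) in enumerate(zip(maturities, prices)):
        # forward rate over (tau_prev, tau] that prices bond i exactly
        fwd = (log_disc - np.log(p)) / (tau - tau_prev)
        log_disc -= fwd * (tau - tau_prev)
        spots[i] = -log_disc / tau       # spot rate = average forward rate
        tau_prev = tau
    return spots

def interpolate_to_fixed(maturities, spots, grid):
    """Linear interpolation of the bootstrapped spot rates onto fixed maturities."""
    return np.interp(grid, maturities, spots)
```

With prices generated from a flat 5% curve, the bootstrap recovers a flat 5% spot curve at every maturity, including the interpolated ones.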

2.2 Summary Statistics

Figure 1 provides a three-dimensional plot of the US yield curve over the entire time

horizon. The variation in the level of the yield curve is much stronger than the variation

in slope and curvature, yet the latter are obviously important as well.

In table 1 we show the summary statistics for the monthly term structure in the US.

We see some of the usual patterns:

• The average yield curve is upward sloping.

• The short end of the yield curve is more volatile than the long end.

• Yields of all maturities are highly persistent.

Like DL we find short rates in our sample to be more persistent than long rates. In sum,

our table is qualitatively and quantitatively very similar to table 1 in DL.

3 Fitting the Yield Curve using Nelson-Siegel Factors

3.1 Extracting the Factors

In extracting the NS-factors from the yield data, we follow exactly the approach of Diebold

and Li (2006): For each month of data, we regress the yields of 17 maturities on the factor

loadings. The factor loadings are

loadings(τ) = (1, (1 − exp(−λτ))/(λτ), (1 − exp(−λτ))/(λτ) − exp(−λτ))′, (1)


where λ is fixed at the value 0.0609, as in DL.1 Now the regressions

yt(τ) = β′t loadings(τ) + εt(τ), for τ = τ1, . . . , τ17, (2)

where βt are the regression coefficients, will give us three time series for the three Nelson-Siegel factors, {β1t, β2t, β3t}, t = 1, . . . , T.2
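This per-month cross-sectional OLS is straightforward to state in code. The sketch below (our own naming) mirrors equations (1) and (2), with λ fixed at 0.0609 and maturities measured in months:

```python
import numpy as np

LAMBDA = 0.0609  # decay parameter, fixed as in Diebold and Li (maturities in months)

def ns_loadings(tau):
    """Nelson-Siegel factor loadings of eq. (1) for maturities tau (months)."""
    x = LAMBDA * np.asarray(tau, float)
    slope_load = (1 - np.exp(-x)) / x
    curv_load = slope_load - np.exp(-x)
    return np.column_stack([np.ones_like(x), slope_load, curv_load])

def extract_factors(tau, yields):
    """Regress each month's yield curve on the loadings (eq. 2) by OLS.

    tau    : array of the fixed maturities (months)
    yields : T x len(tau) array of yields
    Returns a T x 3 array with the factor series {beta_1t, beta_2t, beta_3t}."""
    X = ns_loadings(tau)
    beta, *_ = np.linalg.lstsq(X, np.asarray(yields, float).T, rcond=None)
    return beta.T
```

If yields are generated exactly from known factors, the cross-sectional regressions recover those factors, which is a useful sanity check on the loadings.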

3.2 Model-based Factors and Empirical Factors

We will now compare the NS-factors, to their empirical counterparts as they are usually

calculated: The level is taken to be the yield at the longest maturity (10 years), the slope

is the difference between the 10-year and 3-month yields, and the curvature is twice the

2-year yield minus the sum of 10-year and 3-month yields.
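In code, these empirical counterparts are one-liners (function name ours):

```python
import numpy as np

def empirical_factors(y3, y24, y120):
    """Empirical level, slope and curvature from the 3-month, 2-year and
    10-year yields, as defined in the text."""
    level = np.asarray(y120, float)
    slope = level - np.asarray(y3, float)
    curvature = 2.0 * np.asarray(y24, float) - (level + np.asarray(y3, float))
    return level, slope, curvature
```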

As detailed in DL, the NS-factors can be interpreted as level, slope and curvature

of the yield curve. The first factor, if increased, raises yields at all maturities, since its

loading is constant, and can therefore be interpreted as the level of the yield curve. The

second factor, if increased, increases short yields more strongly than long yields, since short

yields load more heavily on it. It can therefore be interpreted as the negative of the usually

employed measure of the yield curve slope (long minus short yield). The third factor, if

increased, will increase medium-term yields the most, since long and short yields do not

load strongly on this factor. Because 2yt(24)− yt(3)− yt(120) = 0.00053β2t + 0.37β3t, we

multiply this third factor by 0.3 and it will then closely correspond to the usual curvature

measure.

In figure 2 we see that in fact the empirical counterparts correspond very closely to

the estimated factors.

3.3 Dynamic Properties of Nelson-Siegel Factors

In table 2 we present summary statistics for the model-based factors. As their empirical

counterparts, they are very persistent. In the last column we present the results for the

Augmented Dickey-Fuller test, in order to test for unit root. The MacKinnon critical value

for rejecting the null hypothesis of a unit root is -2.57 for a sample size of 250 (our sample

1 For this value the loading on the curvature factor (the third factor) is maximized at 30 months. Usually maturities of two or three years are used to calculate curvature (by subtracting from twice this yield the sum of the short and long yields), and 30 months is right in between.

2 We differ in our notation from DL in that we denote by β, and not by β̂, the actually observed NS-factors. This will be useful because the concept of some “true” unobservable NS-factors will not be needed, and we can distinguish better between actual and forecasted factors.


size is 264), so we cannot reject the null for the level and slope factors. They may

well contain unit roots, whereas the curvature factor probably does not.

At this point the autocorrelation and partial autocorrelation functions of the NS-

factors should be considered, and we show these in figure 3. The level factor probably

has long-memory, and might be non-stationary. The persistence is also high for both

slope and curvature, yet it is decidedly smaller than the persistence of the level factor.

Linear models other than the Random Walk (RW) that could capture such correlation structures are, for example, AR, ARMA, ARIMA, or ARFIMA models. We fit AR(1)

models to the factors and present the autocorrelation functions of the residuals in figure

4, together with Bartlett confidence bands. Not all serial correlation is captured, since

some autocorrelations at low lags are significant, and the Ljung-Box test confirms this

(results not included). We postpone further discussion of the NS-factors until we discuss

the forecast approach based on these.

3.4 Actual and Fitted Yield Curves

In figure 5 we plot the fitted yield curve using the average of the factors, together with the

average empirical yield curve. We see that the fitted curve provides a very good fit. This

is what one would expect: The average yield curve is very smooth, and so three factors

should be sufficient to capture its shape and generate a good fit.

In figure 6 we show the actual and the fitted yield curve on selected dates, choosing

the same dates as DL in their figure 5. Our plots are essentially identical to those of DL.

The important point to notice is that at some dates the fit of the NS-fitted yield curve is

much better than at other dates, with the difficulties arising when the actual yield curve is

very unsmooth and dispersed. Since yield curves seldom look like that of August 1998,

where the fit is particularly bad, the fit is usually satisfactory.

To obtain a better understanding of how well the Nelson-Siegel fitted yield curves fit

the actual yield curves, we provide a three-dimensional plot of the residuals in figure 7.

Overall the fit seems to be sufficiently good: The residuals usually lie between −0.2 and 0.2.

No long, persistent deviations from zero are apparent.

But the dynamic properties of these pricing errors should be looked at more closely.

We present summary statistics for these in table 3. Of course the means are close to zero

and never significantly different from it. The RMSE’s indicate that at the shortest and

longest maturity the fit is worst. The most important insight from this table is that pricing

errors are in fact persistent, with first order autocorrelations of the residuals ranging from

0.23 to 0.88, usually being larger than 0.5. This persistence implies that we can possibly

improve upon the DL-forecast, which predicts using fitted yields. This issue, which was


not considered in the original DL paper, will be discussed further in the following section.

4 Forecasting the Yield Curve

Applying simple forecasting techniques to the three NS-factors, and then using the fitted

yield curve as the forecast is the basis of the DL approach. In the following we go into

more details on this approach and possible variations and extensions, and then discuss

competing forecast models, forecast combination, and the method of choice to compare

predictive accuracy (Diebold-Mariano). Finally we present the results for the US.

Throughout we use the following notation: The number of observations in the sample

is T , with R observations being in the initial estimation window. The forecast horizon

is h (months) and the first forecast is therefore for R + h. In the out-of-sample window

we have P observations (R + P = T ). For emphasis we sometimes denote the time the

forecast is made as tf . So tf = R, . . . , T − h.

4.1 Diebold-Li Forecast Approach

The approach to forecast the yield curve chosen by Diebold-Li (DL) consists of forecasting

the NS-factors, and then using the fitted future yield as the predicted value. The time t

forecast of the zero spot rate at t + h (maturity τ) is given by

yt+h|t(τ) = l(τ)′βt+h|t, (3)

where l(τ) is the column vector with the three NS factor loadings for maturity τ , and the

3 × 1 vector βt+h|t contains the forecasts for the factors. Univariate AR(1) processes and

first order VARs were used as models for the factors in DL’s forecasting exercise.

When we forecast a factor i at time tf using an AR(1) regression, the forecast is βi,t+h|t = b0 + b1βi,t, with b0 and b1 being the OLS coefficients from regressing {βt}, t = h+1, . . . , tf, on {βt}, t = 1, . . . , tf − h.

This specification will be called DL-AR.
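A minimal sketch of this direct h-step AR(1) forecast, with our own function names, could look as follows:

```python
import numpy as np

def ar1_direct_forecast(factor, h):
    """DL-AR forecast of one NS factor: regress beta_t on beta_{t-h} by OLS
    over the data available at the forecast date, then predict h steps ahead
    from the last observed value."""
    x = np.asarray(factor, float)
    X = np.column_stack([np.ones(len(x) - h), x[:-h]])  # regressors: 1, beta_{t-h}
    b0, b1 = np.linalg.lstsq(X, x[h:], rcond=None)[0]
    return b0 + b1 * x[-1]

def dl_ar_yield_forecast(loadings, factors, h):
    """Eq. (3): forecast each of the three factors, then map the forecasts
    through the loadings to obtain the fitted future yield(s)."""
    beta_hat = np.array([ar1_direct_forecast(factors[:, i], h) for i in range(3)])
    return loadings @ beta_hat
```

On a series that follows an exact linear recursion the regression recovers the recursion, so the forecast extends the pattern, which makes the mechanics easy to verify.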

One interesting question is whether the AR(1) specification can at all improve upon

a naive no-change forecast. Modeling all three factors as martingales leads to an extremely simple forecast: today's fitted yield, yt+h|t(τ) = l(τ)′βt. We include this specification (DL-RW) in our forecasting exercise in order to assess its performance compared

to the original DL specification.

The Dickey-Fuller tests indicated that level and slope factors might have unit roots,

so an obvious further model choice is to model these as random walks, and choose AR(1)

for the curvature (DL-RW2AR1). The ACF, on the other hand, suggests that a


unit root seems decidedly more likely in the level factor than in the other two, and so we

also include a specification where we choose RW for the level, and AR(1) for slope and

curvature (DL-RW1AR2). There are eight combinations when choosing between AR(1) and RW for each factor; we content ourselves with just four of these.3

The accuracy of these forecasts is determined by how well the future factors are pre-

dicted, and how close the actual future yield is to its fitted yield. The forecast error can

be decomposed as follows:

yt+h(τ) − yt+h|t(τ) = β′t+h l(τ) + pet+h(τ) − β′t+h|t l(τ)

= (β′t+h − β′t+h|t) l(τ) + pet+h(τ)

= fpe′t+h l(τ) + pet+h(τ)

The factor prediction error (fpe) is a 3 × 1 random vector with the errors we make

in predicting the factors. These are weighed differently, depending on the three loadings

l for maturity τ . The pricing error (pe) is the deviation of the actual future yield from

fitted yield curve. Assuming that time t pricing error and factor prediction error are

uncorrelated (an assumption that seems plausible and could be tested) the expected loss

of the forecast is

E(e²t) = Var(fpe′t l(τ)) + Var(pet(τ))

= l(τ)′ Σ l(τ) + Var(yt − β′t l(τ)) (4)

= Vf + Vpe (5)

where Σ = E(fpet · fpe′t) is the unconditional contemporaneous covariance matrix of the

three factor prediction errors. We denote by Vf the contribution of the factor prediction errors, and by Vpe the contribution of the pricing errors to the expected loss. The diagonal

elements of Σ are the forecast error variances for the chosen factor prediction method, the

off-diagonal elements the covariances of the factor prediction errors. How important these

elements are depends on how heavily a yield of the chosen maturity loads on each factor.

Intuitively, when predicting short yields, the errors in predicting level and slope are both

3 Since these model choices depend on the observed persistence in our sample, one could argue that the researcher has information about the factor dynamics only until the time of the forecast. Yet the fact that we forecast and estimate persistence partly in the same data would only be problematic (bias our results) if the persistence features changed over time and it actually pays off to the forecaster to have up-to-date information on these. We are not worried about this bias, because qualitatively similar results on Dickey-Fuller tests and ACF functions obtain in the sample without the forecast window. Also this is a minor point because the comparison between AR(1) and RW is in our case not motivated by the observed data.


important, but when predicting a long yield, Vf depends almost entirely on the forecast error variance of the level factor, since long yields load only weakly on slope and curvature.
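Under the no-correlation assumption, the two contributions in (4)-(5) are simple quadratic forms; a small helper (names ours) makes the intuition concrete:

```python
import numpy as np

def loss_decomposition(l, Sigma, pe_var):
    """Split the expected squared forecast loss into Vf = l' Sigma l (factor
    prediction errors) and Vpe (pricing-error variance), as in eq. (4)-(5).
    Assumes zero correlation between the two error sources."""
    l = np.asarray(l, float)
    Vf = float(l @ np.asarray(Sigma, float) @ l)
    Vpe = float(pe_var)
    return Vf, Vpe, Vf + Vpe
```

For a long maturity the slope and curvature loadings are near zero, so Vf is driven almost entirely by the level entry of Σ, matching the discussion above.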

If one is willing to assume independence between factor prediction errors and pricing

errors, the performance of any DL-specification depends entirely on how well it predicts

the factors. Of course it is possible that a bad forecast for the factor produces a fitted

yield that is closer to the future realization than the fitted yield from the actual future

factors, that is |yt+h(τ) − β′t+h|tl(τ)| < |yt+h(τ) − β′t+hl(τ)| = |pet+h|. This is the case if

fpe′t+hl(τ) is of opposite sign than pet+h. If this happens systematically, that is if there is

negative correlation between these two, the expected loss will be decreased, and might be

smaller than if we predict the factors well but have no correlation. Under the assumption

of no correlation however, a DL-forecast will outperform a competitor of its kind if and

only if it can predict the factors better. Again, against other methods, the performance

of DL-forecasts depends on both how well NS-fitted yield curves fit actual yields (in the

future) and how well its factors can be predicted.

The pricing error variance is one common source of loss for all DL specifications. We

may hope to take advantage of the persistence in the pricing errors for better predictions,

yet for this we have to deviate in our forecast from a fitted yield. Instead of explicitly

modeling and forecasting the pricing error, a much simpler approach can do the same job:

Namely to just shrink towards the actual yield at the time the forecast is made, e.g. simply

taking a (possibly weighted) average of the DL-forecast for the yield and its current value.

The reason this reduces pricing errors is intuitive: A yield that deviates substantially from the fitted curve will probably still do so in the future, with a pricing error of the same sign (at

least over shorter horizons). Since the yield itself is persistent, shrinking the fitted yield

forecast towards last period’s value should reduce the pricing error. So this shrinking

method exploits the persistence in the pricing errors, and it does not need to estimate any

additional parameters. Following this reasoning, a model within the DL class would then

usually be beaten by shrinking it towards the current actual yield. We denote the forecast

based on a simple average of DL-RW and the current yield as DL-RWS (S for shrinkage),

and assess our hypothesis in the last table.
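DL-RWS is then nothing more than an average of the fitted-yield forecast and the current yield; a weighted generalization (parameter name ours) would be:

```python
def shrink_to_current(dl_forecast, current_yield, w=0.5):
    """DL-RWS forecast: shrink the DL fitted-yield forecast towards the
    current yield. w=0.5 gives the simple average used in the text; larger
    w leans more heavily on the persistence of yields and pricing errors."""
    return w * current_yield + (1.0 - w) * dl_forecast
```

No additional parameters are estimated, which is exactly the appeal of this shrinkage over explicitly modeling the pricing errors.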

4.2 Competing Forecast Models

In the following we present the different models that compete with the DL-forecasts. We

denote by b0 and b1 in each case the OLS coefficients estimated from the regressions in

the data available at the time of the forecast.

Random Walk The first and most important competitor is the simple no change yield


forecast, yt+h|t(τ) = yt(τ).

AR(1) on yield levels Yields at each maturity are predicted using an AR(1) that is

estimated on the available data for that maturity, {yt(τ)}, t = 1, . . . , tf. The yields are regressed

on h-period lagged values, to produce optimal MSE linear forecasts: yt+h|t(τ) =

b0 + b1yt(τ).

AR(1) on yield changes Here we regress h period yield changes on their past values

in order to obtain a prediction for future yield changes: yt+h|t(τ) − yt(τ) = b0 +

b1(yt(τ) − yt−h(τ)).

Slope regression The forecasted yield change is the predicted value from regressing historical (h-period) yield changes on yield curve slopes: yt+h|t(τ) − yt(τ) = b0 + b1(yt(τ) − yt(3)).

Fama-Bliss forward rate regression The forecasted yield change is the predicted value

from regressing historical yield changes on forward premia: yt+h|t(τ) − yt(τ) =

b0 + b1(fht(τ) − yt(τ)), where fht(τ) is the forward rate, observed at time t, for an investment from t + h to t + h + τ.

Regression on three AR(1) principal components Extracting three principal com-

ponents from the 17 yields (of course using data only until tf ), we forecast these

using AR(1) models, and generate the yield forecasts from the predicted principal

components (see DL, p.359).
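As an illustration, the slope-regression competitor can be sketched as follows (naming ours); the other change-regressions above differ only in the regressor used:

```python
import numpy as np

def slope_regression_forecast(y_tau, y_3m, h):
    """Regress historical h-period yield changes on the lagged yield-curve
    slope y_t(tau) - y_t(3), then forecast the next h-period change and add
    it to the current yield."""
    y_tau = np.asarray(y_tau, float)
    slope = y_tau - np.asarray(y_3m, float)
    dy = y_tau[h:] - y_tau[:-h]                      # realized h-period changes
    X = np.column_stack([np.ones(len(dy)), slope[:-h]])
    b0, b1 = np.linalg.lstsq(X, dy, rcond=None)[0]
    return y_tau[-1] + b0 + b1 * slope[-1]
```

When the yield changes are an exact linear function of the lagged slope, the OLS step recovers that relation and the forecast extrapolates it, which makes the regression easy to test.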

4.3 Forecast Combination

It has been known for quite some time that “combining multiple forecasts leads to increased

forecast accuracy” (Clemen, 1989), as opposed to just using a single ex-ante optimal

forecast. 4 We will evaluate the performance of two combination strategies, equal weights

and performance weights.

Given N individual forecasts for a yield (at maturity τ , which we omit in the following

notation) the linearly combined forecast is given by

yct+h|t = w′t yt+h|t = ∑i=1..N wt,i yit+h|t,

where the N×1 vector of weights wt can be time varying. For equal weights, each element

of this vector is equal to 1/N. For performance weights, each forecast is weighted by the

4 For a detailed analysis of the subject see Timmermann (2005).


inverse of its MSE over the last 24 months (or over as many months as are available):

wtf,i = (1/MSEtf,i) / ∑j=1..N (1/MSEtf,j)

MSEtf,i = (1/(v + 1)) ∑t=tf−v..tf e²t,i

v = min(tf − R, 24),

where et,i is the forecast error of forecast model i at time t. As before tf is the time

at which the forecast is made. Naturally we cannot use data on the performance of the

individual forecasts after tf , e.g. the MSPEs over the whole out-of-sample period. The

choice of using the last 24 periods is of course arbitrary, and is based on a trade-off between

precise estimation of the performance (which necessitates many periods), and the fact that

performance might change over time (which calls for using fewer periods).
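The weighting scheme above can be written down directly (array layout and names our own):

```python
import numpy as np

def performance_weights(errors, t_f, R, window=24):
    """Inverse-MSE combination weights at forecast time t_f, using at most
    `window` past forecast errors per model.
    errors : array with one row per period and one column per model,
             containing the forecast errors e_{t,i}."""
    v = min(t_f - R, window)
    mse = np.mean(np.asarray(errors, float)[t_f - v : t_f + 1] ** 2, axis=0)
    inv_mse = 1.0 / mse
    return inv_mse / inv_mse.sum()

def combined_forecast(weights, forecasts):
    """Linear combination w' y of the individual forecasts."""
    return float(np.dot(weights, forecasts))
```

A model whose past errors are twice as large receives a quarter of the raw weight, so the combination tilts smoothly towards recently accurate forecasts.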

4.4 Comparing Predictive Power

For each maturity and each forecast horizon, a forecast model produces errors e =

(eR+h, . . . , eT ), with the individual error observation being et = yt − yt|t−h. There are

ne = T − R − h + 1 = P − h + 1 error observations. As usual we assess the forecast

accuracy in terms of squared error loss by estimating the expected loss E(e2t ) with its

empirical counterpart, the mean squared prediction error (MSE) e′e/ne.

It should be noted that this estimator is biased downward for models that involve pa-

rameter estimation (Efron, 1983), since it does not account for this estimation uncertainty.

In our case, the uncertainty in the parameter estimation is obviously different across the

models. The RW model for yields needs no estimation at all. It seems intuitive that in a

performance comparison using MSEs a bias could arise in favor of models that rely heavily

on parameter estimation, because MSE does not account for this estimation uncertainty.

But we will see that the RW performs remarkably well, and its relative performance would

only be further improved by employing more accurate strategies to estimate the expected

loss (e.g. the bootstrap).

To rigorously test whether two models forecast with the same accuracy, or whether one significantly outperforms the other, we use the test statistic advocated by Diebold and Mariano (1995),

henceforth DM. We estimate the long-run variance of the loss-differential series by estimat-

ing the spectral density at frequency zero with the well-known Bartlett Kernel. The lag

length for the included autocovariances is chosen to be equal to the forecast horizon. This

test statistic converges in law to a standard normal distribution, and so we can compare


it to the usual values of ±1.65 (10% sig. level) and ±1.96 (5% sig. level).
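A compact version of this DM statistic is sketched below; the exact kernel truncation is our own reading of "lag length equal to the forecast horizon" (we use h − 1 autocovariance lags with Bartlett weights):

```python
import numpy as np

def diebold_mariano(e1, e2, h):
    """DM statistic for equal squared-error loss of two forecast error
    series e1, e2 at horizon h. The long-run variance of the loss
    differential is estimated with a Bartlett kernel."""
    d = np.asarray(e1, float) ** 2 - np.asarray(e2, float) ** 2
    n = len(d)
    dbar = d.mean()
    dc = d - dbar
    lrv = np.dot(dc, dc) / n                  # lag-0 term (variance)
    for k in range(1, h):
        w = 1.0 - k / h                       # Bartlett weight
        lrv += 2.0 * w * np.dot(dc[k:], dc[:-k]) / n
    return dbar / np.sqrt(lrv / n)
```

Values outside ±1.96 reject equal predictive accuracy at the 5% level against the standard normal limit.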

We would like to point out that this test has asymptotically correct size for a

hypothesis like “DL-RW performs equally well as the random walk for maturities of τ

months at a horizon of h months.” The problem of testing multiple hypotheses naturally

arises in our context, since we are interested in the performance at different maturities and

over different forecast horizons. This is one area where an extension could be valuable:

Controlling the family-wise error rate and the false discovery rate is necessary for correct

inference.

4.5 Results for the US

The sample choice of DL is to take Jan-1985 to Jan-1994 as the initial estimation window,

with data available until Dec-2000. We initially chose that exact same sample for our

forecasting exercise. For short forecast horizons, we obtain numerically similar results to

DL – their method does not fare that well. For longer horizons, in particular 12 months,

where DL find that their method significantly outperforms, we find neither quantitatively

nor qualitatively similar results. DL-AR does not significantly outperform the RW at any

maturity considered by DL. Using exactly the same sample choice, the DL strategy does

not fare as well in our data set as in the original paper.

The full sample, Jan-1985 to Dec-2006, was split in half to obtain the initial estimation

window, so T = 264 and R = P = 132. Results in this sample are the ones we actually

present in the tables. They provide new evidence whether the different DL specifications

perform well, as compared to the no-change forecast for each yield.

The results of our forecast exercise for a horizon of h = 1 month are presented in table

4. We forecast yields at maturities 3, 12, and 60 months. The columns show the mean

and standard deviation of the forecast errors, the root MSE, autocorrelations at one- and twelve-month lags, as well as the Diebold-Mariano test statistic assessing the hypothesis of equal forecast performance.5 For the short rate, the combined forecast (CF) methods do best and have

MSEs that are significantly smaller than that of the RW. For the one year yield, no model

outperforms the random walk, and in fact the DL strategies perform significantly worse.

For the five year yield, again no model outperforms the random walk. Forecasting yields

one month ahead better than with the naive no change forecast seems difficult, yet forecast

combinations perform comparatively well.

In table 5 results for the six months forecast horizon are shown. Again we can only beat

the RW for the short rate, with the DL strategy employing RW factors, and with the CF

5 The slope regression method is not included for τ = 3 since it is not applicable.


strategies. It is noteworthy that the DM statistic for the CF strategy using performance

weights is again strongly significant. For other maturities than 3 months, no model can

significantly outperform the no change forecast, but DL-RW and combined forecasts have

smaller MSE than the RW for the 5y yield.

Finally table 6 shows the results for the year-ahead forecasts. Only for the 5 year yield

can one model, DL with RW factors, outperform the RW. The CF methods have smaller

MSE than the RW for the 3m and 5y maturities, yet not significantly so. Noteworthy is

the better performance of DL-RW compared to DL-AR at all three maturities.

In sum we find markedly different results from DL: The DL-AR strategy never beats

the RW, and is often significantly worse. There are two reasons for this: First the data

has been constructed in a different way, and the results are obviously sensitive to this.

Secondly, on the data from 2000 to 2006, which is included in our sample but not in DL’s,

the DL strategy performs particularly badly – the results deteriorate from the DL sample

choice to the full sample. What should certainly be pointed out is the fact that the DL-

RW performs remarkably better than DL-AR. The former strategy has smaller RMSE in

all of the nine settings considered, and it beats the RW in two cases, whereas the latter

strategy never does. Today’s factor values seem to be decidedly better forecasts for future

factors than an AR(1) forecast. Intuitively, the factors are so close to martingales that

the added estimation uncertainty from estimating the AR(1) coefficients leads to worse

forecasts. So a forecast using NS-factors should, according to our evidence, best be done

by just predicting today’s fitted yield.

The DL-RW method fares well in comparison to the RW. In only three out of nine cases

does it have a larger MSE. Twice it beats the RW significantly. So there are obviously

gains to be had from the DL-forecast methodology. The information in the factors contains

less noise than the individual yields, so one is well advised to base a yield forecast at least

partly on the information in these factors.

The combination of forecasts also shows its merit in our study. Although the CF

strategies do not beat the RW throughout, their RMSE is mostly smaller. In particular

for the short rate they outperform remarkably (yet not significantly so for the year-ahead

forecasts). The combined forecasts are particularly good if at least some of the individual

forecasts have lower MSE than the RW. Among the four DL-specifications that we
included, mostly one or two perform well. Therefore they contribute

to the good performance of the combined forecasts.
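The two CF schemes in the tables use equal weights and performance weights. A sketch of how such a combination can be computed, assuming the performance weights are inverse past-MSE weights (a standard choice in the combination literature, cf. Timmermann, 2005; the exact scheme used here may differ):

```python
def combine_forecasts(forecasts, past_sq_errors=None):
    """Combine point forecasts. With no error history, use equal weights;
    otherwise weight each model by the inverse of its past mean squared
    error, normalized to sum to one ('performance weights')."""
    n = len(forecasts)
    if past_sq_errors is None:
        weights = [1.0 / n] * n
    else:
        mse = [sum(e) / len(e) for e in past_sq_errors]
        inv = [1.0 / m for m in mse]
        total = sum(inv)
        weights = [w / total for w in inv]
    combined = sum(w * f for w, f in zip(weights, forecasts))
    return combined, weights
```

A model with past MSE 1.0 combined with one of past MSE 4.0 thus receives weight 0.8 versus 0.2, so the combination leans towards the historically more accurate forecast.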

In addition to the previous results, we look at the DL-RW strategy from another

perspective. How does it fare in comparison with its shrinkage version? What we call
shrinkage here is really a simple average between last period's yield and last period's fitted


yield. This is also a combined forecast, a particularly simple one. In table 7 we see
our hypothesis confirmed that a DL-model is usually outperformed by a combination of
itself with the previous period's yield. At all maturities and forecast horizons, the MSE
is smaller for

the shrinkage specification. Again, this is because yields and pricing errors are persistent

and the shrinkage method takes advantage of this fact.
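In code, the DL-RWS forecast amounts to the following one-liner (Python sketch for illustration; w = 0.5 gives the simple average described above):

```python
def dl_rws_forecast(fitted_yield_today, observed_yield_today, w=0.5):
    """DL-RWS: shrink the DL-RW forecast (today's fitted NS yield) towards
    today's observed yield. Because pricing errors are persistent, part of
    today's fitting error carries forward, and averaging it in reduces the
    forecast error variance relative to pure DL-RW."""
    return w * fitted_yield_today + (1 - w) * observed_yield_today
```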

5 Further Directions

In the following we propose various extensions to the analysis of this paper.

As mentioned earlier, yield forecast methods are compared at different horizons and

maturities. Therefore we run into the problem of multiple hypothesis testing. Since

many hypothesis tests (for equal forecast accuracy) are performed at once, one should

control the false discovery rate (Benjamini and Hochberg, 1995). Whenever we want to

aggregate the evidence, and not only forecast one yield over one forecast horizon,

our inference will be incorrect without appropriately controlling the coverage of our tests.

One method that seems promising has been proposed by Storey (2002): This author fixes
the rejection region and then estimates its corresponding error rate, which “offers increased

applicability, accuracy and power”. Whichever method we choose, it will enable us to test

hypotheses about the relative performance of two methods over several forecast horizons

and/or maturities. This will allow for more concrete conclusions than just eyeballing

several DM test results and aggregating them only verbally.
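The Benjamini and Hochberg (1995) step-up procedure referred to above is easy to state in code. A Python sketch (the FDR level q and the p-values in the test case are illustrative):

```python
def benjamini_hochberg(pvalues, q=0.10):
    """Step-up FDR control: sort the m p-values, find the largest rank k
    with p_(k) <= k*q/m, and reject the k hypotheses with the smallest
    p-values. Controls the false discovery rate at level q."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    k = 0
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= rank * q / m:
            k = rank
    return {order[i] for i in range(k)}  # indices of rejected hypotheses
```

Applied to a battery of DM tests across maturities and horizons, this replaces the per-test 5% rule with a cutoff that adapts to how many small p-values are observed jointly.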

Estimating the contribution of the factor prediction errors and of the pricing errors

to the MSE of the forecast, as well as the correlation between the two, would certainly

be interesting. We could then understand how important the pricing errors are in the

forecast error variance, and compare different competitors in the DL class with regard to

estimates of their Vf and of the correlation between factor prediction errors and pricing

errors. Furthermore, the combination of DL-forecasts with forecasts that do not rely on NS-

factors, as shown, generally improves performance, because it reduces the pricing errors.

Further investigations into this direction seem promising. In particular, a weighting scheme

for combining DL-forecasts and individual yield forecasts (that do not rely on NS-factors)

should take into account how important pricing errors have been in the past compared

to forecast prediction errors. If pricing errors are relatively important, individual yield

forecasts should be weighted more strongly. It might be possible to derive ex-ante optimal

weights for this problem, either time-constant or time-varying, which would be a very

fruitful extension, and could be empirically promising.

Other obvious extensions include attempts to forecast the factors better using either


univariate (e.g. ARFIMA) or multivariate models. The key here is a parsimonious
configuration. Also, one could include more information. Important sources of information could

be among others macro factors, international yield curve factors, risk measures (e.g. ex-

pected risk premia), volatility measures, and latent factors. The task is then to find ways

to incorporate relevant information without introducing too much estimation uncertainty.

For linear factor forecasts, one possibility for including other information is transfer functions

(see for example Liu, 2006, chap.5), which are just highly restricted VARs. They could

provide a reasonable middle-ground between too much additional estimation uncertainty

and ignoring possibly valuable information.

Constructing reliable yield curve data for many countries, and testing newly developed

forecast strategies in new data is an important further task. If we keep improving on our

strategies and test them repeatedly in the same data, we obviously run into the problem

of data snooping. Extending available data sources is therefore important.

Any research program that aims at improving term structure forecasts should not be

oblivious to the advantages the DL factor approach brings with it. The above extensions

could possibly enable applied researchers to profit from this approach considerably.

6 Conclusion

Forecasting the yield curve accurately is a difficult task. Among the competing forecast

models considered in this paper, none could consistently beat the random walk. However,

at almost all maturities and forecast horizons, at least some of the models perform well.

This gives particular appeal to combined forecasts, which profit from this fact. They

mostly have smaller MSE than the RW, and sometimes beat it significantly.

With regard to the DL strategy, we cannot confirm the positive results in the original

paper. The method DL-AR by itself does worse in our data, in particular in the full

sample. There is light on the horizon though: Getting rid of the estimation uncertainty

and simply modeling some or all factors as martingales improves upon the performance

of the DL method. Moreover, mostly at least one of the alternative models performs
well. Therefore such models are very valuable to include in combined forecasts.

The relatively good performance of the DL-RW specification indicates that the ap-

proach to forecast yields via a fitted yield curve is promising. From our results we conclude

that trying to forecast the factors can be counterproductive. The added estimation un-

certainty is a candidate explanation for why today’s fitted yield curve is a better forecast

than the one based on forecasted factors.


References

Benjamini, Yoav and Yosef Hochberg, “Controlling the False Discovery Rate: A
Practical and Powerful Approach to Multiple Testing,” Journal of the Royal Statistical
Society, Series B, 1995, 57 (1), 289–300.

Bliss, Robert R., “Testing term structure estimation methods,” Working Paper 96-12,

Federal Reserve Bank of Atlanta 1996.

Clemen, Robert T., “Combining forecasts: A review and annotated bibliography,”

International Journal of Forecasting, 1989, 5 (4), 559–583.

Diebold, Francis X. and Canlin Li, “Forecasting the Term Structure of Government

Bond Yields,” Journal of Econometrics, February 2006, 130 (2), 337–364.

Diebold, Francis X. and Roberto S. Mariano, “Comparing Predictive Accuracy,”

Journal of Business & Economic Statistics, July 1995, 13 (3), 253–63.

Diebold, Francis X., Canlin Li, and Vivian Z. Yue, “Global Yield Curve Dynamics

and Interactions: A Generalized Nelson-Siegel Approach,” Manuscript, Department of

Economics, University of Pennsylvania June 2006.

, Glenn D. Rudebusch, and S. Boragan Aruoba, “The Macroeconomy and the

Yield Curve: A Dynamic Latent Factor Approach,” Journal of Econometrics, March-

April 2006, 131 (1-2), 309–338.

Duffee, Gregory R., “Term Premia and Interest Rate Forecasts in Affine Models,” Journal

of Finance, February 2002, 57 (1), 405–443.

Efron, Bradley, “Estimating the Error Rate of a Prediction Rule: Improvement on

Cross-Validation,” Journal of the American Statistical Association, June 1983, 78 (382),

316–331.

Fama, Eugene F. and Robert R. Bliss, “The Information in Long-Maturity Forward

Rates,” American Economic Review, September 1987, 77 (4), 680–692.

Jeffrey, Andrew, Oliver Linton, and Thong Nguyen, “Flexible Term Structure

Estimation: Which Method is Preferred?,” Metrika, March 2006, 63 (1), 99–122.

Liu, Lon-Mu, Time Series Analysis and Forecasting, 2nd ed., Scientific Computing As-

sociates Corp., 2006.


Storey, John D., “A direct approach to false discovery rates,” Journal of the Royal
Statistical Society, Series B, 2002, 64 (3), 479–498.

Timmermann, Allan, “Forecast Combinations,” 2005. Forthcoming in Handbook of

Economic Forecasting.

A Data

A.1 Data Source

Our data source for monthly data on US government bonds is the CRSP Monthly US

Treasury Database6. We use the cross-sectional file, which includes monthly data on all

outstanding Treasury bills, notes and bonds. In particular, all dead bonds that have long

been redeemed are also available, with the same data quality as today’s issues. Our sample

includes all observations from January 1985 to December 2006.

A.2 Filters

We include only non-callable, fully taxable, non-flower bonds, since the pricing of non-

standard issues deviates from the usual well-known bond pricing theory. The relevant

price variable is the mean of bid and ask price. These are flat prices, that is they do not

include accrued interest. We call our mean price simply the price and the mean price plus

accrued interest the cash price.

First, we exclude those quotations where the price is lower than 50 or higher than 130,

since issues with discounts/premiums of that magnitude usually show thin trading and

the prices are therefore subject to idiosyncratic variations.

We exclude quotations where the yield differs significantly from the yield at nearby

maturities: We generate two moving averages, one including the three issues of shorter

maturity, and one using the three issues of longer maturity, and include an issue only if it

is within .2 percentage points of either moving average, or lies between the two. This procedure

is an adapted (simplified) version of the methodology employed by CRSP to construct the

Fama-Bliss files.
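A sketch of this moving-average filter in Python for illustration (the actual filtering is done in the Stata do-files; `band` and `k` mirror the .2-percentage-point band and the three-issue windows described above):

```python
def filter_outliers(yields, band=0.2, k=3):
    """For issues sorted by maturity, keep issue i if its yield is within
    `band` percentage points of the average yield of the k shorter-maturity
    issues, or of the k longer-maturity issues, or lies between the two
    averages. Returns a keep/drop flag per issue."""
    keep = []
    for i, y in enumerate(yields):
        shorter = yields[max(0, i - k):i]
        longer = yields[i + 1:i + 1 + k]
        lo_avg = sum(shorter) / len(shorter) if shorter else y
        hi_avg = sum(longer) / len(longer) if longer else y
        keep.append(abs(y - lo_avg) <= band or abs(y - hi_avg) <= band
                    or min(lo_avg, hi_avg) <= y <= max(lo_avg, hi_avg))
    return keep
```

An isolated outlier fails all three conditions and is dropped, while its neighbors survive because the window on their other side is still well behaved.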

Also excluded from the analysis are all issues with a maturity of one month or less, or of 15
years or more, since again there is thin trading in these issues.

Our filtered bond price data includes the following variables:

6 Source: CRSP, Center for Research in Security Prices. Graduate School of Business, The University of Chicago 2007. Used with permission. All rights reserved. www.crsp.uchicago.edu


• Date of quotation, date of maturity, date of first coupon payment, days to maturity.

• Coupon rate, value of first coupon, accrued interest.

• Price, yield to maturity (annualized).

The date of the first coupon payment starts the semiannual cycle of coupon payments.

All coupon payments are exactly half the coupon rate times the face value ($100), except

for possibly the first coupon payment, in case it did not occur exactly half a year after the

date the issue was dated by the Treasury.

A.3 Bootstrapping the Zero Curve

Although the CRSP Monthly Treasury Database includes Fama-Bliss yields, these are

only available at maturities from one to five years, so that we had to construct Fama-Bliss

yields ourselves from the available bond price data. In the following we briefly outline our

algorithm.7

The underlying pricing assumption is that the daily forward rates are constant between

two successive maturities. The forward rate function is therefore a step function with

jumps at the maturities of the available issues. For each point in time, the issue with

the shortest maturity starts the iteration. If it is a discount bond, the forward rate

follows easily from the formula relating the cash price and the forward rate: p_cash =
100 exp(−τ1 F1), where τ1 is the maturity of the first issue. If the first issue is a coupon

bond, the forward rate is its yield to maturity, which we take from the data source. Now

for each successive issue, the forward rate is calculated so that, given previous forward

rates, it exactly prices that issue. Since coupons are paid at half-year intervals, and the

difference in maturities of two successive issues is always smaller than half a year, there

is only one cash-flow that is discounted using that forward rate, and there is a simple

closed form solution. If there is more than one issue at a particular maturity, we calculate

a forward rate for each of them, and then average these forward rates. This is common

practice. It should be noted that in these cases the bonds are naturally not exactly priced

by the averaged forward rate.

After bootstrapping the spot rates for all maturities at each point in time, we map
these rates onto a grid of fixed maturities using linear interpolation. The fixed maturities are
3, 6, 9, 12, 15, 18, 21, 24, 30, 36, 48, 60, 72, 84, 96, 108, and 120 months. In some cases there is no

issue with a maturity of at least 120 months so we extrapolate the spot rate of earlier

maturities.
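For the special case where every issue is a discount bond, the bootstrap described above reduces to a few lines; coupon issues require the additional closed-form step for the single cash flow discounted with the new forward rate. A Python sketch for illustration (the actual implementation is the Matlab module us data prep.m; maturities here are in years, in increasing order):

```python
import math

def bootstrap_zeros(maturities, cash_prices, face=100.0):
    """Piecewise-constant-forward bootstrap for discount bonds: on each
    interval between successive maturities, the forward rate is chosen so
    that it exactly prices the issue maturing at the interval's endpoint.
    Returns continuously compounded spot rates and interval forward rates."""
    zeros, forwards = [], []
    prev_tau, prev_logp = 0.0, math.log(face)
    for tau, p in zip(maturities, cash_prices):
        logp = math.log(p)
        # constant forward on (prev_tau, tau] implied by the price ratio
        forwards.append((prev_logp - logp) / (tau - prev_tau))
        # spot rate implied by p = face * exp(-z * tau)
        zeros.append((math.log(face) - logp) / tau)
        prev_tau, prev_logp = tau, logp
    return zeros, forwards
```

For example, a six-month bill priced at 100·exp(−0.02) and a one-year bill priced at 100·exp(−0.05) imply spot rates of 4% and 5%, and a forward rate of 6% on the second half-year.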

7 A lucid explanation of the methodology can be found in Jeffrey et al. (2006).


A.4 Data for Other Countries: UK, Germany, Japan

Our initial goal was to compare the forecast performance of the Diebold-Li approach

across different countries: US, UK, Germany, and Japan. Data on government bonds

for all of these countries is available via Thomson Datastream. We downloaded the data

from Datastream using Excel, imported it into Stata, and developed algorithms to bring
it into a format that is amenable to analysis in Matlab. The documentation and data

quality, in particular on dead government bonds, is much worse in Datastream than in the

CRSP Monthly Treasury Database. For example, convertible bonds and callable bonds are

not reliably marked as such, and quotations are sometimes ex-dividend (so that accrued
interest is negative).

We ran the same filters as for the CRSP data, and since type-of-issue indicators (like

convertible, tax-free, callable) were unreliable, excluded issues with names including
“conversion” and other obscure names (e.g. “paid”). Those issues were usually associated with

markedly different pricing. After the initial data processing, we attempted to bootstrap

the spot rate curve from the bond prices. A difficult issue was that the accrued interest

quotation convention is not standardized across countries. Therefore we calculated ac-

crued interest ourselves. Yet we were in the end not able to confirm consistency between

YTM and cash prices, an important check before even proceeding to the bootstrap. As

expected, the zero curves that we extracted from the bond prices were badly behaved and

inconsistent, with large outliers and occasionally negative yields. Given the available time,

we were unfortunately not able to extract meaningful and consistent zero curves from the

Datastream data.

B The Kalman Filter

The Diebold-Li forecast approach can be cast into a state-space representation, as detailed

in Diebold et al. (2006b). This allows the inclusion of other factors in the dynamics of the

NS-factors, and provides correct inference about estimation of parameters of the factor

dynamics (as opposed to the two-step estimation of DL, where first factors and then their

dynamics are estimated).

During our attempt to tackle the forecasting exercise using a state-space representation

throughout, several issues came up and led to a decision in favor of the simple two-step

approach.

• The estimation procedure is much more complex, since numerical optimization has

to be employed to maximize the likelihood function. The results are sensitive to


initial values, and (naturally) to the restrictions imposed. This complexity does not

pay off in the forecast exercise.

• Using a rolling or recursive forecast scheme, the computational costs quickly become

very large.

• We are not interested in the inference about the factor dynamics, but in the inference

about the forecast performance. Therefore the correct inference about the estimation

of the dynamic process for the factors, which is possible with the one-step estimation,

is not helpful in our case.

• The state-space approach requires estimating variances of the measurement and

transition equations. Several different restrictions are possible (and some necessary

for computational tractability) on these covariance matrices. Again, this adds un-

necessary complexity if the task is simply to forecast the factors and yields.

We will investigate the estimation of the factors and their dynamics via state-space

models further in the future, since it does provide some advantages if one is willing to

pay the price of considerable additional complexity: Including macro factors in the dy-

namic system, or global yield curve factors (Diebold et al., 2006a) might improve forecast

performance. Also, heteroskedasticity and missing observations can be dealt with.

C Stata and Matlab Code

In the following we give an overview of the most important code modules we developed

during the course of this project. Table 8 lists and describes the do-files that were written

to facilitate data processing. Table 9 lists and describes the most important Matlab

scripts that were written to carry out the analysis. Various further supporting functions

were written in Matlab which, for the sake of brevity, are not included in these tables.


Maturity      Mean      Std.Dev.  Min.      Max.      ρ(1)
3             4.8162    2.0117    0.8148    9.1087    0.9921
6             4.9699    2.0379    0.9443    9.4404    0.9924
9             5.1130    2.0696    0.9781    9.5942    0.9915
12            5.2088    2.0721    1.0393    9.6742    0.9902
15            5.3124    2.0782    1.0657    9.9827    0.9898
18            5.3969    2.0628    1.1436    10.1823   0.9894
21            5.4687    2.0405    1.2187    10.2632   0.9888
24            5.5139    2.0110    1.2990    10.4049   0.9878
30            5.6608    1.9864    1.4443    10.7367   0.9870
36            5.7709    1.9439    1.6173    10.7781   0.9862
48            5.9824    1.9065    1.9962    11.2589   0.9843
60            6.0994    1.8516    2.3482    11.3029   0.9845
72            6.2623    1.8490    2.6606    11.6440   0.9848
84            6.3534    1.7986    2.9993    11.8313   0.9847
96            6.4446    1.7690    3.2172    11.5174   0.9843
108           6.5048    1.7726    3.3858    11.7241   0.9856
120 (Level)   6.4592    1.7325    3.4678    11.6604   0.9839
Slope         1.6430    1.2574    -0.8975   3.9835    0.9698
Curvature     -0.2477   0.7593    -2.1724   1.5963    0.9217

Table 1: Summary statistics, US term structure, Jan. 85 - Dec. 06

Factor   Mean      Std.dev.  Min.      Max.      ρ(1)     ADF
β1t      6.9092    1.7126    3.8639    12.1111   0.9837   -2.3358
β2t      -2.1774   1.7239    -5.5527   1.0354    0.9780   -1.8491
β3t      -0.7766   2.0031    -5.9966   4.1809    0.9264   -3.2295

Table 2: Summary statistics and unit-root tests for the NS-factors


Maturity   Mean      Std.Dev.  Min.      Max.     MAE      RMSE     ρ(1)
3          -0.0401   0.0997    -0.5010   0.2471   0.0805   0.1073   0.7536
6          -0.0038   0.0485    -0.1228   0.2956   0.0351   0.0485   0.2279
9          0.0289    0.0654    -0.2019   0.2666   0.0522   0.0714   0.5549
12         0.0213    0.0641    -0.1810   0.2221   0.0536   0.0674   0.6895
15         0.0282    0.0505    -0.1596   0.1999   0.0463   0.0578   0.7253
18         0.0225    0.0356    -0.0955   0.1015   0.0349   0.0420   0.6245
21         0.0102    0.0293    -0.0981   0.1060   0.0239   0.0310   0.4652
24         -0.0230   0.0434    -0.2164   0.0717   0.0357   0.0490   0.6036
30         -0.0167   0.0384    -0.2022   0.1516   0.0318   0.0418   0.5726
36         -0.0281   0.0516    -0.1879   0.1646   0.0466   0.0587   0.7341
48         -0.0124   0.0635    -0.1878   0.2032   0.0506   0.0646   0.7120
60         -0.0425   0.0578    -0.1774   0.2095   0.0594   0.0716   0.6338
72         0.0087    0.0693    -0.1276   0.3643   0.0481   0.0697   0.8772
84         0.0134    0.0554    -0.2388   0.2946   0.0387   0.0569   0.6199
96         0.0369    0.0437    -0.1596   0.1627   0.0467   0.0572   0.7707
108        0.0430    0.0427    -0.0772   0.1966   0.0498   0.0605   0.6706
120        -0.0466   0.1003    -0.5511   0.1370   0.0790   0.1104   0.8579

Table 3: Summary statistics, yield curve residuals


Method                      Mean      Std.Dev.  RMSE     ρ(1)      ρ(12)     DM

τ = 3 months
Diebold-Li, AR(1) factors   -0.0786   0.1933    0.2080   0.2791    -0.0699   -0.0928
Diebold-Li, RW factors      -0.0366   0.1901    0.1929   0.2519    -0.0706   -1.6078
Diebold-Li, RW×2/AR(1)×1    -0.0383   0.1931    0.1962   0.2728    -0.0796   -1.3631
Diebold-Li, RW×1/AR(1)×2    -0.0388   0.1925    0.1956   0.2759    -0.0437   -1.5612
Random walk                 -0.0006   0.2097    0.2089   0.3012    0.1168
AR(1) for yield levels      0.0052    0.2109    0.2102   0.3144    0.1126    1.0777
AR(1) for yield changes     0.0203    0.2017    0.2019   0.0941    0.1082    -1.4723
Fama-Bliss                  0.0543    0.1871    0.1942   0.2631    0.0791    -1.6645
Principal components        -0.0258   0.1955    0.1965   0.2942    0.0260    -1.9504
CF, equal weights           -0.0154   0.1925    0.1924   0.2413    0.0177    -3.0409
CF, perf. weights           -0.0165   0.1921    0.1921   0.2344    0.0110    -2.9190

τ = 12 months
Diebold-Li, AR(1) factors   -0.0197   0.2532    0.2530   0.4765    0.1480    2.7697
Diebold-Li, RW factors      0.0253    0.2360    0.2365   0.3631    0.1081    2.0668
Diebold-Li, RW×2/AR(1)×1    0.0205    0.2495    0.2494   0.4580    0.1352    3.1367
Diebold-Li, RW×1/AR(1)×2    0.0201    0.2563    0.2561   0.4853    0.1705    3.4298
Random walk                 -0.0011   0.2252    0.2243   0.2644    0.0123
AR(1) for yield levels      -0.0022   0.2274    0.2266   0.2793    0.0180    1.4805
AR(1) for yield changes     0.0240    0.2181    0.2186   0.0187    0.0082    -0.8586
Slope regression            0.0328    0.2232    0.2248   0.2147    0.0103    0.1127
Fama-Bliss                  0.0643    0.2204    0.2287   0.1783    0.0213    0.4759
Principal components        -0.0106   0.2345    0.2338   0.3322    0.0920    1.6177
CF, equal weights           0.0153    0.2293    0.2290   0.2994    0.0662    1.2815
CF, perf. weights           0.0179    0.2279    0.2277   0.2807    0.0532    1.0222

τ = 60 months
Diebold-Li, AR(1) factors   -0.0891   0.2818    0.2945   0.0999    -0.0219   0.7527
Diebold-Li, RW factors      -0.0441   0.2809    0.2833   0.0527    -0.0272   -0.7370
Diebold-Li, RW×2/AR(1)×1    -0.0491   0.2823    0.2855   0.0936    -0.0173   -0.2260
Diebold-Li, RW×1/AR(1)×2    -0.0493   0.2833    0.2865   0.0978    -0.0094   -0.0743
Random walk                 -0.0052   0.2881    0.2870   0.0704    -0.0022
AR(1) for yield levels      -0.0393   0.2879    0.2895   0.0802    0.0022    0.6287
AR(1) for yield changes     0.0261    0.2890    0.2891   -0.0559   0.0102    0.4312
Slope regression            0.0154    0.2892    0.2886   0.0818    -0.0100   0.5558
Fama-Bliss                  0.0380    0.2894    0.2908   0.0639    0.0068    0.8896
Principal components        -0.0224   0.2813    0.2811   0.0798    -0.0351   -1.2700
CF, equal weights           -0.0219   0.2833    0.2831   0.0566    -0.0113   -1.3693
CF, perf. weights           -0.0218   0.2842    0.2840   0.0552    -0.0113   -1.1154

Table 4: Performance of one-month-ahead forecasts


Method                      Mean      Std.Dev.  RMSE     ρ(6)      ρ(18)     DM

τ = 3 months
Diebold-Li, AR(1) factors   -0.3470   0.8396    0.9054   0.6172    -0.0863   1.4891
Diebold-Li, RW factors      -0.0385   0.7709    0.7688   0.5104    -0.1408   -2.1124
Diebold-Li, RW×2/AR(1)×1    -0.0505   0.7938    0.7923   0.5105    -0.1498   -0.5913
Diebold-Li, RW×1/AR(1)×2    -0.0432   0.8575    0.8552   0.6409    -0.0021   1.0289
Random walk                 -0.0053   0.8048    0.8016   0.5431    -0.0781
AR(1) for yield levels      -0.0677   0.8281    0.8276   0.6029    -0.0511   0.7917
AR(1) for yield changes     0.0615    0.7100    0.7099   0.2725    -0.1422   -1.4233
Fama-Bliss                  0.2863    0.7323    0.7835   0.5344    -0.0364   -0.2332
Principal components        -0.1674   0.7875    0.8020   0.6294    -0.0229   0.0073
CF, equal weights           -0.0413   0.7679    0.7660   0.5445    -0.0845   -1.8778
CF, perf. weights           -0.0132   0.7595    0.7566   0.5335    -0.0817   -2.5483

τ = 12 months
Diebold-Li, AR(1) factors   -0.3255   0.9102    0.9632   0.6702    -0.0198   1.7219
Diebold-Li, RW factors      0.0064    0.8173    0.8141   0.5326    -0.0690   1.0007
Diebold-Li, RW×2/AR(1)×1    -0.0273   0.8671    0.8641   0.6014    -0.0628   1.7920
Diebold-Li, RW×1/AR(1)×2    -0.0216   0.9430    0.9396   0.6737    0.0544    1.8478
Random walk                 -0.0188   0.8057    0.8027   0.5091    -0.0999
AR(1) for yield levels      -0.1248   0.8354    0.8414   0.5914    -0.0598   0.9063
AR(1) for yield changes     0.0668    0.7452    0.7453   0.2987    -0.1690   -1.2765
Slope regression            0.1869    0.8039    0.8223   0.4545    -0.0926   0.4804
Fama-Bliss                  0.1198    0.8480    0.8532   0.5158    -0.0599   2.0753
Principal components        -0.1624   0.8702    0.8818   0.6218    -0.0095   1.3180
CF, equal weights           -0.0301   0.8201    0.8174   0.5480    -0.0637   0.6904
CF, perf. weights           -0.0138   0.8103    0.8072   0.5428    -0.0681   0.2564

τ = 60 months
Diebold-Li, AR(1) factors   -0.4364   0.6890    0.8133   0.2037    -0.1692   1.4308
Diebold-Li, RW factors      -0.0992   0.6989    0.7032   -0.0499   -0.2004   -1.3044
Diebold-Li, RW×2/AR(1)×1    -0.1348   0.6952    0.7055   0.1337    -0.1445   -0.3385
Diebold-Li, RW×1/AR(1)×2    -0.1326   0.7285    0.7376   0.1884    -0.0875   0.5077
Random walk                 -0.0596   0.7159    0.7155   -0.0280   -0.1926
AR(1) for yield levels      -0.3393   0.6932    0.7693   0.1139    -0.1668   0.9695
AR(1) for yield changes     0.0940    0.7322    0.7353   -0.0305   -0.2000   0.7273
Slope regression            0.0582    0.7372    0.7366   0.0723    -0.2003   0.6282
Fama-Bliss                  0.0694    0.7472    0.7475   0.0163    -0.2088   1.1200
Principal components        -0.2530   0.6740    0.7174   0.0857    -0.1946   0.0490
CF, equal weights           -0.1233   0.6960    0.7042   0.0416    -0.1899   -0.7840
CF, perf. weights           -0.1178   0.6887    0.6960   0.0324    -0.1899   -1.3672

Table 5: Performance of six-month-ahead forecasts


Method                      Mean      Std.Dev.  RMSE     ρ(12)     ρ(24)     DM

τ = 3 months
Diebold-Li, AR(1) factors   -0.7503   1.4445    1.6224   0.3464    -0.2302   0.8095
Diebold-Li, RW factors      -0.0583   1.4221    1.4174   0.1912    -0.1360   -1.0574
Diebold-Li, RW×2/AR(1)×1    -0.0797   1.4330    1.4293   0.1859    -0.1438   -0.4434
Diebold-Li, RW×1/AR(1)×2    -0.0583   1.5366    1.5313   0.4574    -0.1920   0.4564
Random walk                 -0.0293   1.4448    1.4391   0.2399    -0.1514
AR(1) for yield levels      -0.3799   1.4685    1.5109   0.3836    -0.2243   0.4870
AR(1) for yield changes     0.1006    1.4788    1.4761   0.1517    -0.1329   0.7043
Fama-Bliss                  0.4055    1.4434    1.4936   0.3098    -0.1931   0.4384
Principal components        -0.4165   1.4463    1.4993   0.4119    -0.2008   0.3468
CF, equal weights           -0.1407   1.4053    1.4065   0.2885    -0.1916   -0.5146
CF, perf. weights           -0.0820   1.4096    1.4061   0.3194    -0.1963   -0.6105

τ = 12 months
Diebold-Li, AR(1) factors   -0.7708   1.4583    1.6441   0.3883    -0.2501   0.8627
Diebold-Li, RW factors      -0.0353   1.4302    1.4247   0.2670    -0.1990   0.3973
Diebold-Li, RW×2/AR(1)×1    -0.0954   1.4466    1.4437   0.2954    -0.2225   0.3992
Diebold-Li, RW×1/AR(1)×2    -0.0788   1.5668    1.5623   0.4625    -0.1982   0.6371
Random walk                 -0.0586   1.4231    1.4185   0.2435    -0.1893
AR(1) for yield levels      -0.4729   1.4701    1.5385   0.3942    -0.2511   0.6570
AR(1) for yield changes     0.0592    1.4834    1.4784   0.2495    -0.2035   1.7483
Slope regression            0.2950    1.4696    1.4929   0.2410    -0.2016   0.8343
Fama-Bliss                  0.2170    1.5061    1.5155   0.2721    -0.1909   1.3848
Principal components        -0.4602   1.5012    1.5642   0.4216    -0.2302   0.7422
CF, equal weights           -0.1401   1.4285    1.4295   0.3148    -0.2247   0.1490
CF, perf. weights           -0.0852   1.4292    1.4258   0.3327    -0.2423   0.1164

τ = 60 months
Diebold-Li, AR(1) factors   -0.9261   0.9227    1.3046   0.0173    -0.1730   1.6833
Diebold-Li, RW factors      -0.1768   0.9712    0.9832   -0.1768   -0.0747   -1.7332
Diebold-Li, RW×2/AR(1)×1    -0.2403   0.9573    0.9832   -0.0205   -0.1452   -0.4089
Diebold-Li, RW×1/AR(1)×2    -0.2341   1.0528    1.0743   0.0454    -0.0814   0.6023
Random walk                 -0.1376   0.9994    1.0048   -0.1677   -0.0695
AR(1) for yield levels      -0.7332   0.9467    1.1944   0.0360    -0.1351   1.1771
AR(1) for yield changes     0.0642    1.0209    1.0187   0.1082    -0.2409   0.1926
Slope regression            0.0953    1.0679    1.0677   -0.0121   -0.1290   0.7539
Fama-Bliss                  0.1253    1.0999    1.1025   -0.0893   -0.0710   1.2509
Principal components        -0.6546   0.9330    1.1366   0.0454    -0.1725   0.9332
CF, equal weights           -0.2818   0.9629    0.9995   -0.0549   -0.1321   -0.1122
CF, perf. weights           -0.2359   0.9522    0.9772   -0.0713   -0.1516   -0.8154

Table 6: Performance of twelve-month-ahead forecasts


Maturity τ   h = 1                h = 6                h = 12
             DL-RW     DL-RWS    DL-RW     DL-RWS     DL-RW     DL-RWS
3            -1.6078   -2.6504   -2.1124   -2.2599    -1.0574   -1.1257
6            2.0583    1.4747    1.9878    1.8932     1.2586    1.2003
9            1.0717    0.1421    0.3481    0.2165     0.1665    -0.2141
12           2.0668    1.3091    1.0007    0.8895     0.3973    0.3512
24           2.2482    2.0024    1.0362    0.9909     1.1089    1.0887
36           -1.0325   -1.5159   -1.2929   -1.3990    -0.8712   -0.9132
60           -0.7370   -1.3395   -1.3044   -1.4262    -1.7332   -1.7956
120          1.8357    0.6831    1.3764    1.1322     1.9675    1.8284

Table 7: Comparison of DL-RW to its shrinkage version DL-RWS

Module                          Functionality
us-bonds.do                     Preparation of CRSP data: filtering, reformatting, consistency checks, export to csv-format
prepare datastream issues.do    Preparation of Datastream data on issues: analysis, filtering, reformatting, export to csv-format
prepare datastream data.do      Preparation of Datastream data on prices, ytm, and accrued interest: reshaping, reformatting, export to csv-format
prepare complete.do             Preparation of consolidated Datastream data: merging, reshaping, reformatting, filtering, consistency checks, export to csv-format
create reformat date.do         Function to convert date-strings into Stata-readable format

Table 8: Stata do-files


Module                      Functionality
data prep/us data prep.m    Construction of the zero curve and forward rates for the US, using CRSP data
data prep/uk data prep.m    Construction of the zero curve and forward rates for the UK, using Datastream data (does not produce consistent results)
data summary.m              Summary statistics for the zero curve
dl create factors etc.m     Creation of NS-factors, comparison of NS-factors and empirical factors, fitted and empirical yield curves, graphing of spot rate and residuals
dl factor dynamics.m        Analysis of dynamic properties of the factors, ACF/PACFs, ARMA-modelling, residual autocorrelations, tests for white noise
dl forecast.m               Out-of-sample forecasting and comparison of predictive accuracy
kalman.m                    Estimation of state-space model
kalman loglik.m             Implementation of the Kalman filter

Table 9: Matlab modules

[Figure: three-dimensional surface of yield (percent) against time and maturity (months).]

Figure 1: Yield curves, Jan. 85 - Dec. 06

[Figure: three panels. Level: model-based vs. empirical level. Slope: −(model-based slope) vs. empirical slope. Curvature: 0.3(model-based curvature) vs. empirical curvature.]

Figure 2: NS-factors vs. empirical factors

[Figure: ACF and PACF panels for β1, β2, and β3, lags 0 to 40.]

Figure 3: Autocorrelations and partial autocorrelations of NS-factors

[Figure: autocorrelation panels for the residuals of the AR(1) models for β1, β2, and β3, lags 0 to 40.]

Figure 4: Autocorrelations of the residuals of AR(1) models for the NS-factors

[Figure: average empirical vs. average fitted yield curve, yield (percent) against maturity (months).]

Figure 5: Average fitted vs. average empirical yield curve

[Figure: four panels of empirical vs. fitted yield curves, on 1989-03-31, 1989-07-31, 1997-05-30, and 1998-08-31; yield (percent) against maturity (months).]

Figure 6: Fitted vs. empirical yield curves on selected dates

[Figure: three-dimensional surface of yield curve residuals (percent) against time and maturity (months).]

Figure 7: Yield curve residuals (pricing errors)