Statistics Tutorial, Part 6.
Time Series Analysis

(c) 2017 by Barton Paul Levenson



Regression is a powerful tool. It can be more powerful still when used on measurements taken over equal intervals of time, such as inflation rates, unemployment rates, or levels of carbon dioxide in the atmosphere. Such time series data help us address which variable(s) actually cause another. (With many caveats, of course.)

For a time series regression, we can add two more statistics to a regression report: DW and h. DW is the Durbin-Watson statistic, named for the statisticians who proposed it in 1951. It can range from 0 to 4, and values near 2 are best. If DW falls too far below 2, you have positive "serial correlation in the residuals"; if it rises too far above 2, you have negative serial correlation.
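
The statistic itself is easy to compute once you have the residuals: square the year-to-year changes in the errors, add them up, and divide by the sum of the squared errors. Below is a minimal Python sketch (the residual values in it are made up purely for illustration):

    # Durbin-Watson statistic:
    #     DW = sum over t of (e_t - e_(t-1))^2, divided by the sum of e_t^2
    import numpy as np

    def durbin_watson(residuals):
        """Durbin-Watson statistic for a 1-D array of regression residuals."""
        e = np.asarray(residuals, dtype=float)
        return np.sum(np.diff(e) ** 2) / np.sum(e ** 2)

    # Made-up residuals in which each error tends to follow the sign
    # of the one before it (positive serial correlation):
    e = np.array([0.9, 0.7, 0.5, 0.1, -0.3, -0.6, -0.8, -0.4, 0.2, 0.7])
    print(durbin_watson(e))   # comes out well below 2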

"Residuals" is just a fancy term for "errors." The whole phrase means the errors show systematic patterns, instead of just being all over the place. If positive errors (too-high estimates) tend to follow other positive errors, and negative errors follow negative, you have positive serial correlation in the residuals. If positive tends to follow negative and vice versa, you've got negative serial correlation in the residuals. Either one is bad. A regression with this problem may be a completely unreliable "spurious regression!"

And that means you've got a bad model. You're missing one or more important variables, and your values of R2, R̄2, F, and t may be unreliable and misleading, no matter how good they look. You can have better than 90% R2, and it will be worthless if your DW is too extreme.

The h, known as "Durbin's h," comes into play only when you have an "autoregression term." This means a lagged version of the dependent variable, used as an independent variable. For example, if I were to regress dP on the previous year's dP, the lagged dP would be an autoregression term. These terms can be very useful, but they bias the Durbin-Watson statistic toward 2 and destroy its usefulness. DW often comes out near 2 when you use autoregression terms, but the regression can still be spurious.

That's where Durbin's h comes in. "Good" values of h are small: if h lies between about -2 and 2, there is no sign of serious autocorrelation, and your regression may be okay. Values outside that range are warning signs.
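
For reference, h is computed from DW, the sample size n, and the estimated variance of the coefficient on the lagged dependent variable, V: h = (1 - DW/2) * sqrt(n / (1 - n*V)). It is undefined when n*V reaches 1. Here is a small Python sketch; the numbers fed to it are made up purely for illustration:

    import math

    def durbins_h(dw, n, var_lag):
        """Durbin's h from the DW statistic, the sample size n, and var_lag,
        the estimated variance of the coefficient on the lagged dependent variable."""
        rho_hat = 1.0 - dw / 2.0              # rough estimate of the autocorrelation
        inside = n / (1.0 - n * var_lag)
        if inside <= 0:
            raise ValueError("Durbin's h is undefined when n * var_lag >= 1")
        return rho_hat * math.sqrt(inside)

    # Made-up values for illustration only:
    print(durbins_h(dw=1.85, n=30, var_lag=0.004))   # about 0.44 -- no sign of trouble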

Let's try an example. We will try to predict the US inflation rate from the federal deficit, and this time we will use real data. To keep the measurements comparable, we will measure the federal balance (B) as a percentage of GDP, and inflation (dP) as the annual percentage increase in the GDP deflator. Negative values of B represent deficits; positive values are surpluses. I use 30 years' worth of data (1987-2016), which should be enough to suggest a relationship if one is there.

Here are the data:


Year    dP      B
1987    2.6    -3.1
1988    3.5    -3.0
1989    3.9    -2.7
1990    3.7    -3.7
1991    3.3    -4.4
1992    2.3    -4.4
1993    2.4    -3.7
1994    2.1    -2.8
1995    2.1    -2.1
1996    1.8    -1.3
1997    1.7    -0.3
1998    1.1     0.8
1999    1.5     1.3
2000    2.3     2.3
2001    2.3     1.2
2002    1.5    -1.4
2003    2.0    -3.3
2004    2.7    -3.4
2005    3.2    -2.4
2006    3.1    -1.8
2007    2.7    -1.1
2008    2.0    -3.1
2009    0.8    -9.8
2010    1.2    -8.6
2011    2.1    -8.4
2012    1.8    -6.7
2013    1.6    -4.1
2014    1.8    -2.8
2015    1.1    -2.4
2016    1.3    -3.2

And here's the regression:


dP = f(B)                            N = 30
R2 = 0.01546                         R̄2 = -0.01970
F(1,28) = 0.4397 (p < 0.5127)        SEE = 0.8152
DW = 0.4441


Name        β          SE         p
dP          2.287      0.2186     3.54 x 10^-11
B           0.03604    0.05436    0.5127

Ouch. Almost no variance is accounted for, and neither the regression as a whole nor the coefficient of the budget term is significant. What's more, DW is very low, showing positive serial correlation in the residuals. So far it looks like there's no relation.
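
If you want to check these numbers yourself, here is a rough Python sketch using only NumPy. It runs ordinary least squares of dP on B (with a constant) on the data from the table above and then computes DW from the residuals; it is an illustration, not the software used for the report above.

    import numpy as np

    # dP and B for 1987-2016, copied from the table above
    dP = np.array([2.6, 3.5, 3.9, 3.7, 3.3, 2.3, 2.4, 2.1, 2.1, 1.8,
                   1.7, 1.1, 1.5, 2.3, 2.3, 1.5, 2.0, 2.7, 3.2, 3.1,
                   2.7, 2.0, 0.8, 1.2, 2.1, 1.8, 1.6, 1.8, 1.1, 1.3])
    B  = np.array([-3.1, -3.0, -2.7, -3.7, -4.4, -4.4, -3.7, -2.8, -2.1, -1.3,
                   -0.3,  0.8,  1.3,  2.3,  1.2, -1.4, -3.3, -3.4, -2.4, -1.8,
                   -1.1, -3.1, -9.8, -8.6, -8.4, -6.7, -4.1, -2.8, -2.4, -3.2])

    X = np.column_stack([np.ones_like(B), B])       # constant plus federal balance
    beta, *_ = np.linalg.lstsq(X, dP, rcond=None)   # [intercept, slope]
    resid = dP - X @ beta

    r2 = 1.0 - np.sum(resid ** 2) / np.sum((dP - dP.mean()) ** 2)
    dw = np.sum(np.diff(resid) ** 2) / np.sum(resid ** 2)

    print("beta:", beta)   # intercept and coefficient on B
    print("R2:  ", r2)
    print("DW:  ", dw)     # should come out well below 2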

How do we deal with autocorrelated residuals like these? One technique, invented in 1949, is now called "Cochrane-Orcutt iteration" (COI) after its inventors. It involves repeatedly transforming the variables and running the regression again until the results stabilize. Caveat: your residuals have to meet certain criteria, especially being "white noise," for this to work. We'll get into that later.

To do COI, you measure the "autocorrelation coefficient," ρ, of the residuals. This is just the regular correlation coefficient between e_t, the error for a given year, and e_(t-1), the error for the previous year. Then you run a new regression:

(Y_t - ρ Y_(t-1)) = β1 (1 - ρ) + β2 (X_t - ρ X_(t-1)) + ε_t

You can probably see that the Y and X terms in parentheses are "transformed variables," and that the whole first term on the right is a constant. Again, the subscript t marks the value for a given year, and t-1 the value for the year before.
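
Here is a rough Python sketch of the whole COI loop, using the same plain-NumPy least squares as the sketch above. The convergence tolerance and the cap on iterations are arbitrary choices for illustration, not part of the original method.

    import numpy as np

    def ols(y, x):
        """OLS of y on a constant and x; returns (coefficients, residuals)."""
        X = np.column_stack([np.ones_like(x), x])
        beta, *_ = np.linalg.lstsq(X, y, rcond=None)
        return beta, y - X @ beta

    def cochrane_orcutt(y, x, tol=1e-4, max_iter=50):
        """Iterate until the estimated autocorrelation coefficient settles down."""
        beta, resid = ols(y, x)
        rho_old = 0.0
        for _ in range(max_iter):
            # autocorrelation coefficient of the residuals
            rho = np.corrcoef(resid[1:], resid[:-1])[0, 1]
            # transformed variables -- note that we lose the first observation
            y_star = y[1:] - rho * y[:-1]
            x_star = x[1:] - rho * x[:-1]
            beta_star, _ = ols(y_star, x_star)
            # the constant of the transformed regression is beta1*(1 - rho),
            # so divide by (1 - rho) to recover the original intercept
            beta = np.array([beta_star[0] / (1.0 - rho), beta_star[1]])
            resid = y - (beta[0] + beta[1] * x)
            if abs(rho - rho_old) < tol:
                break
            rho_old = rho
        return rho, beta

    # With the dP and B arrays from the sketch above:
    # rho, beta = cochrane_orcutt(dP, B)
    # print(rho, beta)   # beta[1] is the coefficient on the federal balance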

Lather, rinse, repeat, until the answers get very similar. I did this for the regression above. The process converged after six runs, and I got this:


dP = f(B) (Cochrane-Orcutt)          N = 29
R2 = 0.6338                          R̄2 = 0.6203
F(1,27) = 4.299 (p < 0.04779)        SEE = 0.5050
DW = 1.308


Name        β          t         p
dP          2.328      4.193     0.0003
B           0.1207     2.074     0.0478

Note that we lose one data point when doing COI, since we need the previous year's values to compute the transformed variables.

But what a difference! The federal balance now accounts for over three-fifths of the variance of inflation. Both coefficients are significant, and the coefficient on the federal balance is significant at 95% confidence.

Have we proved our hypothesis--that deficits cause inflation? Look again. Our independent variable is the federal balance: revenues minus expenditures. The coefficient is positive. What we have seems to imply that surpluses add to inflation, while deficits bring inflation down.

Is this true? Or have we done something wrong? Well, the presence of that very low DW in the first place was a clue. That often implies that one or more explanatory variables are missing from the model. If we ran this again, adding other variables that might be more directly related to inflation (money supply growth, perhaps, or real economic growth), the balance variable might recede to insignificance. We would need to do more research.

And suppose causality runs the other way? Maybe inflation raises revenue, which results in surpluses. But for the moment, at least, it looks like there is some kind of link between the federal budget and inflation.




