5.5 Dealing with non-stationarity
The anchovy data have failed both tests for the stationarity, the Augmented Dickey-Fuller and the KPSS test. How do we fix this? The approach in the Box-Jenkins method is to use differencing.
Let’s see how this works with random walk data. A random walk is non-stationary but the difference is white noise so is stationary:
\[x_t - x_{t-1} = e_t, e_t \sim N(0,\sigma)\]
adf.test(diff(rw))
Augmented Dickey-Fuller Test
data: diff(rw)
Dickey-Fuller = -3.8711, Lag order = 4, p-value = 0.01834
alternative hypothesis: stationary
kpss.test(diff(rw))
Warning in kpss.test(diff(rw)): p-value greater than printed p-value
KPSS Test for Level Stationarity
data: diff(rw)
KPSS Level = 0.30489, Truncation lag parameter = 3, p-value = 0.1
If we difference random walk data, the null is rejected for the ADF test and not rejected for the KPSS test. This is what we want.
Let’s try a single difference with the anchovy data. A single difference means dat(t)-dat(t-1)
. We get this using diff(anchovyts)
.
<- diff(anchovyts)
diff1dat adf.test(diff1dat)
Augmented Dickey-Fuller Test
data: diff1dat
Dickey-Fuller = -3.2718, Lag order = 2, p-value = 0.09558
alternative hypothesis: stationary
kpss.test(diff1dat)
Warning in kpss.test(diff1dat): p-value greater than printed p-value
KPSS Test for Level Stationarity
data: diff1dat
KPSS Level = 0.089671, Truncation lag parameter = 2, p-value = 0.1
If a first difference were not enough, we would try a second difference which is the difference of a first difference.
<- diff(diff1dat)
diff2dat adf.test(diff2dat)
Warning in adf.test(diff2dat): p-value smaller than printed p-value
Augmented Dickey-Fuller Test
data: diff2dat
Dickey-Fuller = -4.8234, Lag order = 2, p-value = 0.01
alternative hypothesis: stationary
The null hypothesis of a random walk is now rejected so you might think that a 2nd difference is needed for the anchovy data. However the actual problem is that the default for adf.test()
includes a trend but we removed the trend with our first difference. Thus we included an unneeded trend parameter in our test. Our data are not that long and this affects the result.
Let’s repeat without the trend and we’ll see that the null hypothesis is rejected. The number of lags is set to be what would be used by adf.test()
. See ?adf.test
.
<- trunc((length(diff1dat) - 1)^(1/3))
k <- urca::ur.df(diff1dat, type = "drift", lags = k)
test summary(test)
###############################################
# Augmented Dickey-Fuller Test Unit Root Test #
###############################################
Test regression drift
Call:
lm(formula = z.diff ~ z.lag.1 + 1 + z.diff.lag)
Residuals:
Min 1Q Median 3Q Max
-0.37551 -0.13887 0.04753 0.13277 0.28223
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.11062 0.06165 1.794 0.08959 .
z.lag.1 -2.16711 0.64900 -3.339 0.00365 **
z.diff.lag1 0.58837 0.47474 1.239 0.23113
z.diff.lag2 0.13273 0.25299 0.525 0.60623
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 0.207 on 18 degrees of freedom
Multiple R-squared: 0.7231, Adjusted R-squared: 0.677
F-statistic: 15.67 on 3 and 18 DF, p-value: 2.918e-05
Value of test-statistic is: -3.3391 5.848
Critical values for test statistics:
1pct 5pct 10pct
tau2 -3.75 -3.00 -2.63
phi1 7.88 5.18 4.12
5.5.1 ndiffs()
As an alternative to trying many different differences and remembering to include or not include the trend or level, you can use the ndiffs()
function in the forecast package. This automates finding the number of differences needed.
::ndiffs(anchovyts, test = "kpss") forecast
[1] 1
::ndiffs(anchovyts, test = "adf") forecast
[1] 1
One difference is required to pass both the ADF and KPSS stationarity tests.