## 5.5 Dealing with non-stationarity

The anchovy data have failed both tests for the stationarity, the Augmented Dickey-Fuller and the KPSS test. How do we fix this? The approach in the Box-Jenkins method is to use differencing.

Let’s see how this works with random walk data. A random walk is non-stationary but the difference is white noise so is stationary:

$x_t - x_{t-1} = e_t, e_t \sim N(0,\sigma)$

adf.test(diff(rw))

Augmented Dickey-Fuller Test

data:  diff(rw)
Dickey-Fuller = -3.8711, Lag order = 4, p-value = 0.01834
alternative hypothesis: stationary
kpss.test(diff(rw))
Warning in kpss.test(diff(rw)): p-value greater than printed p-value

KPSS Test for Level Stationarity

data:  diff(rw)
KPSS Level = 0.30489, Truncation lag parameter = 3, p-value = 0.1

If we difference random walk data, the null is rejected for the ADF test and not rejected for the KPSS test. This is what we want.

Let’s try a single difference with the anchovy data. A single difference means dat(t)-dat(t-1). We get this using diff(anchovyts).

diff1dat <- diff(anchovyts)
adf.test(diff1dat)

Augmented Dickey-Fuller Test

data:  diff1dat
Dickey-Fuller = -3.2718, Lag order = 2, p-value = 0.09558
alternative hypothesis: stationary
kpss.test(diff1dat)
Warning in kpss.test(diff1dat): p-value greater than printed p-value

KPSS Test for Level Stationarity

data:  diff1dat
KPSS Level = 0.089671, Truncation lag parameter = 2, p-value = 0.1

If a first difference were not enough, we would try a second difference which is the difference of a first difference.

diff2dat <- diff(diff1dat)
adf.test(diff2dat)
Warning in adf.test(diff2dat): p-value smaller than printed p-value

Augmented Dickey-Fuller Test

data:  diff2dat
Dickey-Fuller = -4.8234, Lag order = 2, p-value = 0.01
alternative hypothesis: stationary

The null hypothesis of a random walk is now rejected so you might think that a 2nd difference is needed for the anchovy data. However the actual problem is that the default for adf.test() includes a trend but we removed the trend with our first difference. Thus we included an unneeded trend parameter in our test. Our data are not that long and this affects the result.

Let’s repeat without the trend and we’ll see that the null hypothesis is rejected. The number of lags is set to be what would be used by adf.test(). See ?adf.test.

k <- trunc((length(diff1dat) - 1)^(1/3))
test <- urca::ur.df(diff1dat, type = "drift", lags = k)
summary(test)

###############################################
# Augmented Dickey-Fuller Test Unit Root Test #
###############################################

Test regression drift

Call:
lm(formula = z.diff ~ z.lag.1 + 1 + z.diff.lag)

Residuals:
Min       1Q   Median       3Q      Max
-0.37551 -0.13887  0.04753  0.13277  0.28223

Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept)  0.11062    0.06165   1.794  0.08959 .
z.lag.1     -2.16711    0.64900  -3.339  0.00365 **
z.diff.lag1  0.58837    0.47474   1.239  0.23113
z.diff.lag2  0.13273    0.25299   0.525  0.60623
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.207 on 18 degrees of freedom
Multiple R-squared:  0.7231,    Adjusted R-squared:  0.677
F-statistic: 15.67 on 3 and 18 DF,  p-value: 2.918e-05

Value of test-statistic is: -3.3391 5.848

Critical values for test statistics:
1pct  5pct 10pct
tau2 -3.75 -3.00 -2.63
phi1  7.88  5.18  4.12

### 5.5.1ndiffs()

As an alternative to trying many different differences and remembering to include or not include the trend or level, you can use the ndiffs() function in the forecast package. This automates finding the number of differences needed.

forecast::ndiffs(anchovyts, test = "kpss")
[1] 1
forecast::ndiffs(anchovyts, test = "adf")
[1] 1

One difference is required to pass both the ADF and KPSS stationarity tests.