Time series smoothing with the 2nd order random walk
The RW2 and its precision matrix
The 2nd order random walk (RW2) is a time series model defined by

$$x_t - 2 x_{t-1} + x_{t-2} = \epsilon_t$$

where $\epsilon_t$ is iid Normal with precision $\tau$.
The RW2 is a popular prior for temporally correlated data, widely used in time series modelling and Bayesian hierarchical modelling.
Realisations of the RW2 can be generated by noting that the second-order differences are iid Normally distributed. The inverse operation of differencing is the cumulative sum, so one way to generate realisations of the RW2 is to cumulatively sum iid Normal random variables twice.
set.seed(1)
# generate 5 independent RW2 realisations of length 100 by double cumsum
rw2 = sapply(1:5, function(ii) cumsum(cumsum(rnorm(100))))
matplot(rw2, type='l', lty=1)
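As a quick check (not part of the original code), we can verify that taking second-order differences of a double cumulative sum recovers the underlying iid innovations:

# second differences of a double cumsum recover the innovations
z = rnorm(100)
x = cumsum(cumsum(z))
all.equal(diff(x, differences=2), z[3:100])  # should be TRUE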
If the vector $x = (x_1, \ldots, x_n)^\top$ is a realisation of an RW2, we can show that $x$ has a multivariate Normal distribution. The likelihood of $x$, given $\tau$, is given by

$$p(x \mid \tau) \propto \tau^{(n-2)/2} \exp\left( -\frac{\tau}{2} \sum_{t=3}^{n} (x_t - 2 x_{t-1} + x_{t-2})^2 \right)$$

The sum of squares can be written as the dot product of the vector $Dx$ with itself, where the second-order difference matrix $D$ is given by

$$D = \begin{pmatrix} 1 & -2 & 1 & & & \\ & 1 & -2 & 1 & & \\ & & \ddots & \ddots & \ddots & \\ & & & 1 & -2 & 1 \end{pmatrix}$$

Here $x$ has length $n$ and $D$ has $n-2$ rows and $n$ columns. Plugging this into the likelihood, we have

$$p(x \mid \tau) \propto \tau^{(n-2)/2} \exp\left( -\frac{\tau}{2} x^\top D^\top D x \right)$$

Assuming that $x_1$ and $x_2$ have a joint uniform prior distribution, i.e. $p(x_1, x_2) \propto 1$, the full joint distribution of the vector $x$ is that of a multivariate Normal distribution

$$x \sim N(0, Q^{-1})$$

with precision (inverse variance) matrix given by

$$Q = \tau D^\top D$$
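As a small illustration (not in the original text), $D$ can also be built by applying base R's `diff` to an identity matrix, which makes it easy to inspect $Q$ for a small example:

# build D and Q = tau * t(D) %*% D for a small example (n = 6, tau = 1)
n = 6
D = diff(diag(n), differences = 2)  # (n-2) x n, each row is (1, -2, 1)
Q = t(D) %*% D
Q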
The RW2 as a prior in Bayesian hierarchical modelling
Now assume we have time-series data $y = (y_1, \ldots, y_n)^\top$ that we want to model as the sum of a smooth latent vector $x$ and independent Normally distributed noise with variance $1/\tau_e$. We want to model the latent component by an RW2. That is, we have

$$y = x + e$$

where $e \sim N(0, Q_e^{-1})$ with $Q_e = \tau_e I$, where $I$ denotes the $n \times n$ identity matrix, and $x \sim N(0, Q_x^{-1})$ with the RW2 precision matrix $Q_x = \tau_x D^\top D$ as defined above.
Let's first assume that $\tau_x$ and $\tau_e$ are known. How do we infer the latent vector $x$ from the observed data $y$?
The joint log-density of $x$ and $y$ is given by

$$\log p(x, y) = \log p(y \mid x) + \log p(x) = -\frac{1}{2} (y - x)^\top Q_e (y - x) - \frac{1}{2} x^\top Q_x x + \mathrm{const}$$

We want to infer $x$ from $y$, that is we want to calculate $p(x \mid y)$. Since $\log p(x \mid y) = \log p(x, y) - \log p(y)$ and $\log p(y)$ does not depend on $x$, we can expand the quadratic form and collect the terms in $x$:

$$\log p(x \mid y) = -\frac{1}{2} x^\top (Q_x + Q_e) x + x^\top Q_e y + \mathrm{const}$$

This is the density of the multivariate Normal distribution in canonical form, and so the conditional distribution of $x$ given $y$ is multivariate Normal with conditional expectation

$$E(x \mid y) = (Q_x + Q_e)^{-1} Q_e y$$

and conditional variance

$$\mathrm{Var}(x \mid y) = (Q_x + Q_e)^{-1}$$
Implementation in R
library(tidyverse)
library(Matrix)
# simulate latent x, errors, and observations
set.seed(18)
n = 50
tau_x = 10
tau_e = 0.1
xy = tibble(t=1:n,
            x=cumsum(cumsum(rnorm(n, 0, 1/sqrt(tau_x)))),
            y=x + rnorm(n, 0, 1/sqrt(tau_e)))
ggplot(xy, aes(x=t)) +
  geom_line(aes(y=x, colour='latent x')) +
  geom_line(aes(y=y, colour='observed data y'))
# define precision matrices Q_e and Q_x
Qe = tau_e * Diagonal(n)
Dx = bandSparse(n=n-2, m=n, k=0:2,
                diagonal=list(rep(1, n-2), rep(-2, n-2), rep(1, n-2)))
Qx = tau_x * crossprod(Dx)  # crossprod(Dx) = t(Dx) %*% Dx
y = xy$y
# calculate conditional expectation and marginal standard deviations
mu_x = drop(solve(Qx + Qe, Qe %*% y))  # E(x | y) = (Qx + Qe)^{-1} Qe y
sigma_x = sqrt(diag(solve(Qx + Qe)))   # sqrt of the diagonal of Var(x | y)
xy = xy %>% mutate(mu_x = mu_x, sigma_x = sigma_x)
ggplot(xy, aes(x=t)) +
  geom_ribbon(aes(ymin=mu_x - 2 * sigma_x,
                  ymax=mu_x + 2 * sigma_x),
              fill='lightblue') +
  geom_line(aes(y=x, colour='true latent x')) +
  geom_line(aes(y=y, colour='observed data y')) +
  geom_line(aes(y=mu_x, colour='inferred latent x'))
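As a quick sanity check (not part of the original analysis), we can compute how often the true latent $x$ falls inside the plotted $\pm 2 \sigma_x$ band; for a single simulated dataset the number will vary, but it should typically be close to 95%:

# fraction of true latent values covered by the +/- 2 sigma band
xy %>% summarise(coverage = mean(abs(x - mu_x) < 2 * sigma_x))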
Inferring the hyperparameters
So far we have assumed that the hyperparameters $\tau_x$ and $\tau_e$ are known.
In practice it's rare that we know the variances of the latent process or the measurement error, so we might want to estimate them.
The following approach calculates the posterior $p(\tau_x, \tau_e \mid y)$ of the hyperparameters given the observed data $y$. It uses the two alternative factorisations of the joint density

$$p(y, x, \tau_x, \tau_e) = p(x \mid y, \tau_x, \tau_e)\, p(\tau_x, \tau_e \mid y)\, p(y) = p(y \mid x, \tau_e)\, p(x \mid \tau_x)\, p(\tau_x, \tau_e)$$

and solves for $p(\tau_x, \tau_e \mid y)$, treating all factors that don't depend on $\tau_x$ and $\tau_e$ as a multiplicative constant:

$$p(\tau_x, \tau_e \mid y) \propto \frac{p(y \mid x, \tau_e)\, p(x \mid \tau_x)\, p(\tau_x, \tau_e)}{p(x \mid y, \tau_x, \tau_e)}$$

One source of confusion is the occurrence of $x$ on the rhs but not on the lhs, so we might ask: what configuration of $x$ do we use on the rhs? The answer is any configuration for which $p(x \mid y, \tau_x, \tau_e)$ is non-zero. The conditional expectation $x = E(x \mid y, \tau_x, \tau_e)$ is a natural choice.
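Here is a minimal sketch of this calculation in R, continuing with `y` from the implementation above and assuming a flat hyperprior $p(\tau_x, \tau_e) \propto 1$; the function name `log_post` and the grid limits are illustrative choices, not part of the derivation. All densities are evaluated at the conditional expectation $x = E(x \mid y, \tau_x, \tau_e)$ and computed up to additive constants.

# unnormalised log posterior of (tau_x, tau_e), evaluated at x = E(x | y)
log_post = function(tau_x, tau_e, y) {
  n = length(y)
  D = bandSparse(n=n-2, m=n, k=0:2,
                 diagonal=list(rep(1, n-2), rep(-2, n-2), rep(1, n-2)))
  Qx = tau_x * crossprod(D)
  Qe = tau_e * Diagonal(n)
  Qc = Qx + Qe                      # conditional precision of x given y
  mu = drop(solve(Qc, Qe %*% y))    # conditional expectation E(x | y)
  # log p(y | x, tau_e) at x = mu, up to an additive constant
  lp_y_x = 0.5 * n * log(tau_e) - 0.5 * tau_e * sum((y - mu)^2)
  # log p(x | tau_x) at x = mu; the RW2 precision has rank n-2
  lp_x = 0.5 * (n-2) * log(tau_x) - 0.5 * tau_x * sum(as.vector(D %*% mu)^2)
  # log p(x | y, tau_x, tau_e) at its mode x = mu: the quadratic term
  # vanishes and only the log-determinant of Qc survives
  lp_x_y = 0.5 * as.numeric(determinant(Qc, logarithm=TRUE)$modulus)
  lp_y_x + lp_x - lp_x_y
}
# evaluate on a log-spaced grid and locate the posterior mode
grid = expand.grid(tau_x = exp(seq(log(0.1), log(1000), length=25)),
                   tau_e = exp(seq(log(0.001), log(10), length=25)))
grid$lp = mapply(log_post, grid$tau_x, grid$tau_e, MoreArgs=list(y=y))
grid[which.max(grid$lp), ]

Because the model is linear-Gaussian, the factorisation identity is exact at any valid configuration of $x$, so this is not an approximation; evaluating at the conditional expectation is simply the most convenient choice.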