Stefan Siegert

Time series smoothing with the 2nd order random walk

The RW2 and its precision matrix

The 2nd order random walk (RW2) is a time series model defined by

$$x_t = 2 x_{t-1} - x_{t-2} + \epsilon_t$$

where $\epsilon_t \sim N(0, \tau_x^{-1})$ is iid Normal with precision $\tau_x$.

The RW2 is a popular prior for smooth, temporally correlated latent effects in time series modelling and Bayesian hierarchical modelling.

Realisations of the RW2 can be generated by noting that the second-order differences $(x_t - x_{t-1}) - (x_{t-1} - x_{t-2}) = x_t - 2x_{t-1} + x_{t-2}$ are iid Normally distributed. The inverse operation of differencing is the cumulative sum, so one way to generate realisations of the RW2 is to cumulatively sum iid Normal random variables twice.

set.seed(1)
# simulate 5 independent RW2 realisations of length 100 by cumulative summing twice
rw2 = sapply(1:5, function(ii) cumsum(cumsum(rnorm(100))))
matplot(rw2, type='l', lty=1)
[Figure: five simulated RW2 realisations]

If the vector $x = (x_1, x_2, \dots, x_n)'$ is a realisation of an RW2, we can show that $x$ has a multivariate Normal distribution. The likelihood of $x_3, \dots, x_n$ is given by

$$\log p(x_3, \dots, x_n | x_1, x_2) = \sum_{t=3}^n \log p(x_t | x_{t-1}, x_{t-2}, \tau_x) = const - \frac{\tau_x}{2} \sum_{t=3}^n (x_t - 2x_{t-1} + x_{t-2})^2$$

The sum of squares can be written as the dot product of a vector $z$ with itself, where here $z$ is given by

$$z = Dx = \begin{pmatrix} 1 & -2 & 1 & & & \\ & 1 & -2 & 1 & & \\ & & \ddots & \ddots & \ddots & \\ & & & 1 & -2 & 1 \end{pmatrix} x$$

Here $z$ has length $n-2$ and $D$ has $n-2$ rows and $n$ columns. Plugging this into the likelihood, we have

$$\log p(x_3, \dots, x_n | x_1, x_2) = \sum_{t=3}^n \log p(x_t | x_{t-1}, x_{t-2}, \tau_x) = const - \frac12 x' (\tau_x D'D)\, x = const - \frac12 x' Q_x x$$

Assuming that $x_1$ and $x_2$ have a joint uniform prior distribution, i.e. $p(x_1, x_2 | \tau_x) = const$, the full joint distribution of the vector $x$ is that of a multivariate Normal distribution

$$p(x | \tau_x) = p(x_1, x_2 | \tau_x) \prod_{t=3}^n p(x_t | x_{t-1}, x_{t-2}, \tau_x) = const \times e^{-\frac12 x' Q_x x}$$

with precision (inverse variance) matrix given by

$$Q_x = \tau_x D'D = \tau_x \begin{pmatrix} 1 & -2 & 1 & & & & \\ -2 & 5 & -4 & 1 & & & \\ 1 & -4 & 6 & -4 & 1 & & \\ & \ddots & \ddots & \ddots & \ddots & \ddots & \\ & & 1 & -4 & 6 & -4 & 1 \\ & & & 1 & -4 & 5 & -2 \\ & & & & 1 & -2 & 1 \end{pmatrix}$$
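
As a quick sanity check, here's a small sketch that uses the Matrix package (also used in the implementation below) to build $D$ for $n = 6$ and compute $D'D$; up to the factor $\tau_x$ this reproduces the band pattern above. The names n_check and D_check are just for this illustration.

library(Matrix)
# sanity check: the (n-2) x n second-difference matrix D for a small n
n_check = 6
D_check = bandSparse(n=n_check-2, m=n_check, k=0:2,
                     diagonals=list(rep(1, n_check-2), rep(-2, n_check-2), rep(1, n_check-2)))
crossprod(D_check)  # = D'D: rows (1,-2,1), (-2,5,-4,1), (1,-4,6,-4,1), ...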

The RW2 as a prior in Bayesian hierarchical modelling

Now assume we have time-series data $y = (y_1, \dots, y_n)'$ that we want to model as the sum of a smooth latent vector $x$ and independent Normally distributed noise $e$ with variance $\tau_e^{-1}$. We want to model the latent component $x$ by an RW2. That is, we have $y = x + e$ where $e \sim N(0, Q_e^{-1})$ with $Q_e = \tau_e \mathbf{1}_n$ ($\mathbf{1}_n$ denotes the identity matrix), and $x \sim N(0, Q_x^{-1})$ with the RW2 precision matrix $Q_x$ as defined above.

Let's first assume that $\tau_e$ and $\tau_x$ are known. How do we infer the latent vector $x$ from the observed data $y$?

The joint log-density of $x$ and $y$ is given by

$$\log p(x, y) = \log p(y | x) + \log p(x) = const - \frac12 (y - x)' Q_e (y - x) - \frac12 x' Q_x x$$

We want to infer $x$ from $y$, that is, we want to calculate $p(x | y)$:

$$\log p(x | y) = const + \log p(x, y) = const - \frac12 x' (Q_x + Q_e)\, x + (Q_e y)' x$$

This is the density of a multivariate Normal distribution in canonical form, and so the conditional distribution of $x$ given $y$ is multivariate Normal with conditional expectation $E(x | y) = (Q_x + Q_e)^{-1} Q_e y$ and conditional variance $Var(x | y) = (Q_x + Q_e)^{-1}$.
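
To spell out the matching step: a multivariate Normal with mean $\mu$ and precision matrix $Q$ has log-density

$$\log N(x;\, \mu, Q^{-1}) = const - \frac12 x' Q x + (Q\mu)' x,$$

so comparing coefficients with the expression above gives $Q = Q_x + Q_e$ and $Q\mu = Q_e y$, hence $\mu = (Q_x + Q_e)^{-1} Q_e y$.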

Implementation in R

library(tidyverse)
library(Matrix)
# simulate latent x, errors, and observations
set.seed(18)
n = 50
tau_x = 10
tau_e = 0.1

xy = tibble(t=1:n,
            x=cumsum(cumsum(rnorm(n, 0, 1/sqrt(tau_x)))),
            y=x + rnorm(n, 0, 1/sqrt(tau_e)))

ggplot(xy, aes(x=t)) +
  geom_line(aes(y=x, colour='latent x')) +
  geom_line(aes(y=y, colour='observed data y'))
[Figure: simulated latent x and observed data y]
# define precision matrices Q_e and Q_x 
Qe = tau_e * Diagonal(n)
Dx = bandSparse(n=n-2, m=n, k=0:2,
                diagonals=list(rep(1, n-2), rep(-2, n-2), rep(1, n-2)))
Qx = tau_x * crossprod(Dx)
y = xy$y

# calculate conditional expectation and marginal variances
mu_x = drop(solve(Qx + Qe, Qe %*% y))
sigma_x = sqrt(diag(solve(Qx + Qe)))

xy = xy %>% mutate(mu_x = mu_x, sigma_x = sigma_x)

ggplot(xy, aes(x=t)) +
  geom_ribbon(aes(ymin=mu_x - 2 * sigma_x,
                  ymax = mu_x + 2 * sigma_x),
        fill='lightblue') +
  geom_line(aes(y=x, colour='true latent x')) +
  geom_line(aes(y=y, colour='observed data y')) +
  geom_line(aes(y=mu_x, colour='inferred latent x'))
[Figure: observed data y, true latent x, and inferred latent x with a ±2 standard deviation band]

Inferring the hyperparameters

So far we have assumed that the hyperparameters $\tau_x$ and $\tau_e$ are known. In practice it's rare that we know the variances of the latent process or the measurement error, so we might want to estimate them.

The following approach calculates the posterior of the hyperparameters $\theta = (\tau_x, \tau_e)$ given the observed data $y$. It uses the two alternative factorisations of the joint density

$$p(y, x, \theta) = p(y | x, \theta)\, p(x | \theta)\, p(\theta) = p(x | y, \theta)\, p(y | \theta)\, p(\theta)$$

and solves for $p(\theta | y)$, treating all factors that don't depend on $\theta$ as a multiplicative constant:

$$p(\theta | y) = \frac{p(x, y, \theta)}{p(x | y, \theta)\, p(y)} = const \times \frac{p(y | x, \theta)\, p(x | \theta)\, p(\theta)}{p(x | y, \theta)}$$

One source of confusion is the occurrence of $x$ on the rhs but not on the lhs, so we might ask: what configuration of $x$ do we use on the rhs? The answer is: any configuration for which $p(x | y, \theta)$ is non-zero. The conditional expectation $x^* = E(x | y, \theta) = (Q_x + Q_e)^{-1} Q_e y$ is a natural choice.
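
To make this concrete, here is a minimal sketch of how the unnormalised log posterior $\log p(\theta | y)$ could be evaluated at $x^* = E(x | y, \theta)$, reusing the Matrix functions and the simulated data xy from above. The flat prior $p(\theta) = const$ and the names log_post_theta and xstar are assumptions made for this illustration, not part of the derivation.

# sketch: unnormalised log posterior of theta = (tau_x, tau_e), evaluated at x* = E(x | y, theta),
# up to an additive constant not depending on theta; a flat prior p(theta) = const is assumed
log_post_theta = function(tau_x, tau_e, y) {
  n = length(y)
  D = bandSparse(n=n-2, m=n, k=0:2,
                 diagonals=list(rep(1, n-2), rep(-2, n-2), rep(1, n-2)))
  Qx = tau_x * crossprod(D)
  Qe = tau_e * Diagonal(n)
  Qc = Qx + Qe                           # conditional precision of x given y
  xstar = drop(solve(Qc, Qe %*% y))      # x* = E(x | y, theta)
  # log p(y | x*, theta): independent Normals with mean x* and precision tau_e
  log_lik = sum(dnorm(y, mean=xstar, sd=1/sqrt(tau_e), log=TRUE))
  # log p(x* | theta): RW2 prior with precision Qx = tau_x * D'D of rank n-2
  log_prior_x = 0.5 * (n-2) * log(tau_x) - 0.5 * sum(xstar * drop(Qx %*% xstar))
  # log p(x* | y, theta): Normal with mean x* and precision Qc, so only the log-determinant remains
  log_cond_x = 0.5 * as.numeric(determinant(Qc, logarithm=TRUE)$modulus)
  log_lik + log_prior_x - log_cond_x
}

# example: evaluate over a grid of precisions and pick the posterior mode
grid = expand.grid(tau_x = exp(seq(-2, 6, by=0.25)),
                   tau_e = exp(seq(-6, 2, by=0.25)))
grid$lp = mapply(log_post_theta, grid$tau_x, grid$tau_e, MoreArgs=list(y=xy$y))
grid[which.max(grid$lp), ]

Instead of a grid search, the same function could be handed to optim() (e.g. working with the log precisions) to locate the posterior mode.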