Stefan Siegert

A useful proof concerning the CRPS

The continuous ranked probability score (CRPS) is a proper scoring rule that measures how well a forecast, given as a cumulative distribution function (cdf) \(F(x)\), predicted the observed outcome \(y\). It is defined as the integrated squared difference between the forecast distribution \(F(x)\) and the hypothetical "perfect" forecast distribution for the outcome \(y\), which is a Heaviside step function \(H(t - y)\) that jumps from 0 to 1 at \(t = y\): \[\begin{aligned} CRPS & = \int dt [F(t) - H(t - y)]^2. \end{aligned}\]

An analytical result that is sometimes found in the literature is \[\begin{aligned} CRPS & = E|X-y| - \frac12 E|X-X'| \end{aligned}\] where the expectation is taken over the independent random variables \(X\) and \(X'\) with distribution \(F(x)\). The proof can be found in Baringhaus and Franz (2004), but it might be older than that.
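The identity can be checked numerically. The sketch below is only an illustration and makes an assumption that is not in the text above: the forecast is an equally weighted ensemble, so \(F\) is its empirical cdf. In that case both sides can be evaluated exactly, the integral because the integrand is piecewise constant, and the expectations because they reduce to averages over ensemble members. The function names `crps_integral` and `crps_expectation` are made up for this example.

```python
import numpy as np

def crps_integral(ens, y):
    """CRPS from the integral definition, with F the empirical cdf of `ens`."""
    ens = np.sort(np.asarray(ens, dtype=float))
    # F and H only jump at ensemble members and at y, so the integrand
    # is constant between consecutive breakpoints (and zero outside them).
    pts = np.sort(np.concatenate([ens, [y]]))
    widths = np.diff(pts)
    mid = 0.5 * (pts[:-1] + pts[1:])
    F = np.searchsorted(ens, mid, side="right") / ens.size  # F(t) = P(X <= t)
    H = (mid >= y).astype(float)                            # H(t - y)
    return np.sum((F - H) ** 2 * widths)

def crps_expectation(ens, y):
    """CRPS from E|X - y| - 0.5 * E|X - X'| under the same empirical cdf."""
    ens = np.asarray(ens, dtype=float)
    e_xy = np.mean(np.abs(ens - y))
    # Average over all ordered pairs, including i == j (which contribute 0).
    e_xx = np.mean(np.abs(ens[:, None] - ens[None, :]))
    return e_xy - 0.5 * e_xx

ens = [0.3, -1.2, 2.5, 0.9]
print(crps_integral(ens, 0.5), crps_expectation(ens, 0.5))
```

The two printed numbers should agree up to floating point rounding.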

The crucial result to realize is that the absolute difference \(|x-y|\) can be written as an integral over indicator functions as follows \[\begin{aligned} |x-y| & = \int dt [ I(x \le t \lt y) + I(y \le t \lt x) ] \\ & = \int dt [ I(x \le t) I(y \gt t) + I(y \le t) I(x \gt t) ]. \end{aligned}\]
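To see the indicator representation at work, the following rough sketch approximates the integral by a Riemann sum on a fine grid. The grid limits and resolution are arbitrary choices for this illustration, and the function name `abs_diff_via_indicators` is made up; the result is only accurate to about one grid spacing.

```python
import numpy as np

def abs_diff_via_indicators(x, y, lo=-10.0, hi=10.0, n=200_001):
    """Approximate |x - y| by integrating the indicator functions over t.

    Assumes lo <= min(x, y) and max(x, y) <= hi.
    """
    t = np.linspace(lo, hi, n)
    dt = t[1] - t[0]
    # Second form of the identity: I(x <= t) I(y > t) + I(y <= t) I(x > t).
    integrand = ((x <= t) & (y > t)).astype(float) \
              + ((y <= t) & (x > t)).astype(float)
    return integrand.sum() * dt

print(abs_diff_via_indicators(1.5, -2.25))  # close to |1.5 - (-2.25)| = 3.75
```

At most one of the two indicator products is nonzero for any given \(t\), which is why their sum simply measures the length of the interval between \(x\) and \(y\).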

Suppose \(X\) and \(Y\) are independent random variables with pdfs \(f(x)\) and \(g(y)\), and distribution functions \(F(x)\) and \(G(y)\). Then \[\begin{aligned} E|X-Y| & = \int dx \int dy \int dt [ I(x \le t) I(y \gt t) + I(y \le t) I(x \gt t) ] f(x) g(y)\\ & = \int dt \Big\{ \Big[ \int dx I(x \le t) f(x)\Big] \Big[ \int dy I(y \gt t) g(y) \Big] \\ & \quad \quad \quad + \Big[ \int dy I(y \le t) g(y)\Big] \Big[ \int dx I(x \gt t) f(x) \Big] \Big\}\\ & = \int dt \Big[ F(t)(1-G(t)) + G(t)(1-F(t)) \Big]. \end{aligned}\]
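A small numerical check of this result is sketched below. It assumes, purely for illustration, that \(X\) and \(Y\) are each uniformly distributed over a finite set of values rather than having pdfs; the same argument goes through with sums in place of the integrals over \(x\) and \(y\), and both sides can then be computed exactly. The helper names are hypothetical.

```python
import numpy as np

def ecdf(sample, t):
    """Empirical cdf of `sample`, P(X <= t), evaluated at the points `t`."""
    return np.searchsorted(np.sort(sample), t, side="right") / len(sample)

def mean_abs_diff_direct(xs, ys):
    """E|X - Y| as an average over all pairs (x_i, y_j)."""
    xs, ys = np.asarray(xs, dtype=float), np.asarray(ys, dtype=float)
    return np.mean(np.abs(xs[:, None] - ys[None, :]))

def mean_abs_diff_cdf(xs, ys):
    """E|X - Y| as the integral of F(1 - G) + G(1 - F) over t."""
    xs, ys = np.asarray(xs, dtype=float), np.asarray(ys, dtype=float)
    # Both cdfs are step functions, so the integrand is constant between
    # consecutive sample values and zero outside their range.
    pts = np.sort(np.concatenate([xs, ys]))
    widths = np.diff(pts)
    mid = 0.5 * (pts[:-1] + pts[1:])
    F, G = ecdf(xs, mid), ecdf(ys, mid)
    return np.sum((F * (1 - G) + G * (1 - F)) * widths)

xs, ys = [0.0, 1.0, 4.0], [2.0, 3.0]
print(mean_abs_diff_direct(xs, ys), mean_abs_diff_cdf(xs, ys))
```

For these values both functions should return \(11/6\), up to rounding.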

When \(X\) and \(X'\) are identically and independently distributed random variables with cdf \(F(x)\), it follows that \[ E|X-X'| = 2 \int dt [ F(t)(1-F(t)) ]. \]
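As a quick sanity check (an illustrative case, not part of the original argument), let \(X\) and \(X'\) be independent and uniform on \([0, 1]\), so that \(F(t) = t\) for \(0 \le t \le 1\). The integrand \(F(t)(1-F(t))\) vanishes outside \([0, 1]\), and \[ 2 \int_0^1 dt \, t (1 - t) = 2 \Big( \frac12 - \frac13 \Big) = \frac13, \] which agrees with the direct calculation \[\begin{aligned} E|X-X'| & = \int_0^1 dx \int_0^1 dx' |x - x'| \\ & = 2 \int_0^1 dx \int_0^x dx' (x - x') \\ & = 2 \int_0^1 dx \frac{x^2}{2} = \frac13. \end{aligned}\]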

Therefore \[\begin{aligned} & E|X-Y| - \frac12 E|X-X'| - \frac12 E|Y-Y'| \\ & = \int dt \Big[ F(t)(1-G(t)) + G(t) (1 - F(t)) \Big]\\ &\quad - \int dt \Big[ F(t)(1-F(t))\Big] - \int dt \Big[ G(t)(1-G(t))\Big]\\ & = \int dt [ F(t) - G(t) ]^2. \end{aligned}\] If \(Y\) is a constant equal to \(y\), its cdf is the step function \(G(t) = H(t - y)\) and \(E|Y - Y'| = 0\), and so \[\begin{aligned} CRPS & = \int dt [F(t) - H(t-y)]^2\\ & = E|X-y| - \frac12 E|X-X'|. \end{aligned}\]
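For completeness, the step from the three integrals to \(\int dt [F(t) - G(t)]^2\) is pointwise algebra on the integrands. Writing \(F\) and \(G\) for \(F(t)\) and \(G(t)\), \[\begin{aligned} & F(1-G) + G(1-F) - F(1-F) - G(1-G) \\ & = F - FG + G - FG - F + F^2 - G + G^2 \\ & = F^2 - 2FG + G^2 \\ & = (F - G)^2. \end{aligned}\]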