The application of forecast ensembles to probabilistic weather prediction has
spurred considerable interest in their evaluation. Such ensembles are commonly
interpreted as Monte Carlo ensembles, meaning that the ensemble members are
perceived as random draws from a distribution. Under this interpretation, a
reasonable property to ask for is statistical consistency, which demands that
the ensemble members and the verification behave like draws from the same
distribution. A widely used technique to assess statistical consistency of a
historical dataset is the rank histogram, which uses as a criterion the number
of times that the verification falls between each pair of adjacent members of
the ordered ensemble. Ensemble evaluation is rendered more specific by stratification,
which means that ensembles that satisfy a certain condition (e.g., a certain
meteorological regime) are evaluated separately. Fundamental relationships
between Monte Carlo ensembles, their rank histograms, and random sampling from
the probability simplex according to the Dirichlet distribution are pointed
out. Furthermore, the possible benefits and complications of ensemble
stratification are discussed. The main conclusion is that a stratified Monte
Carlo ensemble might appear inconsistent with the verification even though the
original (unstratified) ensemble is consistent. The apparent inconsistency is
merely a result of stratification. Stratified rank histograms are thus not
necessarily flat. This result is demonstrated by perfect ensemble simulations
and supplemented by mathematical arguments. Possible methods to avoid or remove
artifacts that stratification induces in the rank histogram are suggested.
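
The main conclusion can be illustrated with a minimal perfect-ensemble sketch. It is not the simulation setup used in this work. The assumptions are purely illustrative: ensemble members and verification are independent standard normal draws, the ensemble has five members, and the stratum keeps the cases whose ensemble mean exceeds 0.5. The helper name `rank_histogram` and all parameter values are hypothetical.

```python
import random


def rank_histogram(ensembles, verifications):
    """Count, for each rank 0..K, how often the verification exceeds exactly
    that many of the K ensemble members."""
    n_members = len(ensembles[0])
    counts = [0] * (n_members + 1)
    for members, verification in zip(ensembles, verifications):
        rank = sum(m < verification for m in members)
        counts[rank] += 1
    return counts


rng = random.Random(0)  # fixed seed so the sketch is reproducible
n_cases, n_members = 20000, 5

# Perfect ensemble: members and verification are drawn from the same distribution.
ensembles = [[rng.gauss(0.0, 1.0) for _ in range(n_members)]
             for _ in range(n_cases)]
verifications = [rng.gauss(0.0, 1.0) for _ in range(n_cases)]

# Unstratified rank histogram: expected to be flat up to sampling noise.
full_hist = rank_histogram(ensembles, verifications)

# Stratify on a property of the ensemble itself (assumed criterion: mean > 0.5).
selected = [i for i, members in enumerate(ensembles)
            if sum(members) / n_members > 0.5]
strat_hist = rank_histogram([ensembles[i] for i in selected],
                            [verifications[i] for i in selected])

print("unstratified:", full_hist)
print("stratified:  ", strat_hist)
```

In this sketch the stratum selects ensembles whose members are unusually large, while the verification is unaffected by the selection, so the stratified histogram is skewed toward the low ranks even though the full ensemble is consistent by construction.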