Ice cores provide insight into past climate over many millennia. Owing to ice compaction, the raw data for any single core are irregular in time. Multiple cores have different irregularities and, when considered together, are misaligned in time. After processing, such data are made available to researchers as regular time series: a data product. Typically, these cores are processed independently. This paper considers a fast Bayesian method for the joint processing of multiple irregular series, which is shown to be more efficient than the independent alternative. Furthermore, our explicit framework permits reliable modelling of the impact of the multiple sources of uncertainty. The methodology is illustrated with the analysis of a pair of ice cores. Our data products, in the form of posterior marginals or joint distributions on an arbitrary temporal grid, are finite Gaussian mixtures. We can also produce process histories to study non-linear functionals of interest. More generally, the concept of joint analysis via hierarchical Gaussian process models can be widely extended, as the models used can be viewed within the larger context of continuous space–time processes.

Ice cores play an important role in revealing Earth's climate history via the
analysis of their chemical composition. Data from ice cores are available in
two forms: “raw” (typically irregular in time) and “data products”
(typically regularly spaced in time, i.e. “gridded”). Often the latter are
used as input for climate models to analyse past climate change.
Alternatively, because they are gridded, such series can readily be combined
with other, similarly gridded, data series representing other aspects of climate.
Such data products are pre-processed from raw data using a variety of
techniques: from simple running averages

This paper proposes a joint statistical model for processing multiple raw ice
core data sets. Like others in the wider field of palaeoclimate
reconstruction

A pair of ice cores drilled in Greenland are used to illustrate our
framework. As pointed out, for instance, by

Whilst our motivating example stems from climate research, multivariate data
with different temporal irregularities are a common feature in many
contemporary applications. For example, the ability to combine outputs at
different levels of accuracy is crucial to the understanding of processes
being studied through potentially expensive computer experimentation. A
useful approach in such applications is to combine results from many cheap
but low-resolution experiments with those from a few expensive but
high-resolution experiments, by linking the data via different layers of
modelling

From a theoretical perspective, we view misaligned time series as a special
case of spatial misalignment in spatial statistics

We introduce and discuss the raw data and their associated uncertainties in
Sect.

Palaeoclimate archives such as tree rings, laminated lake sediments and ice
cores are often used as a guide to past climatic conditions

We obtain data from the National Climatic Data Center
(

Even though sections within a core are regular in length, their corresponding
ages are irregular. Thus, jointly, multiple ice core time series with
different irregularities are temporally misaligned. The

An age difference value of 80.6 yr between roughly 1320 and 1400 cal yr BP has been omitted in this figure to focus on other significant features of this plot.

Scatter plots of

Since the

The term “nugget” refers
to the apparent discontinuity at the origin of a semivariogram. It is
attributed to two sources of variation: high-frequency noise in the data,
and variation due to uncertainty from data collection

Empirical semivariograms of GISP2 and GRIP. They suggest that the
linear semivariogram is a suitable model for both of the ice core data sets,
i.e.

Our final exploratory analysis focuses on the standardised distribution of
the first differences of the

In this section we outline our notation and describe our model for multiple misaligned irregular time series data. We show how to perform fast Bayesian inference on the parameters of this model without resorting to Markov chain Monte Carlo methods. Subsequently, we describe two efficient algorithms for imputation of a latent process of interest onto a time grid. One algorithm is simulation-free computation of posterior marginals and the other involves simple Monte Carlo simulation from posterior joint distributions.
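Because the parameter posterior is approximated on a discrete grid, the posterior predictive at any time point is a finite mixture of Gaussians, with one component per grid point. The following toy sketch illustrates the idea for a deliberately simplified model (a Brownian motion with a nugget, under a flat prior on a small parameter grid); it is not the paper's actual model or implementation:

```python
import numpy as np

def mixture_marginal(t, y, t_star, sigma2_grid, tau2_grid):
    """Posterior predictive of the latent process at time t_star as a
    finite Gaussian mixture: one conditional Gaussian per point of a
    discrete parameter grid, weighted by its marginal likelihood
    (a flat prior over the grid is assumed)."""
    logws, means, variances = [], [], []
    for s2 in sigma2_grid:
        for n2 in tau2_grid:
            # toy covariance: Brownian motion plus a nugget on the diagonal
            K = s2 * np.minimum.outer(t, t) + n2 * np.eye(len(t))
            L = np.linalg.cholesky(K)
            alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
            logml = (-0.5 * y @ alpha - np.log(np.diag(L)).sum()
                     - 0.5 * len(y) * np.log(2 * np.pi))
            k = s2 * np.minimum(t, t_star)          # cross-covariance vector
            v = np.linalg.solve(L.T, np.linalg.solve(L, k))
            means.append(k @ alpha)
            variances.append(s2 * t_star - k @ v)
            logws.append(logml)
    w = np.exp(np.array(logws) - max(logws))
    return w / w.sum(), np.array(means), np.array(variances)

# illustrative data: a random walk observed with noise at irregular times
rng = np.random.default_rng(1)
t = np.sort(rng.uniform(0.5, 10.0, 40))
dt = np.diff(np.concatenate([[0.0], t]))
y = np.cumsum(rng.normal(0.0, np.sqrt(dt))) + rng.normal(0.0, 0.2, len(t))
w, m, v = mixture_marginal(t, y, 5.0, [0.5, 1.0, 2.0], [0.01, 0.05, 0.2])
```

Because the mixture components are available in closed form, summaries such as predictive means, quantiles and credible intervals follow without any simulation.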

We consider a hierarchical model comprising data, process and parameter
layers. At the data layer,

At a process layer, we express

In Sect.

The

To complete the hierarchical modelling structure, prior distributions are
assigned to the model parameters. We use reference priors on

Our objective, given

There are two choices of random variables for imputation. We could focus on

Our later mathematical derivations are substantially simplified by defining

In our new notation we can rewrite some of the equations discussed in
Sect.

Similarly, Eq. (

Let

Initial inference is focused on

Our next goal is to derive the joint posterior distribution of

Derivation of the first quantity in the above integrand is, again, by
completing the quadratic form as in Eq. (

The joint and marginal predictive distributions in Eqs. (

A greater challenge is posed by investigations of extreme events such as
minima and their timing. These are examples of non-linear functionals of

For brevity, we use Table

A more complete version of this approach uses 1000 histories. For generality
we compute conditional quantiles at time grid

Illustration of process histories of length 3 on a time grid of
5 k yr intervals. For each sample, we simulate the parameters from a joint
posterior distribution, followed by a realisation of the latent process
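The two-stage sampling described in this caption (draw the parameters, then a path conditional on them) can be sketched as follows; the mixture weights, means and covariances here are illustrative placeholders, not quantities from the paper:

```python
import numpy as np

def draw_histories(weights, means, covs, n_draws, rng):
    """Draw process histories on a fixed time grid: each draw first picks
    a parameter setting (a mixture component) with its posterior weight,
    then a Gaussian realisation of the path conditional on it."""
    idx = rng.choice(len(weights), size=n_draws, p=weights)
    paths = np.empty((n_draws, means.shape[1]))
    for d, i in enumerate(idx):
        paths[d] = rng.multivariate_normal(means[i], covs[i])
    return paths

# illustrative two-component posterior on a 6-point grid
rng = np.random.default_rng(2)
grid = np.linspace(0.0, 10.0, 6)
weights = np.array([0.7, 0.3])
means = np.vstack([np.sin(grid), np.cos(grid)])
covs = np.stack([0.1 * np.eye(6), 0.2 * np.eye(6)])
histories = draw_histories(weights, means, covs, 1000, rng)
```

Each row of `histories` is one joint realisation of the latent process on the grid, so any functional of the process can be evaluated row by row.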

In this section we apply the model framework and inference procedures
presented in Sect.

Using the data from both cores jointly, we obtain the discrete approximation
to the marginal posterior distributions of

Under our model-based approach we have the choice of what summaries of the

Plots of the smoothed posterior distributions of

To gain a better understanding of the benefit of joint modelling over
separate alternatives, we fit an independent increments model with Gaussian
noise to each core separately. Note that, in contrast to the joint approach,
the relationship between the two nugget parameters is suppressed in the separate
approach. Thus, each model has two parameters (a process variance and a
nugget parameter). We refer to
Appendix

Plots of quantile-based 50 and 95 % credible intervals of the
marginal posterior distributions of process

The “spikes” (e.g. 0, 1.36, 3.4, and 8.2 k cal BP in GISP2) in
Fig.

Note
that the spikes at times 0 and 11 are the modelling artefact known as the
“boundary effect”; see, for instance

Although this is not a full uncertainty comparison of our method with other methods – since neither standard deviations nor IQRs are available for the latter – it suggests that those methods discard valuable information by treating each core separately.

Plots of the interquartile ranges (IQR; corresponding to the width
of the 50 % credible band in Fig.

In this subsection, we discuss the 8.2 ka event, a sudden reduction in North
Atlantic temperature during a period around 8.2 k cal yr BP. This event
is associated with a transient change in the North Atlantic overturning
circulation. Consequently, ocean water that evaporated and was subsequently
deposited as ice in Greenland is amongst the best sources of evidence for it.
The date corresponding to the local minimum of the averages is not a
satisfactory estimator of the time of such an event. We propose to use our
data product, the

Like others in the literature, we define the 8.2 ka event by an attainment
of a minimum in the temperature value during a specific time period

Monte Carlo samples from joint posterior distributions thus provide a
flexible data product in their own right. Indeed, in climate reconstruction,
this is precisely as proposed by

We have presented a hierarchical model to jointly analyse multiple irregular time series. An important component of our model is the Gaussian Markov assumption based on multivariate independent increments that gives us a natural vehicle for joint modelling. We further derived and implemented a fast algorithm for parameter inference and imputation based on this model. We demonstrated that the joint approach utilises information from multiple time series more efficiently than one-series-at-a-time alternatives.

Our paper has been tailored to creating a climate data product from Greenland ice cores. Our proposed framework is simple yet effective for combining multiple ice core time series, allowing each series to have a different temporal support. To the best of our knowledge, this work is a first attempt at directly addressing the joint behaviour of multiple ice cores in their raw and misaligned form. Like others in the literature, we propose the use of linear combinations of all the available data to perform imputation. Ours is optimal given the covariance structure and, being Bayesian and including posterior uncertainty in the parameters, robust to uncertainties in that covariance structure. Our data products, in the form of non-Gaussian posterior predictive distributions, are richer than those previously possible. More importantly, our process histories can easily be utilised by other researchers to answer complex questions that are otherwise analytically intractable. This advantage was demonstrated using a case study of an abrupt climate change event.

We believe this to be an initial attempt at joint palaeoclimate inference,
upon which other work can build. Some parameters in our model were formulated
according to the respective lengths of ice core sections. This approach is
likely to be problematic in the study of sections longer than the Holocene. A
more realistic approach is to allow the value of

In this appendix we discuss our treatment for the process variance and nugget
via the

If

If we assume that the nugget effect is at an annual level (denoted as

In this appendix, we discuss the model choice for

To formally measure the benefit of joint modelling, we compare model

We further propose

Values of the Bayesian information criterion (BIC) obtained from different model settings. The bold value highlights the model with the best fit.
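For reference, BIC trades off fit against parameter count, BIC = k ln n − 2 ln L̂, with lower values preferred. A minimal sketch (the log-likelihoods and parameter counts below are made-up illustrative numbers, not the paper's):

```python
import numpy as np

def bic(max_loglik, n_params, n_obs):
    """Bayesian information criterion (lower is better):
    BIC = k * ln(n) - 2 * max log-likelihood."""
    return n_params * np.log(n_obs) - 2.0 * max_loglik

# illustrative comparison: a joint model versus two independently fitted
# single-core models (the numbers here are placeholders)
bic_joint = bic(max_loglik=-1500.0, n_params=3, n_obs=2000)
bic_separate = bic(max_loglik=-1520.0, n_params=4, n_obs=2000)
best = min(("joint", bic_joint), ("separate", bic_separate),
           key=lambda kv: kv[1])
```

In this made-up comparison the joint model wins on both counts: fewer parameters and a higher maximised log-likelihood.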

Plot of the smoothed marginal posterior distribution of the
parameter

As a final model checking step, we determine whether the parameters in our
model are identifiable. We do this by simulating model parameters
(

The results of our simulations are shown in Table

Performance of the model fitting algorithm. All results were based on 1000 simulation runs.
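A simulate-and-recover check of this kind can be sketched as follows; the covariance model, sample size and parameter grids are illustrative stand-ins rather than the paper's settings:

```python
import numpy as np

def loglik(t, y, s2, n2):
    """Gaussian log-likelihood under an illustrative zero-mean model with
    Cov(y_i, y_j) = s2 * min(t_i, t_j) + n2 * 1{i = j}."""
    K = s2 * np.minimum.outer(t, t) + n2 * np.eye(len(t))
    L = np.linalg.cholesky(K)
    a = np.linalg.solve(L.T, np.linalg.solve(L, y))
    return float(-0.5 * y @ a - np.log(np.diag(L)).sum()
                 - 0.5 * len(y) * np.log(2 * np.pi))

# simulate data under known parameters, then refit over a coarse grid
rng = np.random.default_rng(4)
true_s2, true_n2 = 1.0, 0.1
t = np.sort(rng.uniform(0.5, 20.0, 200))
dt = np.diff(np.concatenate([[0.0], t]))
y = np.cumsum(rng.normal(0.0, np.sqrt(true_s2 * dt)))
y += rng.normal(0.0, np.sqrt(true_n2), len(t))

s2_grid = np.array([0.25, 0.5, 1.0, 2.0, 4.0])
n2_grid = np.array([0.025, 0.05, 0.1, 0.2, 0.4])
ll = np.array([[loglik(t, y, s, n) for n in n2_grid] for s in s2_grid])
i, j = np.unravel_index(ll.argmax(), ll.shape)
s2_hat, n2_hat = s2_grid[i], n2_grid[j]   # recovered parameter values
```

If the parameters are identifiable, the recovered values should concentrate near the truth as the simulation is repeated, which is what a summary over many such runs assesses.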

This research was funded by Science Foundation Ireland (grant no. 10/RFP/MTH2779). The authors thank Eric W. Wolff for his helpful comments on the Greenland ice core data set. Edited by: R. Donner Reviewed by: three anonymous referees