Historical time series of surface temperature and ocean heat content changes
are commonly used metrics to diagnose climate change and estimate properties
of the climate system. We show that recent trends, namely the slowing of
surface temperature rise at the beginning of the 21st century and the
acceleration of heat stored in the deep ocean, have a substantial impact on
these estimates. Using the Massachusetts Institute of Technology Earth System
Model (MESM), we vary three model parameters that influence the behavior of
the climate system: effective climate sensitivity (ECS), the effective ocean
diffusivity of heat anomalies by all mixing processes (

Scientists, policy makers, and the general public are concerned with how
surface temperature will change in the coming decades and further into the
future. These changes depend on many aspects of the climate system. Among
them are climate sensitivity and the rate at which heat is mixed into the
deep ocean. Equilibrium climate sensitivity (ECS) represents the global mean
surface temperature change that would be realized due to a doubling of

The value of climate sensitivity is uncertain, but the processes and feedbacks
that set it must be accurately modeled to reliably predict the future. To
this end, a number of studies have used Earth System Models of Intermediate
Complexity (EMICs) to estimate probability distribution functions (PDFs) for
the values of these climate system properties, in particular ECS, ocean
diffusivity, and an estimate of the anthropogenic aerosol forcing

Time series of surface temperature and ocean heat content are commonly used
temperature diagnostics in the evaluation of model performance because they
rule out combinations of the parameters that are inconsistent with
the observed climate record

In this study, we first seek to improve the methods used in previous work

Using the updated methodology and the 1800 MESM runs,
we answer the following questions: (1) How does the inclusion of more recent
data change the PDFs of model parameters? (2) What do we learn by
including spatial information in the surface diagnostic? The inclusion of
recent temperature trends can have a significant impact on the estimates of
climate system properties

Second, we show how including spatial variability in the surface temperature
diagnostic can influence the parameter distributions. In almost all parameter
estimation studies, global mean ocean heat content is used as one metric to
evaluate model performance and is paired with a surface temperature
diagnostic to further test the model runs. Typically, groups use time series
of either global mean surface temperature

In Sect.

As outlined in Sect.

Following a standard methodology

We evaluate model performance by comparing each model run to two temperature
diagnostics. The first diagnostic is the time series of decadal mean surface
temperature anomalies in four equal-area zonal bands spanning 0–30 and
30–90
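The equal-area property of these bands follows from spherical geometry, assuming the limits 0–30 and 30–90 are degrees of latitude in each hemisphere (the units are not preserved in the text above). As a short worked check, the fraction of the sphere's surface between latitudes $\varphi_1$ and $\varphi_2$ is:

```latex
\[
f(\varphi_1,\varphi_2) = \frac{\sin\varphi_2 - \sin\varphi_1}{2},
\qquad
f(0^\circ,30^\circ) = \frac{0.5 - 0}{2} = \frac{1}{4},
\qquad
f(30^\circ,90^\circ) = \frac{1 - 0.5}{2} = \frac{1}{4}.
\]
```

Each of the four bands therefore covers one quarter of the globe.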

For surface observations, we use datasets from four different research
centers. The datasets we use include the median of the 100-member HadCRUT4
ensemble from the Hadley Centre Climatic Research Unit

We derive the surface temperature diagnostic by temporally and spatially
averaging the gridded data. In the following calculation, we assume
uncertainty in the observations is zero and rely on multiple datasets
to account for uncertainty in the observed record. Due to data scarcity and
missing values in some regions, we set threshold criteria for each spatial
and temporal average in the derivation. First, the annual mean for each

Once the data mask and decadal mean time series are calculated, each time
series is zonally averaged on the 5

For ocean heat content observations, we use the estimated global mean ocean
heat content in the 0–2000 m layer from

For a given diagnostic period, we calculate the linear trend in the global
mean ocean heat content as the slope of the best-fit linear regression line.
In the calculation of the regression line, all deviations from the mean are
assigned a weight inversely proportional to the square of the standard error
from the
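The weighted regression described above can be sketched as follows. This is a minimal illustration, assuming weights of the form 1/se^2; the function name `weighted_linear_trend` and the sample numbers are hypothetical and are not taken from the observational dataset.

```python
def weighted_linear_trend(years, values, std_errors):
    """Slope and intercept of a weighted least-squares line.

    Each point is weighted by 1 / se**2, so deviations from the
    weighted mean count less where the standard error is large.
    """
    weights = [1.0 / se ** 2 for se in std_errors]
    w_sum = sum(weights)
    x_mean = sum(w * x for w, x in zip(weights, years)) / w_sum
    y_mean = sum(w * y for w, y in zip(weights, values)) / w_sum
    s_xy = sum(w * (x - x_mean) * (y - y_mean)
               for w, x, y in zip(weights, years, values))
    s_xx = sum(w * (x - x_mean) ** 2 for w, x in zip(weights, years))
    slope = s_xy / s_xx
    intercept = y_mean - slope * x_mean
    return slope, intercept


# Illustrative values only (not observations): a perfectly linear series.
years = [1955, 1960, 1965, 1970]
heat_content = [0.0, 1.0, 2.0, 3.0]
std_errors = [0.5, 0.4, 0.3, 0.2]
slope, intercept = weighted_linear_trend(years, heat_content, std_errors)
```

Because the sample series is exactly linear, the recovered slope does not depend on the weights; with noisy data the well-constrained points dominate the fit.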

Each model run is compared to the model diagnostics and evaluated through the
use of a goodness-of-fit statistic,

From the

Because of the pre-whitening by the noise-covariance matrix,

Prior to calculating the likelihood function, we interpolate the
goodness-of-fit statistics onto a finer grid in the parameter space. This
interpolation fills in the gaps between

In this study, we implement an alternative interpolation method based on
radial basis functions

For our implementation, we use the 1800

Weight assigned to each node point as a function of radial distance in normalized parameter space. The decay is isotropic in the parameter space and the same for all node points.

The weighting function is applied to each node point within the parameter
space. One can imagine a sphere surrounding each of these points, with the
weight assigned to that point decaying as a function of the distance from the
center. All points within the parameter space are in regions where the
spheres from multiple node points overlap. The interpolated value at any
point is the weighted sum of the node values associated with the overlapping
spheres. Thus, we calculate the
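The overlapping-sphere picture above can be sketched in a few lines. This is only an illustration under stated assumptions: the Gaussian decay, the `length_scale` value, and the normalization by the sum of weights are hypothetical choices made for the example and are not the weighting function used in this study.

```python
import math


def rbf_interpolate(target, nodes, node_values, length_scale=0.2):
    """Interpolate a value at `target` from values at scattered node points.

    `target` and each entry of `nodes` are coordinates in normalized
    parameter space. The weight of a node decays with radial distance
    only, so the decay is the same in every direction and for every node.
    """
    weights = []
    for node in nodes:
        distance = math.dist(target, node)
        # Assumed Gaussian decay; the actual kernel may differ.
        weights.append(math.exp(-(distance / length_scale) ** 2))
    total = sum(weights)
    # Assumed normalization so that a constant field is reproduced exactly.
    # A target far from every node would give total == 0 and needs
    # separate handling.
    return sum(w * v for w, v in zip(weights, node_values)) / total


# Two hypothetical nodes in a three-parameter space.
nodes = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
node_values = [2.0, 4.0]
midpoint_value = rbf_interpolate((0.5, 0.0, 0.0), nodes, node_values)
```

Because the function takes an arbitrary list of node coordinates, the same sketch works for gridded and non-gridded nodes.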

In summary, we have made a number of changes and updates to the methodology. (i) To account for a change in observational dataset, we have modified the ocean diagnostic to be estimated from the 0–2000 m layer, as opposed to the 0–3000 m layer. (ii) We now estimate the natural variability from a common model, as opposed to using different models for the surface and ocean diagnostics. (iii) We implement a new interpolation scheme where radial basis functions are used to interpolate goodness-of-fit statistics from the coarse grid of model runs to the fine grid used to derive the joint probability distribution functions.

Using the updated methodology, we show how temporal and spatial information impacts the PDFs of the model parameters. We address the temporal component by adding more recent data to the model diagnostics in one of two ways. First, we extend the diagnostics by fixing the starting date while shifting the end date forward in time. To maximize the amount of data that we use in the surface diagnostic while also ensuring good observational data coverage, we take decadal mean temperature anomalies with respect to the 1906–1995 base period starting in 1941. We then shift the end date from 1990 to 2000 to 2010 to change the diagnostics from 5 to 6 to 7 decades, respectively. For the ocean diagnostic, we choose 1955 as the starting date of the first pentad to correspond to the beginning of the observational dataset. Similar to the surface diagnostic, we increase the length of the ocean diagnostic by changing the end date of the last pentad from 1990 to 2000 to 2010.

In a second test, we fix the length of the diagnostics while shifting the end date forward in time. This maintains a 5-decade diagnostic for the surface diagnostic by shifting the 50-year window from 1941–1990 to 1951–2000 to 1961–2010 and a 35-year ocean diagnostic by shifting the period we use to estimate the linear trend from 1955–1990 to 1965–2000 to 1975–2010. By deriving PDFs with each pair of diagnostics corresponding to a given end date, we determine the impact of recent temperature trends on the parameter distributions in both the extension and sliding window cases.
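The two ways of adding recent data can be made concrete by listing the diagnostic periods. The sketch below reproduces the start and end years stated above; the helper names `extended_windows` and `sliding_windows` are hypothetical.

```python
def extended_windows(start, first_end, last_end, step=10):
    """Fixed start date, end date shifted forward in `step`-year increments."""
    return [(start, end) for end in range(first_end, last_end + 1, step)]


def sliding_windows(first_start, first_end, last_end, step=10):
    """Fixed window length, start and end shifted forward together."""
    length = first_end - first_start
    return [(end - length, end) for end in range(first_end, last_end + 1, step)]


# Surface diagnostic: decadal means starting in 1941.
surface_extended = extended_windows(1941, 1990, 2010)
surface_sliding = sliding_windows(1941, 1990, 2010)

# Ocean diagnostic: linear trend starting in 1955.
ocean_extended = extended_windows(1955, 1990, 2010)
ocean_sliding = sliding_windows(1955, 1990, 2010)
```

Pairing the surface and ocean periods that share an end date gives the three diagnostic sets for each case.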

In a third test, we derive PDFs with different structures for the surface diagnostic. In these new diagnostics, we maintain the decadal mean temporal structure but reduce the dimensionality of the spatial structure by replacing the four zonal bands with global mean or hemispheric mean temperatures. In the former case, we have a one-dimensional spatial structure, and in the latter a two-dimensional structure.
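Because the four zonal bands have equal area, the reduced diagnostics can be illustrated as simple averages of the band values. The sketch below assumes a south-to-north band ordering and complete data coverage within each band; both are assumptions made for the example, and the sample anomalies are illustrative, not observed values.

```python
def hemispheric_means(bands):
    """Southern and northern hemisphere means for one decade.

    `bands` holds the four zonal-band anomalies, assumed here to be
    ordered from the southernmost band to the northernmost band.
    Equal band areas make the unweighted mean an area-weighted mean.
    """
    south = (bands[0] + bands[1]) / 2.0
    north = (bands[2] + bands[3]) / 2.0
    return south, north


def global_mean(bands):
    """Global mean for one decade from four equal-area bands."""
    return sum(bands) / len(bands)


# Illustrative anomalies for a single decade (not observations).
decade = [0.1, 0.2, 0.3, 0.6]
south, north = hemispheric_means(decade)
globe = global_mean(decade)
```

Applying these functions to every decade turns the four-band diagnostic into the two-value hemispheric diagnostic or the single-value global diagnostic.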

We present our findings as follows. In Sect.

We first identify the difference in the ocean diagnostic derived from the
0–3000 and 0–2000 m layers for the common period of 1955–1996
(Fig.

Global mean ocean heat content for the 0–3000 m layer

Second, we demonstrate the impact of switching to the RBF algorithm. For one
of our surface temperature diagnostics, we interpolate the

Example of the differences between the algorithms to
interpolate goodness-of-fit statistics from the coarse grid of model
runs to the finer grid used for the derivation of parameter
distributions. Calculated

We aim to improve upon the shortcomings of the old interpolation method by
identifying

We also observe a reduction in the range of

Thus far, we have only investigated the impact of

As in Fig.

To further test our choice of

Comparison of

With a few exceptions, we see good agreement between

To test the impact of the methodological changes, we start from a previously
published probability distribution and apply the changes one at a time. For a
reference point, we start with the PDF from

When changing the ocean diagnostic from the 0–3000 m layer to the
0–2000 m layer, we observe the largest change as a shift towards higher

For the second change, we explore the implementation of the RBF interpolation
algorithm. In Fig.

Marginal probability distribution functions for

In general, we observe tighter constraints on all of the distributions when a
common control run model is used for the surface and ocean diagnostics. For
all three parameters, the width of the 90 % credible interval decreases.
One potential reason for these tighter constraints is an undersampling of the
internal variability that results from using only CCSM4's variability rather
than variability drawn from multiple models. Due to structural differences, the internal
variability is not the same across all models and a single model does not
span the full range of variability. We investigate the sensitivity of the
distributions to the internal variability estimate in a separate study
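As an illustration of how the width of a 90 % credible interval can be read off a marginal distribution tabulated on a grid, the sketch below integrates the PDF with the trapezoidal rule and inverts the resulting CDF. It assumes an equal-tailed interval (5th to 95th percentile), which may differ from the definition used for the results reported here; the function name `credible_interval` is hypothetical.

```python
def credible_interval(grid, pdf, mass=0.90):
    """Equal-tailed credible interval from a PDF tabulated on `grid`."""
    # Cumulative trapezoidal integral of the PDF.
    cdf = [0.0]
    for i in range(1, len(grid)):
        cdf.append(cdf[-1] + 0.5 * (pdf[i] + pdf[i - 1]) * (grid[i] - grid[i - 1]))
    total = cdf[-1]
    cdf = [c / total for c in cdf]  # normalize so the CDF ends at 1

    def quantile(q):
        # Linear interpolation within the first cell whose CDF reaches q.
        for i in range(1, len(cdf)):
            if cdf[i] >= q:
                span = cdf[i] - cdf[i - 1]
                frac = 0.0 if span == 0 else (q - cdf[i - 1]) / span
                return grid[i - 1] + frac * (grid[i] - grid[i - 1])
        return grid[-1]

    tail = (1.0 - mass) / 2.0
    return quantile(tail), quantile(1.0 - tail)


# Illustrative uniform distribution on [0, 10] (not a result of this study).
grid = [0.1 * i for i in range(101)]
pdf = [1.0] * len(grid)
lower, upper = credible_interval(grid, pdf)
width = upper - lower
```

Comparing `width` between two distributions quantifies how much tighter one constraint is than the other.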

Despite the tighter constraints, we observe multiple minima and maxima in the
climate sensitivity distribution. All of the local extrema occur at values of
ECS where the model has been run. We attribute these oscillations to the
spline interpolation method attempting to pass through

We summarize the net impact of the changes by implementing all three
simultaneously (red curve in Fig.

Before presenting new PDFs using the methods discussed in the previous
section, we present the model diagnostics used to derive them. We show the
time series of decadal mean temperature anomalies with respect to the
1906–1995 climatology in the four equal-area zonal bands of the surface
temperature diagnostic (Fig.

Decadal mean temperature anomaly time series derived from
the HadCRUT4, NOAA MLOST, BEST, and GISTEMP 250 datasets. Time series
are for the four equal-area zonal bands spanning

From the time series, we see that while general similarities exist, the model
diagnostic depends on which surface observations are used. Across all
datasets, we observe the largest signal in the 30–90

We illustrate how additional data impact the estimate of the linear increase
in ocean heat content (Figs.

Global mean ocean heat content for the 0–2000 m layer. Shading indicates twice the standard error on either side of the estimate. Also shown are the best fit linear trend lines for the trend beginning in 1955 and ending in 1990 (black), 2000 (red), and 2010 (blue). Dashed lines indicate the 95 % confidence interval for the point estimate for a given year based on the best fit line and its uncertainty.

As in Fig.

The recent acceleration of heat stored in the deep ocean is well documented

For each surface and ocean diagnostic set, we derive joint probability
distributions according to the experiments discussed in
Sect.

We first investigate the PDFs by looking for correlations between the model
parameters. For each pair of model parameters and for each configuration of
the model diagnostics, we calculate the two-dimensional marginal distribution
by integrating over the third parameter (Fig.
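The marginalization step can be sketched as a numerical integral over one axis of the gridded joint distribution. The example below is a minimal illustration using nested lists and the trapezoidal rule; the function name `marginalize_last_axis` and the toy distribution are hypothetical.

```python
def marginalize_last_axis(joint, grid_k):
    """Integrate a gridded three-parameter PDF over its third parameter.

    `joint[i][j][k]` is the joint density at grid point (i, j, k) and
    `grid_k` gives the coordinates of the third parameter. The result
    is the two-dimensional marginal density indexed by [i][j].
    """
    marginal = []
    for plane in joint:
        row = []
        for line in plane:
            integral = sum(
                0.5 * (line[k] + line[k - 1]) * (grid_k[k] - grid_k[k - 1])
                for k in range(1, len(grid_k))
            )
            row.append(integral)
        marginal.append(row)
    return marginal


# Toy joint density, constant in all three parameters (illustrative only).
grid_k = [0.0, 0.5, 1.0]
joint = [[[1.0 for _ in grid_k] for _ in range(2)] for _ in range(2)]
marginal_2d = marginalize_last_axis(joint, grid_k)
```

Repeating the integration over one of the two remaining axes yields the one-dimensional marginal distribution of a single parameter.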

Two-dimensional joint probability distribution functions for each
pair of parameters:

Second, we show that incorporating more recent data into the temperature
diagnostics has a significant impact on the individual parameter estimates by
investigating the marginal PDF of each parameter
(Fig.

Marginal probability distribution functions for

For climate sensitivity, we find that extending the data beyond 1990 leads to
higher climate sensitivity estimates when compared to the estimate shown in
Fig.

Our estimates of

We also see shifts in the

Although not shown, we observe these shifts in the

We attribute the shift towards stronger cooling for the 1991–2000 decade to
the cut-off of the high

Finally, we derive estimates of transient climate response from the PDFs
discussed above (Fig.

Until now, we have only considered how the temporal component of the
diagnostics impacts the parameter estimates. As a final case study, we reduce
the spatial dimension of the surface temperature diagnostic by replacing the
four zonal band diagnostic with either global mean surface temperature or
hemispheric mean temperatures using the 1941–2010 diagnostic period
(Fig.

Marginal probability distribution
functions for

We find little sensitivity in the central estimates of the ECS and

Unlike with the ECS and

We implement a number of methodological changes to improve probability estimates of climate system properties. Changes include switching to an interpolation based on radial basis functions, estimating natural variability from a common model across diagnostics, using new observational datasets, and incorporating recent temperature changes in model diagnostics. We show that the parameter estimates follow signals in the data and depend on the model diagnostics. Furthermore, we show that the technical changes, namely the interpolation method and the natural variability estimate, do not considerably change the central estimate of the parameters, but do impact the uncertainty estimates of the distributions.

We have shown that the RBF interpolation method is successful in smoothing
the distributions while not changing the central estimate. The success of the
RBF method is an encouraging sign for future research. Due to the
two-dimensional interpolation method previously used, our work until now has
been restricted to running ensembles on a uniform grid of points in the
parameter space. The RBF method is three-dimensional and can be applied to
any collection of node points. We can thus run the full model at any set of
non-gridded nodes and interpolate the goodness-of-fit statistics to estimate
the values at intermediate points. Other studies

Our results suggest that the spatial structure of model diagnostics plays a
key role in the estimation of parameters with spatial variation. When adding
spatial structure to the diagnostics, we observed little change in parameters
representing global mean quantities (ECS and

Overall, our work highlights that recent temperature trends have a strong
influence on the parameter distributions. In particular, we observe a shift
in the distributions towards higher climate sensitivity due to the addition
of recent surface temperature warming trends relative to 1990, but with a
reduction in the estimate when using data up to 2010 as opposed to 2000. We
also observe that the distributions of

The source code of MESM will become publicly available for non-commercial research and educational purposes as soon as a software license that is being prepared by the MIT Technology Licensing Office is complete. For further information contact mesm-request@mit.edu. All data required to reproduce the figures in the main text and scripts to replicate the figures are available. Model output is available upon request.

As discussed in Sect.

The weight of any node point in the calculation of

To test other

AL and AS carried out the MESM simulations. AS wrote the codes for extracting model output. AL performed the analysis and prepared the original manuscript. AL and CF developed the model ensemble and experimental design. AL, CF, AS, and EM all contributed to interpreting the analysis and synthesizing the findings.

The authors declare that they have no conflict of interest.

This work was supported by the U.S. Department of Energy (DOE), Office of
Science, under award DE-FG02-94ER61937, and other government, industry, and
foundation sponsors of the MIT Joint Program on the Science and Policy of
Global Change. For a complete list of sponsors and U.S. government funding
sources, see