The performance of a new statistical framework, developed for
the evaluation of simulated temperature responses to climate forcings against
temperature reconstructions derived from climate proxy data for the last millennium, is evaluated
in a so-called pseudo-proxy experiment, where the true unobservable temperature is replaced
with output data from a selected simulation with a climate model. Being an extension of the statistical
model used in many detection and attribution (D&A) studies,
the framework under study involves two main types of statistical models, each of which is based
on the concept of latent (unobservable) variables:

The evaluation of climate models used to make projections of future climate changes is a crucial
issue within climate research

Other studies may instead focus on comparing the probability distributions of climate model output
to the corresponding empirical distributions of observed data using so-called divergence functions

It should be emphasised that evaluation of climate model simulations is a complex process requiring
the performance of a large number of tests with respect to various climatological aspects.
As pointed out by, for example,

Recently, a new framework for evaluation of climate model simulations against observational data
was developed by

Another climate-relevant property of the LAS22 framework is that it distinguishes between external
climate factors, which can be of either natural or anthropogenic origin, and the processes
that are internal to the climate system itself

LAS22 also uses the concept of

Concerning its statistical properties, the LAS22 framework can be viewed as a natural extension of
the statistical model used in so-called detection and attribution (D&A) studies

Using the fact that a general ME model is a special case of CFA and SEM models,
LAS22 has extended the ME model specification to more complicated CFA and SEM models. As a result,
it became possible to overcome some limitations of the ME model, for example, the inability to take
into account the effects of possible interactions between forcings

In addition, LAS22 allows for a flexible specification of the latent structure of observable variables, depending on aspects such as (i) the number of forcings used to drive the climate model under consideration, (ii) our knowledge and/or assumptions about their possible effects on the temperature within the region and period of interest, (iii) assumptions concerning correlations among model variables representing both latent and observable temperatures, and (iv) the availability of simulated data.

At the same time, the LAS22 framework also makes it possible to address the questions posed in D&A studies. Moreover, LAS22 allows the attribution issue to be addressed separately from the question of consistency.

The latter feature is due to another framework, whose ideas were used by LAS22 when
extending the ME model specification. Developed by

In the present work, we, part of the LAS22 research team, aim to perform a practical evaluation of the statistical models of LAS22 in a numerical experiment. In addition, we also aim to compare their performance to the performance of the ME model used in D&A studies when it is applied to the same data.

A vital feature of our numerical experiment is that specially selected climate model simulations replace
real-world temperature observations. Experiments using climate model simulations
instead of real-world data are often referred to in the (paleo)climatological literature as

Importantly, climate model simulations that played the role of observational data were forced by the same reconstructed forcings as those that are subject to evaluation. This condition justifies the consistency (both in terms of the magnitude and large-scale shape) between the unobservable simulated temperature responses to the forcings of interest, embedded in the simulations, and their counterparts, embedded in pseudo-observations.

Thus, the rejection of the statistical model in question should be interpreted as an unambiguous
indication of the associated underlying latent structure being misspecified and inconsistent with the data.
Conversely, a statistical model that is not rejected and demonstrates the best fit among all models with an

It is important to add that our pseudo-observations play the role of the true unobservable temperature, uncontaminated by any non-climatic noise. Although it is of great interest to evaluate the sensitivity of the statistical models under consideration to increasing noise levels, no such sensitivity analysis was performed within the confines of the present experiment.

Once we have established which statistical models demonstrate an acceptable performance with zero proxy noise, it will be easier to design a future sensitivity analysis.

Concerning statistical packages, in the present work we employed the

Finally, let us describe the structure of this paper. Section 2 provides a description of the data,
the results of its initial analysis, and the way of constructing data sets to which we fit our
statistical models. The statistical models from the LAS22 framework are presented in Sect. 3,
while the numerical results of their analyses are given partly in Sect. 4 and partly in the first section
of the Supplement to this article. In total, the Supplement contains three sections.
Its second section is devoted to providing a theoretical overview of the central definitions and
concepts of SEM, which includes CFA as its special case. In the third section of the Supplement, one finds
examples of using the

Data analysed in the present study consist of simulated near-surface temperatures
generated with the Community Earth System Model (CESM) version 1.1 for the period 850–2005
(the CESM-LME (Last Millennium Ensemble)).
A detailed description of the model and the ensemble simulation experiment can be found in

For our analysis, we select seasonal-mean temperature data for the seven regions and the seasons
defined by the

The CESM-LME experiment used

According to

The GHG forcing was implemented so that variations in greenhouse gas concentrations in the climate model's atmosphere are the same everywhere. This, however, does not imply that the simulated temperature response to the forcing is expected to be the same for all above-described regions and seasons.

The climate model did not include an interactive carbon cycle model. This means that variations in the amount of greenhouse gases in the model’s atmosphere could not arise dynamically in response to changes in the model’s climate but only to variations determined by the reconstructed GHG forcing data. Consequently, if a SEM model suggests the existence of any causal path to the variable denoting the simulated temperature response to the GHG forcing, then such a path may be interpreted as an indication that interaction between climate and greenhouse gas concentrations has happened in the real climate system and that this interaction is reflected in the reconstructed GHG forcing history used to drive the climate model.

Natural variations in greenhouse gas concentrations in the atmosphere occur on all timescales and are expected
to have occurred during our entire study period. It is evident that anthropogenic
activity has led to increased greenhouse gas concentrations in about the last 100 years of our study period, mainly due to combustion of fossil fuels.
However, an anthropogenic influence on greenhouse gas concentrations may already have started
several thousand years ago, although this possible influence has been debated. In any case, it cannot be
excluded that human activity may have led to changes in GHG forcing throughout our entire study period

Prior to analysing the climate model simulations above by means of our statistical models, it is necessary to check whether each of the sequences satisfies the assumptions of these statistical models. To this end, an initial data analysis is performed, whose results are described in the next subsection.

Let

According to LAS22, the mean-centred

In contrast to the random

Thus, the assumptions to check are the assumptions of normality and of mutually independent observations.
Since the forced component of simulated temperatures is treated as a repeatable outcome,
both assumptions concern the

The independence assumption is checked by studying the autocorrelation structure of sequences, defined in Eq. (
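As a minimal sketch of such a check, the sample autocorrelation function of a sequence can be compared with an approximate 95 % band for white noise, ±1.96/√n. The function names, the choice of the band, and the toy input are assumptions made for illustration only; they are not taken from LAS22 or from the analysis reported here.

```python
import math


def sample_acf(x, max_lag):
    """Sample autocorrelations r_0, ..., r_max_lag of the sequence x."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    c0 = sum(d * d for d in dev) / n  # lag-0 autocovariance
    if c0 == 0:
        raise ValueError("a constant sequence has no defined autocorrelation")
    acf = []
    for k in range(max_lag + 1):
        ck = sum(dev[t] * dev[t + k] for t in range(n - k)) / n
        acf.append(ck / c0)
    return acf


def white_noise_band(n, z=1.96):
    """Approximate two-sided 95 % band for the ACF of white noise (assumed criterion)."""
    return z / math.sqrt(n)


def lags_outside_band(x, max_lag):
    """Lags k >= 1 whose sample autocorrelation falls outside the band."""
    band = white_noise_band(len(x))
    acf = sample_acf(x, max_lag)
    return [k for k in range(1, max_lag + 1) if abs(acf[k]) > band]


# Toy input: a strictly alternating sequence is strongly autocorrelated.
alternating = [1.0, -1.0] * 50
flagged = lags_outside_band(alternating, max_lag=2)
```

A sequence for which no lag, or only an occasional lag, is flagged would be regarded as compatible with the independence assumption under this criterion.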

According to these figures,

To conclude, all

Further, we investigated whether the decadally resolved residual sequences, defined in
Eq. (

An important premise of the statistical models, suggested by LAS22, is that the variance of
each

If these assumptions are met, it will be possible not only to apply estimator
(

If, on the other hand, these assumptions are violated, estimator (

As explained by LAS22, if the model fits the data adequately both statistically and
heuristically, and the resulting estimate of

However, both estimates of

For our data, the application of model (

Overview of the replicates eliminated from the ensembles that do not satisfy the assumptions
of estimator (

Recall from the Introduction that, in our experiment, observational data are replaced by an appropriate
climate model simulation. More precisely, such a climate model simulation is supposed to replace the true
unobservable temperature, defined initially in SUN12. Combining the notations of LAS22 with the notations
of SUN12, the mean-centred true temperature at time point

Also, the forced and internal variability are regarded as mutually independent processes.

Among the climate model simulations presented earlier, the most suitable
candidates for the role of pseudo

Choosing one replicate at a time enables us to construct the corresponding number of data sets. Fitting the statistical models to each of them makes it possible to investigate the stability of the performance of each statistical model of interest.

This is especially important when respecifications of the models by deleting and/or adding some hypothesised relations are performed. Although respecifications are supposed to be motivated from the climatological viewpoint, they are in essence results of a purely data-driven process. Therefore, it is crucial to apply some form of cross-validation with respect to the set of models considered in a sequence of model evaluations. The availability of additional data sets provides such an opportunity.
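The construction can be sketched as follows: each replicate in turn plays the role of pseudo-observations and is combined with the same set of indicators derived from the simulations under evaluation, and a model is regarded as stable only if it performs acceptably on every resulting data set. All names below (`build_data_sets`, `is_stable`, the key `pseudo_obs`, the indicator labels) are hypothetical, and the acceptance criterion is left as a user-supplied function rather than an actual model fit.

```python
def build_data_sets(model_indicators, pseudo_obs_replicates):
    """Return one data set per replicate.

    model_indicators: dict mapping an indicator name to its sequence
        (hypothetical layout; the indicators are shared by all data sets).
    pseudo_obs_replicates: list of sequences, each a candidate for the
        role of pseudo-observations.
    """
    data_sets = []
    for number, replicate in enumerate(pseudo_obs_replicates, start=1):
        series = dict(model_indicators)  # shallow copy; the input dict is left untouched
        series["pseudo_obs"] = list(replicate)
        data_sets.append({"number": number, "series": series})
    return data_sets


def is_stable(fit_is_acceptable, data_sets):
    """True if the supplied acceptance criterion holds for every data set."""
    return all(fit_is_acceptable(ds["series"]) for ds in data_sets)


# Toy usage with two replicates and a placeholder acceptance criterion.
indicators = {"x_volc": [0.1, -0.2], "x_sol": [0.0, 0.3]}
replicates = [[1.0, 2.0], [3.0, 4.0]]
data_sets = build_data_sets(indicators, replicates)
stable = is_stable(lambda series: len(series["pseudo_obs"]) == 2, data_sets)
```

In practice, `fit_is_acceptable` would wrap the fitting of a CFA or SEM model together with the checks of model fit and admissibility.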

However, letting

Overview of the replicates of each

To avoid this situation, the

To avoid excessive notation, from now on we will use neither the bar notation for mean sequences nor the tilde for standardised latent variables. One can easily recognise models with standardised latent variables through the correlation matrices for their latent variables, while models with unstandardised latent variables are associated with variance–covariance matrices.

Another important aspect to point out is that the CFA and SEM models presented in this section are adjusted
for use within a pseudo-proxy experiment. The adjustment is needed because the framework of LAS22
models common latent structures for simulated and observational data in terms of true latent
temperature responses to real-world forcings.
However, within a pseudo-proxy experiment, where observational data are replaced by climate model
simulations, these true latent temperature responses are replaced by their simulated counterparts.
The consequences of this replacement are as follows:

The hypothesis of consistency between simulated and observed climate change is correct.

The structure of the unforced components in the resulting statistical models is simpler compared to that associated with the original statistical models of LAS22.

It should also be realised that the correctness of the hypothesis of consistency applies to the ME model used in D&A studies as well, although the statistical framework of “optimal fingerprinting” models common latent structures for simulated and observational data in terms of simulated temperature responses to reconstructed forcings.

The ME-CFA(6, 5) model is given in Table

Parameters of Model 1, abbr. ME-CFA(6, 5) model, with six indicators and five standardised latent common factors with 1 degree of freedom.

Within the ME-CFA(6, 5) model, all specific factors

To arrive at the ME-CFA(6, 5) model, the ME model was first rewritten in a matrix form as shown in
Table

The ME model from Eq. (

Here, the ME model appears as a factor model with unstandardised latent factors.
Their subsequent standardisation gives the ME-CFA(6, 5) model, whose factor loadings are related to
the parameters of the ME model as follows:

The hypothesis

The hypothesis of consistency

It should also be noted that the ideas of CFA make it possible to test the hypothesis of consistency
without performing multiple simultaneous tests concerning the above-defined ratios. One simply
fits the ME-CFA(6, 5) model to data under the restrictions

To evaluate the performance of the ME model used in D&A studies,
we fit the ME-CFA(6, 5) model as if it were a ME model associated with the TLS estimator. That is,
the parameter estimates will be obtained under the assumption that the whole error variance–covariance
matrix is known a priori, as shown in Table
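To illustrate the idea behind the TLS estimator, the following minimal sketch treats the simplest possible case: a single regressor with errors of equal variance in both variables, for which the TLS (orthogonal regression) slope has a closed form. This is a deliberate simplification and not the multi-forcing ME model fitted here; the function names and the toy data are assumptions made for illustration.

```python
import math


def _centred_sums(x, y):
    """Centred sums of squares and cross-products of two sequences."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxx, syy, sxy


def ols_slope(x, y):
    """Ordinary least squares slope: errors are assumed in y only."""
    sxx, _, sxy = _centred_sums(x, y)
    return sxy / sxx


def tls_slope(x, y):
    """Total least squares slope for one regressor.

    Assumes errors of equal variance in x and y (orthogonal regression).
    """
    sxx, syy, sxy = _centred_sums(x, y)
    if sxy == 0:
        raise ValueError("the TLS slope is undefined when the sample covariance is zero")
    diff = syy - sxx
    return (diff + math.sqrt(diff ** 2 + 4.0 * sxy ** 2)) / (2.0 * sxy)


# Toy data: the two estimators coincide for an exact linear relation
# and differ once the points scatter around the line.
exact = tls_slope([0, 1, 2, 3], [0, 2, 4, 6])
scattered_tls = tls_slope([0, 1, 2, 3], [0, 1, 1, 2])
scattered_ols = ols_slope([0, 1, 2, 3], [0, 1, 1, 2])
```

In the scattered example the TLS slope is larger in magnitude than the OLS slope, which reflects the fact that OLS attributes all of the scatter to the response variable.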

For checking purposes, we nevertheless use the principles of CFA to their full extent (see
Sect. S2.4 in the Supplement), instead of using the methods developed specifically for ME models

As a final comment, let us emphasise that since the hypothesis of consistency is correct,
all

Model 2 is formulated by extending the ME-CFA(6, 5) model both in terms of the number of indicators
and in terms of common latent factors. The parameters of the resulting model, abbr. the CFA(7, 6) model,
are presented in Table

Parameters of Model 2, abbr. CFA(7, 6), containing seven indicators and six standardised latent factors with 12 degrees of freedom.

As one can see, adding

Further, as follows from the correlation matrix for the common factors, the CFA(7, 6) model hypothesises
the mutual uncorrelatedness between

On the other hand, it is difficult to hypothesise zero correlations between

Another difference between the ME-CFA(6, 5) model and the CFA(7, 6) model is that the ME-CFA(6, 5) model treats

Another aspect, worthy of discussion, is the ability of the CFA model presented to discriminate
between the natural and anthropogenic components of

Finally, it can be seen in Table

Just as in the case of the ME-CFA(6, 5) model, the correctness of the hypothesis of consistency means that an inadequate model fit to the data and/or inadmissible estimates are due to a misspecified latent structure. Respecifications of the structure that require the introduction of causal inputs entail, in turn, a move to the SEM specifications suggested within the LAS22 framework (see the next section). If some modified version of the CFA model presented results in a climatologically defensible solution and fits the data adequately, this motivates accepting that CFA model as a reasonable approximation of the underlying latent structure.

The SEM model analysed in this experiment is presented graphically in Fig.

Just like the CFA(7, 6) model in Table

Path diagram for the SEM model with five standardised exogenous latent factors,

In contrast to the CFA(7, 6) model, the SEM model reflects the substantive knowledge of atmosphere–climate
interactions, which may arise when natural changes in the levels of GHG in the atmosphere are caused
by other climatic processes of natural origin. In the SEM model, this is reflected through the causal
inputs received by

Concerning the interaction term in Eq. (

Unfortunately, with the available climate model simulations in hand, it is not possible to model the natural
component of

The interpretation issue of

The absence of climate model simulations forced only by the anthropogenic changes in the GHG forcings also
makes it impossible to model the anthropogenic component of

Like the CFA(7, 6) model, the SEM model in Fig.

Analogously, we may allow

According to LAS22,

Statistical significance of this estimate, which we denote

Derive the theoretical expression for

Replace unknown free parameters in

Apply the delta method, described in Sect. S2.5 in the Supplement, to obtain
an estimate of the variance of the asymptotic distribution of

Calculate the two-sided
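The steps above can be sketched numerically as follows. The sketch assumes that the quantity of interest is a smooth function of the free parameters and that an estimated variance–covariance matrix of the parameter estimates is available. It replaces the analytic derivatives of the theoretical expression with central finite differences, and it uses the product of two parameters as a purely hypothetical example of such a function; all names and numbers are illustrative.

```python
import math


def numerical_gradient(g, theta, h=1e-6):
    """Central-difference gradient of g at theta (stand-in for analytic derivatives)."""
    grad = []
    for i in range(len(theta)):
        up = list(theta)
        down = list(theta)
        up[i] += h
        down[i] -= h
        grad.append((g(up) - g(down)) / (2.0 * h))
    return grad


def delta_method_test(g, theta_hat, cov):
    """Estimate, standard error, z statistic and two-sided p value for g(theta_hat).

    cov is the estimated variance-covariance matrix of theta_hat (list of lists).
    """
    k = len(theta_hat)
    estimate = g(theta_hat)
    grad = numerical_gradient(g, theta_hat)
    # Delta method: Var[g(theta_hat)] is approximated by grad' * cov * grad.
    variance = sum(grad[i] * cov[i][j] * grad[j] for i in range(k) for j in range(k))
    std_error = math.sqrt(variance)
    z = estimate / std_error
    # Two-sided p value under asymptotic normality: 2 * (1 - Phi(|z|)).
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return estimate, std_error, z, p_value


# Hypothetical example: the product of two parameters with uncorrelated estimates.
estimate, std_error, z, p_value = delta_method_test(
    lambda t: t[0] * t[1],
    theta_hat=[2.0, 3.0],
    cov=[[0.01, 0.0], [0.0, 0.04]],
)
```

The reliability of the resulting p value depends on how well the asymptotic normal approximation holds for the sample size at hand.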

To avoid an excessively long result section, we present a

The structure of this section is the following. First, we present the results of the preliminary analysis
of each (final) single-forcing ensemble, given in Table

As a preliminary step, we apply the CFA(

The analysis of the

A note on

Similar analyses of the

For each of the above-mentioned ensembles, an a priori estimate of

An opposite result was observed for the

Due to treating correlations among the latent factors as free parameters, the ME-CFA(6, 5) model (and
the ME model as well) becomes theoretically under-identified, if at least one of the factor loadings
is restricted a priori to zero, while the associated correlations are still treated as free parameters.
In practice, the data, for which some factor loadings are expected to be arbitrarily
near zero, are associated with the so-called weak-signal regime

Therefore, knowing that fitting the ME-CFA(6, 5) model to the North America data
is likely to result in negligible estimates of

The result of estimating Model 1, i.e. the ME-CFA(6, 5) model,
defined in Table

Based on this result, the ME-CFA(6, 5) model cannot be selected as an adequate approximation of
the underlying latent relationships, even if the model fits the data perfectly, both statistically and heuristically
(e.g. for data set no. 1, the

To avoid the weak-signal regime, we could delete

In order to avoid empirical under-identifiability, each modified version of the basic CFA(7, 6) model was
formulated under the restrictions that

The estimates of the modified version, which demonstrated the most stable performance across all data sets,
are presented in Table

The result of estimating Model 2, i.e. the CFA(7, 6) model, defined
in Table

According to the parameter estimates, the CFA model suggests that the (direct) effects of the solar and volcanic
forcings are well pronounced in the simulated annual-mean temperature in North America during 850–1849 CE.
For example, for data set no. 1, it was observed that

The overall (direct) effect of the GHG forcing is also estimated by the CFA model as significant
(for data set no. 1,

When estimating the above-presented CFA model, the modification indices indicated that the overall model fit
could be further improved if

Path diagram of the modified version of the SEM model from Fig.

Although no dynamical relationships between the reconstructions of the forcings and the internal processes
were implemented in the climate modelling experiment under consideration, the causal input from

Examples of possible real-world internal processes, interacting with the climate system and which are relevant
for the climate model used here, are seasonal variations in the vegetation phenology and in the snow cover.
Using the statistical parlance of LAS22, we can also say that these processes are

A disadvantage of letting

As one can see in Fig.

One can also see in Fig.

In addition to the inputs

The results of estimating the SEM model depicted in Fig.

Yet another causal input, received by

The overall (direct) effect of the natural and anthropogenic components of

The SEM model suggests that the effect of the solar and volcanic forcings is
also well pronounced in the simulated annual-mean temperature in North America during 850–1849 CE
(for data set no. 1,

Finally, let us emphasise that the latent structure of the SEM model suggests that the forcings associated with causally independent climatological processes (here, the solar, volcanic, and anthropogenic GHG forcings) are acting additively.

The ME-CFA(6, 5) model is rejected due to its under-identifiability, which caused either inadmissible solutions or the inability of the estimation procedure to converge to a solution.

In contrast, both CFA and SEM models fit the data well and have admissible solutions. Moreover, they lead to similar conclusions about the direct effects of the forcings of interest. The sole difference is that the CFA model suggests that the significant overall effect of the GHG forcing is mostly due to (global-scaled) anthropogenic changes in GHG concentrations, while the SEM model highlights the dominant role of (global-scaled) natural changes. For the period of interest, both conclusions seem to be defensible and realistic from the climatological point of view.

Also, both CFA and SEM models demonstrated a stable performance and a very low sensitivity to
starting values for the parameter estimates. However, the SEM model estimates two fewer parameters
than the CFA model. So, in terms of the number of the parameters, the SEM model is simpler than
the CFA model, though its underlying structure is more sophisticated from a climatological point of view.
A lower number of parameters entails a higher number of degrees of freedom. More precisely, the SEM model has 2 more degrees of freedom, which increases the power of the
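The link between the number of parameters and the degrees of freedom can be made explicit. As a sketch, assuming the usual counting for covariance-structure models fitted to mean-centred data, a model with $p$ indicators and $q$ free parameters has

```latex
\mathrm{df} = \frac{p(p+1)}{2} - q ,
\qquad\text{so that, for fixed } p,\qquad
\Delta\mathrm{df} = -\,\Delta q .
```

Under this counting, the degrees of freedom stated for the basic models imply $q = 21 - 1 = 20$ free parameters for the ME-CFA(6, 5) model ($p = 6$) and $q = 28 - 12 = 16$ for the CFA(7, 6) model ($p = 7$). Two models fitted to the same indicators and differing by two free parameters therefore differ by exactly 2 degrees of freedom.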

All these points together speak in favour of the SEM model. Therefore, our suggestion is to choose the SEM model as an adequate approximation of the underlying latent structure of the simulated annual-mean temperature data for the region of North America during 850–1849 CE.

The brief summaries of the results for the remaining regions are presented in Tables

Summary of the result for Europe, summer (JJA) mean temperature, 850–1849 CE.

Summary of the result for the Arctic, annual-mean temperature, 850–1849 CE.

Summary of the result for South America, summer (DJF) mean temperature, 850–1849 CE.

Summary of the result for Antarctica, annual-mean temperature, 850–1849 CE.

Summary of the result for Asia, summer (JJA) mean temperature, 850–1849 CE.

Summary of the result for Australasia, warm-season (Sept–Feb) mean temperature, 850–1849 CE.

According to the summaries provided, the SEM model has been chosen as a final model for all regions/seasons considered, except for Australasia. This result, first of all, indicates a complex causal structure of the simulated temperature data analysed, which required freeing up various causal links not permitted in the two other statistical models. For the region of Australasia, it was decided to choose the CFA model as a final model in accordance with the principle of parsimony.

All final models seem to have a climatologically defensible interpretation.
Summarising the results per forcing, we may say the following:

The direct effect of the volcanic forcing is well pronounced in all seven regions/seasons. In all cases, the volcanic forcing is found to have by far the strongest effect on the simulated temperatures, as compared to the effect from the other forcings.

The direct effect of the orbital forcing is well detected in the simulated temperatures in three regions: the Arctic, Asia, and South America. A modest effect of the forcing is detected in the simulated Australasia warm-season mean temperatures. No effect of the orbital forcing is found in the simulated temperatures in North America, Europe, and Antarctica.

A significant direct effect of the solar forcing is detected in five of the seven regions/seasons. No effect of the solar forcing is found in the simulated Asia (JJA) and South America (DJF) temperatures.

A significant overall direct effect of the GHG forcing is detected in four of the seven
regions/seasons. Concerning the remaining three regions (Arctic, South America, and Antarctica), no effect of
the GHG forcing is found in the corresponding simulated temperatures. In the regions where the effect of the
GHG forcing was detected, its character was described by the final models as follows:

In the

In the

In the

In the

A significant direct effect of the land-use forcing is detected in only one of the seven regions/seasons, namely, in the Asia (JJA) mean temperatures.

No effect of the interactions between the external forcings, leading to deviations from the additivity of the forcing effects, was found by the final statistical models in any of the seven regions/seasons.

All of them are only justified for the particular climate model, period, regions, and seasons investigated in the analysis.

The availability of simulated data was one of the important factors determining the complexity of the statistical models analysed. The absence of climate model simulations driven by various combinations of the five forcings of interest led to a substantial simplification of the climatological relationships modelled in the statistical models presented. Nevertheless, the conclusions about the effect of the five forcings under consideration presented in the summaries are judged to be realistic and climatologically defensible.

The main aim of the present numerical experiment is to evaluate and compare the performance of three statistical
models by fitting them to one and the same simulated temperature data set. The models are as follows:

the measurement error (ME) model, used in many D&A studies and referred to there as the method of “optimal fingerprinting” (here, rewritten as a factor model);

the confirmatory factor analysis (CFA) model;

the structural equation modelling (SEM) model.

The ME model estimates the forcing effects in accordance with the total least squares estimation
approach under the condition that

The CFA model, in contrast to the ME model above, allows for the modelling of mutually uncorrelated latent temperature responses to forcings but, just as the ME model above, does not allow for any causal relationships between them.

The SEM model is the most complex model, allowing for both uncorrelated latent temperature responses to forcings and various causal relationships between all model variables, including the latent ones.

The data, used in the analysis, consist of simulated temperatures obtained with the CESM Earth system model, covering the period of 850–1849 CE. The regions of interest coincide with the seven PAGES 2k regions: Europe, North America, Arctic, Asia, South America, Australasia, and Antarctica. Each statistical model above takes into account the fact that the CESM climate model was driven by five specific (reconstructed) forcings: the orbital, volcanic, and solar forcings, each of which is a purely natural forcing, the anthropogenic land-use forcing, and the GHG forcing, which may contain both natural and anthropogenic components.

A key feature of the present numerical experiment is that observational temperature data, or more precisely, the (real-world) observational data, are replaced by data from a climate model simulation forced by all five (reconstructed) forcings under study. This replacement makes it reasonable to accept the assumption that the simulated latent temperature responses to forcings, embedded in the simulated observable temperatures, are correctly represented regarding their magnitude and shape of their temporal evolution, compared to the corresponding latent temperature responses embedded in the pseudo-true temperature. Given this knowledge, a poor model fit to the data can be attributed to incorrectly specified unknown underlying relationships between the variables.

A good model fit, on the other hand, was only one of the three criteria for choosing a final model among the three
statistical models studied. The two other criteria were as follows:

The solution provided by the model is statistically admissible and climatologically defensible.

The model demonstrates a stable performance across all data sets, including a different realisation of the pseudo-true temperature available for the region in question.

One possible explanation of this result is a complex causal structure of the data that is not reflected in the ME-CFA model. Another possible explanation is that the estimation of its parameters becomes unstable under the weak-signal regime observed for each regional data set. However, the fact that this statistical model has been rejected in our analysis for all regional data sets does not imply that the model is inappropriate in other studies (either preceding or future ones). For another climate model, another set of forcings, and other regions and periods, ME-CFA models might turn out to be sufficient for describing the underlying latent structure of data.

A key idea of the numerical experiment presented (and of the framework as a whole) is that the researcher's thinking concerning the statistical modelling of climatological relationships should not be limited to a single statistical model. As underlined by the observed results, the availability of several statistical models is a basis for flexible evaluations of climate models concerning the representation of temperature responses to climate forcings. The degree of flexibility in choosing appropriate statistical models can be increased further by modifications and improvements of our statistical models.

As a final comment, we would like to point out that the performance of the framework suggested was studied only for zero noise in the pseudo-observational data. However, as real observational data may contain significant and varying amounts of non-climatic noise, it is highly desirable to investigate its performance (in particular, the performance of the models chosen as final models) for more realistic levels of added noise, similar to what is found in real climate proxy data for past temperature variations. These investigations can also be complemented by an analysis of the empirical coverage rates of approximate confidence intervals for parameter estimates. Such rates may differ from their nominal levels due to the approximate nature of the distributions of the parameter estimates, especially for endogenous parameters whose asymptotic variances are functions of several parameter estimates and are calculated using the delta method.

The present work employed the

The simulation data used in this study are available from the Bolin Centre Database, Stockholm University (

The supplement related to this article is available online at:

This work is based on ideas presented in a doctoral thesis in mathematical statistics by
Ekaterina Fetisova

The contact author has declared that none of the authors has any competing interests.

Publisher’s note: Copernicus Publications remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

The authors thank Shaobo Jin (Department of Statistics, Uppsala University) for rewarding discussions about structural equation modelling.

This research was funded by the Swedish Research Council (grant C0592401 to Gudrun Brattström, “A statistical framework for comparing paleoclimate data and climate model simulations”).

This paper was edited by Francis Zwiers and reviewed by two anonymous referees.