Evaluation of climate model simulations is a crucial task in climate research. Here, a new
statistical framework is proposed for evaluation of simulated temperature responses
to climate forcings against temperature reconstructions derived from climate proxy data for
the last millennium. The framework includes two types of statistical models, each of which is
based on the concept of latent (unobservable)
variables:

Climate models are powerful tools used for investigating how the climate system works,
for making scenarios of the future climate and for assessing potential impacts of climatic
changes

In order to assess the magnitude of the effects of the processes in question on the climate, it is often
convenient to analyse their impact on the radiative balance of the Earth

Examples of external

The range of types of climate models is very wide. Here, our focus is on the most sophisticated
state-of-the-art climate models, referred to as global climate models (GCMs) or Earth system models (ESMs).
As computing capabilities have evolved during the past decades, the complexity of GCMs and ESMs has
substantially increased: for instance, the number of components of the Earth system that can be included
and coupled in GCMs and ESMs has increased, or the previously often performed equilibrium simulations can now
be replaced by transient simulations, driven, for example, by temporal changes in the atmospheric greenhouse gases (GHGs) and
aerosol loading (see, for example,

However, despite great advances achieved during the past decades, some simplifications are unavoidable, for example, due to the range of temporal and spatial scales involved and/or incomplete knowledge about some processes.
As a consequence, the complexity of the most sophisticated climate models is still far from the complexity
of the real climate system. Even a careful design cannot guarantee that each component of climate modelling, for example, parameterisation of subgrid-scale processes, has been employed in its optimal form. Also, our knowledge
about various feedback processes, for example, cloud feedback

Another type of complication is that forcing reconstructions too may be uncertain.
As examples, uncertainties can be large for such anthropogenic forcings as aerosol forcing and land use forcing

All the above-mentioned issues together point naturally to the importance of carefully undertaking evaluation of
climate model simulations by comparison against the observed climate state and variability. An important role in
this context has been played by so-called detection and attribution (D&A) studies
(e.g.,

Based on the ideas of

In terms of the near-surface temperature, which has been a climatic variable of interest in
many D&A studies, an advantageous assumption made in

Being the simplest model among statistical models with latent variables, the ME model specification has proved
to be a useful tool within many D&A studies that greatly contributed to the understanding of the causes of climate
variability. However, as recognised by several researchers, this statistical model is associated with
certain limitations, for example, the inability to take into account the effects of possible interactions
between forcings (see, for example,

Within the present work, observational data are defined as data consisting of instrumental temperature measurements, when they are available, and temperature reconstructions derived from climate proxy data, i.e. from indirect climate information from various natural archives such as tree rings, lake sediments, and cave speleothems.

Nor does the simplicity of the ME model specification allow researchers to avoid the estimation issues that arise under the so-called “weak-signal” regime.

Having a statistical framework that can address the questions posed in the D&A studies and, at the same time, lends itself to flexible specifications of latent structures, depending on hypotheses that researchers have and on the properties of both the climate model and the forcings considered, may potentially aid in overcoming the above-mentioned limitations of the ME model. All these points together may ultimately increase our confidence in the final estimates and conclusions drawn.

The goal of the present study is to formulate such a statistical framework by investigating possible extensions of the ME model specification to more complex statistical models with latent variables.

To this end, we used the fact that a ME model is a special case of a confirmatory factor analysis (CFA) model, which in turn is a special case of a structural equation modelling (SEM) model

As a matter of fact, the notion of causality is not new to climate research. As examples, we can refer to

One of the major
differences between the methods is that SEM models test hypothesised causal relationships between model variables based
on the (co)variances of the model variables, while the methods mentioned above investigate the causality based
on the information available at different time points of the time series analysed (for an overview of
methods used for investigating the causality for time series see

When formulating CFA and SEM models here, we also used the ideas of another statistical framework developed
by

The statistical models considered here are intended to be suitable for the type of observational and
climate model data that is typically available for the last millennium, as also exemplified in some
D&A studies (e.g.

Finally, let us describe the structure of the present paper. First, in Sect. 2, some main assumptions and
definitions of our framework will be described. Section 3 gives an overview of the statistical model
used in D&A studies and its link to the CFA model specification.
Section 4 provides a description of our CFA models, while the SEM models are presented in Sect. 5.
Section 6 provides a brief practical demonstration of fitting a simple CFA model to two ensembles of climate
model simulations using the

Although both the real and simulated climate systems comprise several climate variables in a 3-dimensional spatial framework in the atmosphere, in the oceans, and on land, here we will only think in terms of air temperatures near the Earth's surface. Climate scientists often refer to this as either surface air temperature or 2 m air temperature depending on context. We will simply call this “temperature”.

The term “unforced climate model” or just “unforced model” denotes here a simulation not driven by any external forcing. That is, only internal factors influence the simulated temperature variations. More precisely, the boundary conditions that are associated with the forcing factors of interest are held constant throughout the entire simulation time, at some level selected by the researcher. Climate modellers often refer to this kind of simulation as a control simulation.

When running the same climate model again, but with the control boundary conditions replaced with a reconstruction
of temporal and spatial changes in a particular forcing

To make our way of reasoning as clear and concrete as possible, we focus on five specific forcings:

changes in the solar irradiance (abbr. Sol);

changes in the orbital parameters of the Earth (abbr. Orb);

changes in the amount of stratospheric aerosols of volcanic origin (abbr. Volc);

changes in vegetation and land cover caused by natural and anthropogenic factors (abbr. Land);

changes in the concentrations of greenhouse gases in the atmosphere (abbr. GHGs), also of both natural and anthropogenic origin.

In the real-world climate system within a certain region and time period

The superscript T stands for “true”.

All the above-specified forcings are identified as drivers of climate change during the last
millennium

Depending on the scientific question of interest, climate models may be driven by different
combinations of reconstructed forcings. For the purpose of our theoretical discussion, let us first assume
that the following temperature data from climate model simulations are available:

is forced by the reconstruction of the solar forcing, which generates the simulated
counterpart to

The superscript S stands for “simulated”.

is forced by the reconstruction of the orbital forcing, which generates the simulated
counterpart to

is forced by the reconstruction of the volcanic forcing, which generates the simulated
counterpart to

is forced by the reconstruction of the Land forcing of both natural and anthropogenic
origin, which generates the two-component simulated counterpart to

is forced by the reconstruction of the GHG forcing of both natural and anthropogenic origin,
which generates

is forced by all reconstructed forcings above, generating the overall simulated
temperature response

Note that we do not assume that climate model simulations driven by all possible combinations of forcings, for example, the combination of solar and volcanic forcings or the combination of solar and Land forcing, are available.

A general statistical model, used for assessing the individual contribution of

For the purpose of our analysis let us rewrite model (

In the case of Eq. (

Equations (

The introduction of an error into Eq. (

From the estimation point of view, a consequence of adding various error terms to

Importantly, the TLS estimator remains unchanged if the whole error variance–covariance matrix is known a priori. In practice, this knowledge permits us to check for model validity, provided one derives this a priori knowledge from a source independent of the sample variance–covariance matrix of the observed variables. In D&A studies, such a source is unforced (control) climate model simulations.
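The mechanics of the TLS estimator can be illustrated with a minimal numerical sketch (not the actual D&A implementation), assuming a single forcing, a true scaling factor of 1, and equal error variances in the simulated and observed series; all variable names and values below are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 2000

# Latent (true) temperature response and its two noisy versions:
# a simulated response contaminated by internal variability (x) and
# an observed record contaminated by noise of equal variance (y).
signal = rng.normal(0.0, 2.0, n)
x = signal + rng.normal(0.0, 1.0, n)
y = 1.0 * signal + rng.normal(0.0, 1.0, n)   # true scaling factor is 1

# TLS estimate: direction of the right singular vector belonging to the
# smallest singular value of the centred data matrix [x y]
A = np.column_stack([x - x.mean(), y - y.mean()])
v = np.linalg.svd(A, full_matrices=False)[2][-1]
beta_tls = -v[0] / v[1]

# OLS estimate for comparison: attenuated toward zero by the error in x
beta_ols = (A[:, 0] @ A[:, 1]) / (A[:, 0] @ A[:, 0])
```

Because ordinary least squares ignores the error in the simulated series, its estimate is biased toward zero, whereas the TLS estimate is approximately unbiased under the equal-error-variance assumption; this attenuation is the reason ME-type estimators such as TLS are preferred in D&A studies.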

We would also like to emphasise the fact that the TLS estimator is obtained under the condition that all
latent variables are correlated. This entails that if some

The main parameters of interest, estimated within D&A studies, are the coefficients

The detection of a simulated temperature response, however, in the observed climate record is not
sufficient for attributing this detected simulated temperature signal to the corresponding
real-world forcing

Another important feature of model (

Summarising the overview above, there is strong motivation to investigate possible extensions of the ME model representation to more complex statistical models in order to overcome the limitations of the ME model.

To achieve this aim, we suggest using the close link between ME, CFA, and SEM models. To see that
the ME model in Eq. (

In the matrix form given in Eq. (

Note also that an unstandardised factor model is associated with an unstandardised solution. However,
a standardised solution is preferred because its model coefficients
(hereafter called

In model (

In practice, it is not difficult to test various equality constraints within a CFA model – one simply fits
the associated factor model under the constraints of interest. That is, one fits and tests simultaneously
(for more theoretical details about the estimation of CFA models, see the Appendix).
In the same manner, one may introduce restrictions on the correlations among the latent factors.
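As an illustration of how constrained and unconstrained fits can be compared statistically, the following sketch applies the standard chi-square difference test for nested models; the fit statistics and degrees of freedom are invented purely for illustration:

```python
from scipy import stats

# Hypothetical chi-square fit statistics for two nested CFA models
chi2_constrained, df_constrained = 19.4, 12   # model with equality constraints
chi2_free, df_free = 14.1, 9                  # same model, constraints relaxed

delta_chi2 = chi2_constrained - chi2_free
delta_df = df_constrained - df_free
p_value = stats.chi2.sf(delta_chi2, delta_df)  # small p: constraints rejected
```

A small p-value would indicate that the equality constraints significantly worsen the fit and should be relaxed; a large p-value supports retaining the more parsimonious constrained model.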

Another advantage of thinking in the spirit of CFA is that the factor model specification makes it possible to
take into account the lack of additivity, which may arise due to possible interactions between forcings.
This can be accomplished by adding observable variables associated with various multi-forcing climate model
simulations (provided such simulations are available).
Finally, this model specification seems to permit a complete attribution assessment,
provided one uses

In this section, our aim is to formulate a basic CFA model with respect to the five specified forcings. The model is called basic because it is supposed to be modified depending on the climate-relevant characteristics of the specified forcings for the region and period of interest. Note also that we focus on a CFA model with standardised latent factors in order to enable meaningful comparisons of estimated effects of latent factors. For a brief account of general CFA models and the associated definitions, used in the following sections, see the Appendix.

As mentioned in the Introduction, one of the starting points for our framework is the SUN12 framework. Within the confines of the present work, we combine some of the definitions in SUN12 that are relevant for our work with our definitions.

Like the D&A studies, which decompose the climate variability within the ME model into forced and unforced
components, SUN12 implements the same decomposition within its statistical model. However,
unlike D&A studies, SUN12 allows more complex structures of random components both in

Since there are five forcings under investigation, the next step is to rewrite

In a similar way, our initial model for

The next step is to rewrite each

Let us describe this process by example of

Assuming that both magnitude and the large-scale shape of the true temperature response are correctly
simulated, Eq. (

An important feature of our framework is that it involves

Parameters of the seven-indicator and six-factor model, abbr. CFA(7, 6) model, hypothesising the consistency and the uncorrelatedness between three latent factors.

It is also important to highlight that under the CFA (and ME) model specification, all common latent factors can be related to each other only through correlations. Climatologically, this corresponds to viewing all underlying forcings as physically independent processes not capable of causing changes in each other but giving rise to temperature responses that can be either mutually correlated or not.
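This restriction can be made concrete with a small, hypothetical example: under a CFA model, the model-implied covariance matrix of the indicators takes the form Σ = ΛΦΛᵀ + Θ, so correlations between common factors enter only through the factor covariance matrix Φ. The loadings and variances below are invented for illustration and do not correspond to the CFA(7, 6) model:

```python
import numpy as np

# Hypothetical loadings of 4 indicators on 2 standardised common factors
Lambda = np.array([[0.8, 0.0],
                   [0.7, 0.0],
                   [0.0, 0.9],
                   [0.0, 0.6]])
Phi = np.array([[1.0, 0.3],     # factor correlation of 0.3
                [0.3, 1.0]])
Theta = np.diag([0.36, 0.51, 0.19, 0.64])   # specific-factor (error) variances

Sigma = Lambda @ Phi @ Lambda.T + Theta     # model-implied covariance matrix
```

With standardised factors and these error variances, Σ is a correlation matrix, and the covariance between indicators of different factors is simply the product loading × factor correlation × loading; setting the off-diagonal element of Φ to zero corresponds to hypothesising physically independent forcings with uncorrelated temperature responses.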

Given the preliminaries above, we can finally formulate our basic CFA model that hypothesises the consistency between the simulated and true temperature
responses. The parameters of the resulting CFA model with seven indicators and six common factors are given in Table

Another feature of the model is that the variances of the specific factors

The advantage of treating

In contrast, the CFA model in Table

In addition, the estimates of the factor loadings can be used for assessing the contribution of the real-world
forcings to the variability in

Both above-presented CFA(7, 6) models can be modified by setting desirable and climatologically justified
constraints on the parameters. For example, for testing that the effect of interactions is negligible,
one needs to set

Similar constraints can be placed on the parameters associated with other common factors, if one expects negligible forcing effects for the region of interest. Otherwise, the estimation procedure may become unstable, which may lead to an inadmissible solution or even the failure to converge to a solution.

Another reason to modify the CFA models presented arises when instead of

Such replacements may change the latent structure of the model. In case only

In case

Parameters of the seven-indicator and six-factor model, abbr. CFA(7, 6) model, arising as a result of relaxing
the hypotheses of CFA(7, 6) model in Table

Regardless of the latent structure hypothesised, it is important to emphasise that

If the solution obtained is admissible and climatologically defensible, the overall model fit to the data can be assessed both statistically and heuristically. If the hypothesised model is rejected, it is important to realise that the rejection does not unambiguously point to any particular constraint as being at fault.

We also present the CFA model from Table

A one-headed arrow represents a
causal relationship between two variables, meaning that a change in the variable at the tail of the arrow will result in a change in the variable
at the head of the arrow (with all other variables in the diagram held constant). The former type of variable is referred to as

A curved two-headed arrow between two variables indicates that these variables may be correlated without any assumed direct relationship.

Two single-headed arrows connecting two variables signify reciprocal causation.

Latent variables are designated by placing them in circles and observed variables by placing them in squares, while disturbance/error terms are represented as latent variables, albeit without placing them in circles.

The path diagram for the CFA model in Table

Path diagram of the CFA model from Table

The CFA model specification can also be used for assessing the overall forcing effect. For this purpose, we formulated a two-indicator one-factor CFA model, which we present in the Supplement, together with the corresponding ME model used in D&A studies.

In CFA, latent variables can be related to each other exclusively
in terms of correlations, which says nothing about the underlying reasons for the correlation
(association). Indeed, an association between two variables, say

An example of a physically complicated climatological relationship is the climate–vegetation interaction.
To reflect (to some extent) this climatological mechanism statistically, we recall first that

In addition, we note that natural changes in the Land cover and vegetation may also be caused by natural changes in the levels of GHGs in
the atmosphere. In terms of the common factors, the latter means that

By reasoning in a similar way, we may also formulate a corresponding basic equation for

Notice that Eqs. (

Another important comment on Eqs. (

The easiest way to get an overview of the above-discussed relationships is to represent them graphically by
means of a path diagram. Using the path diagram for the CFA(7, 6) model in Fig.

Path diagram of a non-recursive (i.e. containing reciprocal relationships) SEM model under the hypothesis
of consistency. The variance of each specific factor

The important features of the SEM model in Fig.

The latent variables

The latent variables

The common factors

There are two “new” observable variables, namely

The anthropogenic components

Just as the previously presented CFA models, the SEM model in Fig.

The same ideas can also be expressed by letting the observable

An initial SEM model, formulated on the basis of the basic SEM model in accordance with climatological knowledge, may also be modified empirically. Modification indices are a useful means of providing clues to specific model expansions (for details see Appendix A5). The main statistical advantage of model expansions is that they improve (to varying extents) the overall model fit to the data. Nevertheless, such modifications should be made judiciously, as they lead to a reduction in the degrees of freedom. If, on the other hand, an initial SEM model demonstrates a reasonable fit both statistically and heuristically, model simplifications might be of more interest than model expansions.

In connection with empirical data-driven modifications of SEM (and CFA) models, we would also like to emphasise that the choice of a final or tentative model should not be made exclusively on a statistical basis – any modification ought to be defensible from the climatological point of view and reflect our knowledge about both the real-world climate system and the climate model under consideration. Also, a final SEM (and CFA) model should not be taken as a correct model, even if the model was not obtained as a result of empirical modifications. When accepting a final model, we can only say that “the model may be valid” because it does not contradict our assumptions and substantive knowledge.

In Sect. 4, we suggested estimators (

The variances of

The

The magnitude of the forcing effect is the same for each ensemble member.

A possible way to check the validity of the estimators is to analyse the ensemble members by means of an appropriate CFA model. To this end, two simple CFA models were formulated. Their description is given in Sect. 6.1. In Sect. 6.2, we illustrate a practical application of one of these models, thereby demonstrating practical details of fitting a CFA model.

Using the definition of

Model (

If the model fits the data adequately both statistically and heuristically, and the resulting estimate of

A corresponding CFA model associated with estimator (

We have analysed simulated near-surface temperatures generated with the Community Earth System Model
(CESM) version 1.1 for the period 850–2005, the CESM-LME (Last Millennium Ensemble), which includes
single-forcing ensembles with each of solar, volcanic, orbital, land, and GHG forcing alone, as well as
several simulations where all forcings are used together. The CESM-LME experiment used

For the purpose of illustrating a practical application of a CFA model, we analyse

Annual-mean temperatures are used for the Arctic and the warm-season temperatures (JJA) for Asia.
This choice depends on what was considered by the

Two important aspects to remember when applying CFA and SEM models (as well as ME models) are that
these models assume that data are normally distributed and do not exhibit autocorrelation.
Since the forced component of simulated temperatures, i.e.

Here, to avoid autocorrelation, all raw

As supported by Fig. S4 in the Supplement, the assumption of time-independent observations is
satisfied (because at least 91 % of the autocorrelation coefficients are insignificant as they are
within the 90 % confidence bounds). Further, Fig. S4 in the Supplement suggests that the decadally resolved residual
sequences demonstrate reasonable compliance with a normal distribution. This conclusion
was also supported by the Shapiro–Wilk test
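The kind of checks described above can be sketched as follows for a synthetic series (an AR(1) process standing in for the annual temperature data; the series length, the AR coefficient, and the decadal block size are hypothetical, not CESM-LME output):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

# Synthetic AR(1) "annual temperature" series (phi = 0.5), 1150 years
n_years, phi = 1150, 0.5
annual = np.empty(n_years)
annual[0] = rng.normal()
for t in range(1, n_years):
    annual[t] = phi * annual[t - 1] + rng.normal()

# Aggregate to non-overlapping decadal means
decadal = annual.reshape(-1, 10).mean(axis=1)

def lag1_autocorr(x):
    x = x - x.mean()
    return (x[:-1] @ x[1:]) / (x @ x)

r_annual = lag1_autocorr(annual)
r_decadal = lag1_autocorr(decadal)
bound = 1.645 / np.sqrt(decadal.size)   # approximate 90 % confidence bound

_, p_normal = stats.shapiro(decadal)    # Shapiro-Wilk normality test
```

Time aggregation sharply reduces the lag-1 autocorrelation relative to the annual series, at the cost of reducing the sample size by a factor of 10, and the Shapiro-Wilk p-value gives a formal check of the normality assumption on the aggregated data.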

In the present work, we have used the

The main steps of the estimation procedure in

(A part of) Outputs produced by the

According to Table

Concerning the overall model fit, the output indicates that the model fits the data very well, both
statistically and heuristically. Indeed, the model

The output also contains information about the modification indices. The

For the Arctic data, the conclusions are opposite. That is, the CFA(5, 1) model is rejected both
statistically and heuristically. The model

The same conclusion is indicated by the modification indices, suggesting that the model fit can be substantially improved.
The largest modification index of 7.063 (with 1 df) suggests that
the replicate number 4 differs from the other replicates in terms of the internal
variability. In addition, the largest modification index for the

We refrain from discussing possible reasons for the observed differences between
the replicates and whether the reasons, suggested by the modification indices,
are true or not. We can only say that if one wishes to continue the analysis of
all ensembles by means of the CFA and/or SEM models suggested here, it is then motivating
to try refitting the CFA(

The present paper provides a theoretical background of a new statistical framework for the evaluation of simulated responses to climate forcings against observational climate data. A key idea of the framework, comprising two groups of statistical models, is that the process of evaluation should not be limited to a single statistical model. The models suggested here are CFA and SEM models, each of which is based on the concept of latent variables. Although they are closely related to each other, there are several differences between them, which allow for a statistical modelling of climatological relationships in various ways.

The idea of using CFA and SEM models originates from D&A fingerprinting studies employing statistical models known as measurement error (ME) models (or equivalently errors-in-variables models). As a matter of fact, an ME model is a special case of a CFA model, which means that an ME model is a special case of a SEM model as well. In the present work, using this close connection between the three types of statistical models, the ME model specification has been extended first to the CFA and SEM model specifications.

The theoretical results of this work have demonstrated that both CFA and SEM models,
just as ME models in D&A studies, are, first of all, capable of addressing the questions
posed in D&A studies, namely the assessment of the contribution of the forcings to the temperature variability
(the questions of detection and attribution) and the evaluation of climate model simulations in terms of temperature
responses to forcings (the question of consistency).
In addition, the extensions have provided the following advantageous possibilities:

The structure of the underlying relationships can be varied between latent temperature responses to forcings in accordance with their properties and interpretations. For example, one may assume that latent temperature responses to some of the forcings considered are mutually uncorrelated. Such restrictions are especially desirable for analysing climate data associated with the so-called weak-signal regime.

The assumption of the additivity of forcing effects can be relaxed.
At this point, let us remark that according to

Multi-forcing climate model simulations can be evaluated not only in terms of the overall forcing effect to a combination of forcings (see Sect. 1.2 in the Supplement), but also in terms of individual forcing effects.

Non-climatic noise in observational data can be taken into account.

The contribution of each forcing to the observed temperature variability can be assessed and the simulated responses to climate forcings evaluated, not only simultaneously but also separately if needed.

Complicated climatological feedback mechanisms within the SEM model specification can be statistically modelled, which allows for various causal relationships not permitted within either ME or CFA models.

Here, we would like to point out that the underlying latent causal structures suggested in this work are only rough approximations of the real-world climatological feedback mechanisms. The degree of approximation depends directly on the availability of climate model simulations driven by various combinations of the forcings of interest. In the present work, it was assumed that only one type of multi-forcing simulation is available, namely a simulation generated by a climate model driven by all forcings of interest simultaneously. As a result, the departure from the additivity of individual forcing effects could be modelled only by a single latent variable, which represented an overall effect of possible interactions of the forcings of interest, regardless of their origin. The impossibility of splitting this interaction term into several subcomponents, each of which is either of natural or anthropogenic character, entails certain difficulties in interpreting some relationships within our SEM model. However, the issue can be resolved as soon as more multi-forcing simulations, driven by various combinations of forcings, become available. The issue also becomes irrelevant under the assumption of additivity.

Other limitations of the statistical models presented here are as follows:

They are formulated under the assumption of no autocorrelation, which is
unrealistic in the case of climate data. To overcome this issue, it was suggested here
to perform a time aggregation of the data (both simulated and real-world),
which unavoidably reduces the sample size. Depending on the period of interest, the reduction
may be substantial, which may lead to unreliable statistical inferences. In such situations,
accepting a larger sample size would, however, mean accepting a certain degree of autocorrelation.
Thus, in future work, it is of interest to investigate the impact of various levels of autocorrelation
on the validity of significance tests. Another research question of interest concerns other ways of
compensating for the presence of autocorrelation, for example, by replacing the sample size by the effective
sample size
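For the AR(1)-based variant of this idea, a common approximate adjustment is n_eff = n (1 - r1) / (1 + r1), where r1 is the lag-1 autocorrelation. A minimal sketch, applied to synthetic series invented for illustration:

```python
import numpy as np

def effective_sample_size(x):
    """AR(1)-based adjustment n_eff = n (1 - r1) / (1 + r1)."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    r1 = (x[:-1] @ x[1:]) / (x @ x)
    return x.size * (1.0 - r1) / (1.0 + r1)

rng = np.random.default_rng(2)
white = rng.normal(size=4000)          # no autocorrelation: n_eff close to n
ar1 = np.empty(4000)
ar1[0] = rng.normal()
for t in range(1, 4000):               # AR(1) with phi = 0.5: n_eff near n/3
    ar1[t] = 0.5 * ar1[t - 1] + rng.normal()
```

For white noise the adjustment is negligible, while for a moderately autocorrelated series the effective sample size shrinks substantially, which in turn inflates the standard errors used in significance tests.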

They are formulated under the assumption of a time-constant variance of observational data.
This assumption is likely to be violated when data cover both pre-industrial and industrial
periods. Some research on this topic has already been done

They are suitable for analysing data from a single region only. That is, they do not allow
for a simultaneous assessment of a given forcing's contribution within each region under consideration.
To be able to perform such multi-regional analyses, meaning that the resulting variance–covariance matrix
of observed variables comprises several regional variance–covariance matrices in a block manner,
the CFA (and SEM) models need to be extended accordingly. Conceivable starting points
for this work can be found in

The methods presented here do not involve dimensionality reduction (including
pre-whitening), which is performed in D&A studies

In practice, fitting large over-identified CFA and SEM models (with many observable variables) is expected to be challenging, both from the statistical and climatological perspective, compared to ME models used in D&A studies, which ultimately may require close collaboration between statisticians, paleoclimatologists, and climate modellers.

Despite the above-mentioned limitations of our framework, we firmly believe that the framework has the capacity to become a powerful and flexible tool for deriving valuable insights into the properties of climate models and the role of climate forcings in climate change, which ultimately may improve our understanding of various mechanisms and processes in the real-world climate system. Moreover, its flexibility in forming an appropriate statistical model can be increased further by viewing the ME model specification as a part of the framework. According to the principle of parsimony, it is always preferable to choose a simpler model demonstrating acceptable and adequate performance over a more complicated one.

Our concluding remark is that the characteristics of the statistical models within our framework, capable
of addressing the questions posed in D&A studies, were discussed only theoretically. Prior to
employing them in practical analyses involving real-world observational data, their performance needs
to be evaluated in a controlled numerical experiment, within which it is known that the simulated
temperature responses to forcings of interest are correctly represented,
both in terms of their magnitude and shape. This will be the purpose of the analysis presented by

A general CFA model with

In the terminology of factor analysis, the observed variables are called

The main characteristic of CFA is that the researcher formulates a factor model, or a set of models,
in accordance with a substantive theory about the underlying common factor structure. That is, the number
of latent factors, their interpretation, and the nature of the factor loadings are specified a priori.
In addition, researchers can have certain hypotheses, which results in additional restrictions on the parameter
space. A typical classification of parameters within CFA is the following

A

A

A

The estimation of parameters in CFA is based on the idea
that the population variance–covariance matrix of the indicators,

As shown by

The inverse of Eq. (

One can use the estimated variances to test each estimated parameter

A key concept in CFA is identifiability of parameters. Identifiability is closely
related to the ability to estimate the model parameters from a sample generated by the model,
given restrictions imposed on the parameters. The general identifiability rule states that if an unknown parameter

Based on this definition of identifiability, a factor model can be classified as

For over-identified models, the number of free parameters is smaller than the number of unique equations

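The counting rule behind this classification can be sketched as a small helper function (hypothetical, for illustration): with p indicators, the sample variance–covariance matrix supplies p(p + 1)/2 unique equations, and the difference between that number and the number of free parameters gives the model degrees of freedom:

```python
def cfa_model_status(p, n_free):
    """Compare free parameters with the p(p+1)/2 unique (co)variance equations."""
    unique = p * (p + 1) // 2
    df = unique - n_free
    if df > 0:
        return df, "over-identified"
    if df == 0:
        return df, "just-identified"
    return df, "under-identified"
```

For example, with 7 indicators there are 28 unique equations, so a model with 20 free parameters has 8 degrees of freedom and is over-identified. Note that a non-negative difference is necessary but not sufficient for identifiability, as the surrounding text emphasises.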
Notice that even if the number of free parameters is smaller than or equal to the number of unique equations in

One way to establish the identifiability is to solve structural covariance equations

One of them is the empirical test of the matrix of second-order derivatives of the discrepancy function in
Eq. (

According to

Yet another possible check for identifiability is to estimate the model with different starting values for the free parameters in the iterative estimation algorithm, and to see whether the algorithm converges to the same parameter estimates each time. This empirical test, however, should be used with great care: inappropriate starting values may cause convergence to fail even though the model is theoretically identified.
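The multi-start check can be sketched as follows, using a generic least-squares discrepancy and scipy's L-BFGS-B optimiser as stand-ins for the actual estimation procedure; the "sample" covariance matrix and the one-factor model are purely illustrative.

```python
import numpy as np
from scipy.optimize import minimize

# "Sample" covariance matrix of three indicators (illustrative numbers).
S = np.array([[2.5, 1.6, 1.2],
              [1.6, 1.68, 0.96],
              [1.2, 0.96, 1.02]])

def discrepancy(theta):
    """Least-squares discrepancy between S and the implied covariance of a
    one-factor model (first loading fixed to 1 for scaling)."""
    l2, l3, phi, t1, t2, t3 = theta
    lam = np.array([[1.0], [l2], [l3]])
    sigma = phi * (lam @ lam.T) + np.diag([t1, t2, t3])
    return np.sum((S - sigma) ** 2)

rng = np.random.default_rng(0)
solutions = []
for _ in range(5):
    start = rng.uniform(0.1, 2.0, size=6)      # random starting values
    res = minimize(discrepancy, start, method="L-BFGS-B",
                   bounds=[(-5, 5)] * 2 + [(1e-6, 10)] * 4)
    solutions.append(res.x)

# If the model is identified, all converged runs should land on the
# same estimates; a large spread across starts is a warning sign.
best = min(solutions, key=discrepancy)
spread = np.max(np.ptp(np.vstack(solutions), axis=0))
print("best discrepancy:", discrepancy(best), "max spread:", spread)
```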

In practice, the estimation procedure may produce parameter estimates, although the model is
theoretically under-identified.
Such a phenomenon is known as empirical under-identifiability

To avoid empirical under-identifiability, and to justify any empirical modifications of the model should they be needed, it is important to identify the causes of a model's theoretical under-identification prior to estimating it. To this end, it is sufficient to ensure that each parameter is solvable from the structural covariance equations, without deriving closed-form analytical expressions for the solution.

Examining the variance–covariance matrix of the asymptotic distribution of the model estimates is also helpful for revealing empirical under-identifiability. If the model is nearly under-identified, this will be reflected in very high correlations between two or more parameter estimates.
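A minimal sketch of such a check, assuming a hypothetical asymptotic covariance matrix of the estimates and using 0.9 as an arbitrary warning threshold:

```python
import numpy as np

# Hypothetical asymptotic covariance matrix of three parameter estimates
# (illustrative numbers only).
acov = np.array([[0.040, 0.036, 0.002],
                 [0.036, 0.038, 0.001],
                 [0.002, 0.001, 0.050]])

# Convert covariances to correlations between parameter estimates.
sd = np.sqrt(np.diag(acov))
corr = acov / np.outer(sd, sd)

# Flag parameter pairs whose estimates are very highly correlated --
# a warning sign of (near) empirical under-identifiability.
i, j = np.triu_indices_from(corr, k=1)
for a, b in zip(i, j):
    if abs(corr[a, b]) > 0.9:
        print(f"parameters {a} and {b}: correlation {corr[a, b]:.3f}")
```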

Even if estimates are admissible, one should also ensure that they have the anticipated signs and magnitudes.

For just-identified CFA models, the function

For over-identified models, which arise due to additional constraints imposed on some model parameters,
at least one (free) parameter can be expressed by more than one distinct equation in terms of the variances and
covariances of the indicators. Therefore, the fit between

To this end, one uses the fact that the discrepancy function (

In large samples, the

If the solution obtained is admissible and interpretable, the statistical assessment of the overall model fit
is performed by means of the

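Under the standard large-sample result that T = (N − 1)·F_min follows a chi-square distribution with df degrees of freedom when the model is correct, the overall fit test can be sketched as follows (all numbers are illustrative):

```python
from scipy.stats import chi2

# Hypothetical fit results (illustrative numbers only).
F_min = 0.12   # minimum of the discrepancy function
N = 200        # sample size
df = 8         # degrees of freedom of the over-identified model

# In large samples, T = (N - 1) * F_min is approximately chi-square
# distributed with df degrees of freedom if the model is correct.
T = (N - 1) * F_min
p_value = chi2.sf(T, df)
print(f"T = {T:.2f}, df = {df}, p = {p_value:.4f}")
```

A small p-value indicates that the model-implied covariance structure fits the data poorly.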
When applying the

As for cut-off values of the indices, the following rules of thumb have been recommended. The GFI
for good-fitting models should be greater than 0.90, while for the AGFI the suggested cut-off
value is 0.8
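As an illustration, one common ML-based textbook formulation of the GFI is GFI = 1 − tr[(Σ̂⁻¹S − I)²]/tr[(Σ̂⁻¹S)²], with the AGFI adjusting the GFI for model degrees of freedom; this sketch uses that variant, which is not necessarily the exact definition employed in the paper.

```python
import numpy as np

def gfi_agfi(S, Sigma_hat, df):
    """Goodness-of-fit indices (one common ML-based formulation;
    illustrative, not necessarily the paper's exact definition).

    S         -- sample covariance matrix (p x p)
    Sigma_hat -- model-implied covariance matrix at the estimates
    df        -- degrees of freedom of the model
    """
    p = S.shape[0]
    A = np.linalg.solve(Sigma_hat, S)          # Sigma_hat^{-1} S
    resid = A - np.eye(p)
    gfi = 1.0 - np.trace(resid @ resid) / np.trace(A @ A)
    agfi = 1.0 - (p * (p + 1) / (2.0 * df)) * (1.0 - gfi)
    return gfi, agfi

# Perfect fit: the implied matrix equals the sample matrix, so both
# indices equal 1 (df = 1 is an arbitrary illustrative value).
S = np.array([[2.5, 1.6], [1.6, 1.68]])
print(gfi_agfi(S, S.copy(), df=1))
```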

Notice that the goodness-of-fit indices can be used both for assessing the fit of a single CFA model and for comparing a number of competing models fitted to the same data set.

According to the “pure” confirmatory approach, the rejection of the hypothesised model, whose estimated parameters are judged
to be admissible and interpretable, means that one rejects the model and the associated underlying hypotheses,
without proceeding with any updating of this hypothesised structure. In practice, however, researchers do proceed.
The first aspect to check is whether any key elements of the underlying hypotheses are missing. This further
motivates a check of other possible reasons for the poor model fit, such as small sample size, non-normality,
or missing data

Developed by

Modified models suggested by modification indices are so-called nested models, fitted to the same
data. That is, the initial model is a special case of each of them, obtained by constraining to
zero the parameter that is suggested to be estimated. According to statistical theory in

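The comparison of two such nested models rests on the chi-square difference test: the difference of their test statistics is itself chi-square distributed, with degrees of freedom equal to the difference in model degrees of freedom. A sketch with hypothetical values:

```python
from scipy.stats import chi2

# Hypothetical chi-square statistics for two nested CFA models fitted
# to the same data (illustrative numbers only).
chisq_restricted, df_restricted = 31.4, 9   # parameter constrained to zero
chisq_full, df_full = 24.1, 8               # parameter freely estimated

# The difference of the statistics is chi-square distributed with
# df equal to the difference in degrees of freedom.
delta_chisq = chisq_restricted - chisq_full
delta_df = df_restricted - df_full
p_value = chi2.sf(delta_chisq, delta_df)
print(f"delta chi-square = {delta_chisq:.1f}, df = {delta_df}, p = {p_value:.4f}")
```

A small p-value indicates that freeing the parameter significantly improves the fit.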
The present work employed the

The supplement related to this article is available online at:

This work is based on ideas presented in a doctoral thesis in mathematical statistics by
Ekaterina Fetisova

The contact author has declared that none of the authors has any competing interests.

Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

The authors would like to thank Qiong Zhang (Department of Physical Geography, Stockholm University) for helpful explanations of some aspects of climate modelling.

This research was funded by the Swedish Research Council (grant C0592401 to Gudrun Brattström, “A statistical framework for comparing paleoclimate data and climate model simulations”).

This paper was edited by Francis Zwiers and reviewed by two anonymous referees.