**Advances in Statistical Climatology, Meteorology and Oceanography**
An international open-access journal on applied statistics

26 Nov 2019
An improved projection of climate observations for detection and attribution

Alexis Hannart^{1,2}

^{1}Ouranos, Montreal, Canada; ^{2}IFAECI, CNRS-CONICET-UBA, Buenos Aires, Argentina

**Correspondence**: Alexis Hannart (hannart.alexis@ouranos.ca)

Abstract

An important goal
of climate research is to determine the causal contribution of human activity
to observed changes in the climate system. Methodologically speaking, most
climatic causal studies to date have been formulating attribution as a linear
regression inference problem. Under this formulation, the inference is often
obtained by using the generalized least squares (GLS) estimator after
projecting the data on the *r* leading eigenvectors of the covariance
associated with internal variability, which are evaluated from numerical
climate models. In this paper, we revisit the problem of obtaining a GLS
estimator adapted to this particular situation, in which only the leading
eigenvectors of the noise's covariance are assumed to be known. After noting
that the eigenvectors associated with the lowest eigenvalues are in general
more valuable for inference purposes, we introduce an alternative estimator.
Our proposed estimator is shown to outperform the conventional estimator,
when using a simulation test bed that represents the 20th century temperature
evolution.

How to cite

Hannart, A.: An improved projection of climate observations for detection and attribution, Adv. Stat. Clim. Meteorol. Oceanogr., 5, 161–171, https://doi.org/10.5194/ascmo-5-161-2019, 2019.

1 Introduction

An important goal of climate research is to determine the causes of past global
warming in general and the responsibility of human activity in particular (Hegerl et al., 2007); this
question has thus emerged as a research topic known as detection and attribution (D&A).
From a methodological standpoint, D&A studies are usually based on linear
regression methods, often referred to in this particular context as optimal
fingerprinting, whereby an observed climate change is regarded as a linear
combination of several externally forced signals added to internal climate
variability (Hasselmann, 1979; Bell, 1986; North et al., 1982; Allen and Tett, 1999; Hegerl and Zwiers, 2011). On the one hand, the latter signals consist
of spatial, temporal or space–time patterns of response to external forcings
as anticipated by one or several climate models. Internal climate
variability, on the other hand, is usually represented by a centred multivariate Gaussian
noise. Its covariance **Σ**, which is also estimated from model
simulations, thus describes the detailed space–time features of internal
variability. In particular, its eigenvectors can often be interpreted as the
modes of variability, whose relative magnitudes are reflected by the
associated eigenvalues.

Denoting *y* the *n*-dimensional vector of observed climate change,
$\mathbf{x}=({x}_{\mathrm{1}},\mathrm{\dots},{x}_{p})$ the *n*×*p* matrix concatenating the *p*
externally forced signals and ** ν** the internal climate variability noise,
the regression equation is as follows:

$$\begin{array}{}\text{(1)}& \begin{array}{l}\mathit{y}=\mathbf{x}\mathit{\beta}+\mathit{\nu}\end{array}.\end{array}$$

The results of the inference on the vector of regression coefficients
** β**,
and the magnitude of its confidence intervals, determine whether the external
signals “are present in the observations” (Hegerl and Zwiers, 2011) and whether
or not the observed change is attributable to each forcing.

Under the assumption that **Σ** is known, the above inference
problem can be solved using the standard generalized least squares (GLS) setting
(e.g. Amemiya, 1985). Under this framework, the expression of the best
linear unbiased estimator (BLUE) and its variance are given by the following:

$$\widehat{\beta} = \left(\mathbf{x}'\mathbf{\Sigma}^{-1}\mathbf{x}\right)^{-1}\left(\mathbf{x}'\mathbf{\Sigma}^{-1} y\right), \qquad \mathrm{Var}\left(\widehat{\beta}\right) = \left(\mathbf{x}'\mathbf{\Sigma}^{-1}\mathbf{x}\right)^{-1}. \qquad \text{(2)}$$
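For concreteness, Eq. (2) can be implemented in a few lines. The following sketch is ours and uses synthetic data; all variable names are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 50, 2

# Synthetic signals x and a synthetic SPD covariance Sigma (illustrative only)
x = rng.standard_normal((n, p))
a = rng.standard_normal((n, n))
Sigma = a @ a.T + n * np.eye(n)

beta_true = np.array([1.0, 0.5])
nu = rng.multivariate_normal(np.zeros(n), Sigma)   # internal variability noise
y = x @ beta_true + nu

# BLUE / GLS estimator and its variance, Eq. (2)
Si = np.linalg.inv(Sigma)
var_beta = np.linalg.inv(x.T @ Si @ x)
beta_hat = var_beta @ (x.T @ Si @ y)
```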

The GLS estimator of Eq. (2) can be obtained as the ordinary least squares (OLS) estimator for the linearly transformed regression equation:

$$\begin{array}{}\text{(3)}& \mathbf{T}\mathit{y}=\mathbf{Tx}\mathit{\beta}+\mathbf{T}\mathit{\nu},\end{array}$$

where the *n*×*n* transformation matrix **T** is such that the components
of the transformed noise **T**** ν** are independent and identically distributed (IID).
This condition yields $\mathbf{T}\mathbf{\Sigma}{\mathbf{T}}^{\prime}=\mathbf{I}$; an admissible choice is as follows:

$$\begin{array}{}\text{(4)}& \mathbf{T}=\mathbf{V}{\mathbf{\Delta}}^{-{\scriptscriptstyle \frac{\mathrm{1}}{\mathrm{2}}}}{\mathbf{V}}^{\prime},\end{array}$$

where $\mathbf{\Delta}=\text{diag}({\mathit{\lambda}}_{\mathrm{1}},\mathrm{\dots},{\mathit{\lambda}}_{n})$ with
${\mathit{\lambda}}_{\mathrm{1}}\ge \mathrm{\dots}\ge {\mathit{\lambda}}_{n}>\mathrm{0}$ is the matrix of eigenvalues
of **Σ** and $\mathbf{V}=({\mathit{v}}_{\mathrm{1}},\mathrm{\dots},{\mathit{v}}_{n})$ is its matrix of
column eigenvectors, i.e. $\mathbf{\Sigma}=\mathbf{V}\mathbf{\Delta}{\mathbf{V}}^{\prime}$.
In Eq. (3), the data are thus projected on the eigenvectors
of the covariance matrix scaled by the inverse square root of their eigenvalues.
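The whitening property of the transformation in Eq. (4) can be verified numerically; a minimal sketch of ours, with a synthetic covariance:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 30
a = rng.standard_normal((n, n))
Sigma = a @ a.T + n * np.eye(n)  # synthetic SPD covariance

# Eigendecomposition Sigma = V Delta V'
lam, V = np.linalg.eigh(Sigma)          # eigh returns ascending eigenvalues
lam, V = lam[::-1], V[:, ::-1]          # reorder to lambda_1 >= ... >= lambda_n

# Whitening transformation of Eq. (4)
T = V @ np.diag(lam ** -0.5) @ V.T

# The transformed noise covariance is the identity: T Sigma T' = I
assert np.allclose(T @ Sigma @ T.T, np.eye(n))
```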

In most applications, the covariance **Σ** is not known and
Eq. (2) cannot be implemented. This has given rise to many
theoretical developments (e.g. Poloni and Sbrana, 2014; Naderahmadian et al., 2015) which often rely on further
assumptions regarding the covariance **Σ**, in order to allow for
its estimation simultaneously with ** β**.
In the applicative context of D&A considered here, the situation may be
viewed as an in-between between the known and unknown covariance cases:
**Σ** is not fully known, but its *r* leading eigenvectors
$({\mathit{v}}_{\mathrm{1}},\mathrm{\dots},{\mathit{v}}_{r})$ and the associated
eigenvalues are assumed to be known.

Under this assumption, the inference on ** β** must therefore be obtained
based on these *r* leading eigenvectors and eigenvalues only. The conventional
approach consists of replacing **T** in Eq. (3) with the truncated
transformation matrix:

$$\begin{array}{}\text{(5)}& {\mathbf{T}}_{r}={\mathbf{V}}_{r}{\mathbf{\Delta}}_{r}^{-{\scriptscriptstyle \frac{\mathrm{1}}{\mathrm{2}}}}{\mathbf{V}}_{r}^{\prime},\end{array}$$

where ${\mathbf{\Delta}}_{r}=\text{diag}({\mathit{\lambda}}_{\mathrm{1}},\mathrm{\dots},{\mathit{\lambda}}_{r})$ and ${\mathbf{V}}_{r}=({\mathit{v}}_{\mathrm{1}},\mathrm{\dots},{\text{v}}_{r})$.
A new estimator thus follows by similarly applying the OLS estimator to the
transformed data (**T**_{r}**x**,**T**_{r}** y**):

$$\widehat{\beta}_r = \left(\mathbf{x}'\mathbf{\Sigma}_r^{-1}\mathbf{x}\right)^{-1}\left(\mathbf{x}'\mathbf{\Sigma}_r^{-1} y\right), \qquad \text{(6)}$$

where ${\mathbf{\Sigma}}_{r}^{-\mathrm{1}}={\mathbf{V}}_{r}{\mathbf{\Delta}}_{r}^{-\mathrm{1}}{\mathbf{V}}_{r}^{\prime}$ is
the pseudo-inverse of ${\mathbf{V}}_{r}{\mathbf{\Delta}}_{r}{\mathbf{V}}_{r}^{\prime}$. Finally,
from a mere terminology standpoint, applying the truncated transformation
**T**_{r} to the data is often referred to in D&A as “projecting” the
data onto the subspace spanned by the leading eigenvectors. However,
**T**_{r} is in fact not a projection from a mathematical standpoint insofar as
${\mathbf{T}}_{r}^{\mathrm{2}}\ne {\mathbf{T}}_{r}$ hence we prefer to use the word
“transformation”.
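This distinction is easy to check numerically: **T**_{r} fails the idempotence property defining a projection, while **V**_{r}**V**_{r}^{′} satisfies it. A sketch of ours with synthetic data:

```python
import numpy as np

rng = np.random.default_rng(2)
n, r = 30, 5
a = rng.standard_normal((n, n))
Sigma = a @ a.T + n * np.eye(n)  # synthetic SPD covariance

lam, V = np.linalg.eigh(Sigma)
lam, V = lam[::-1], V[:, ::-1]          # descending eigenvalues
Vr, lam_r = V[:, :r], lam[:r]

# Truncated transformation of Eq. (5)
Tr = Vr @ np.diag(lam_r ** -0.5) @ Vr.T

# Tr is not a projection (Tr^2 != Tr), unlike Vr Vr' which is idempotent
assert not np.allclose(Tr @ Tr, Tr)
assert np.allclose((Vr @ Vr.T) @ (Vr @ Vr.T), Vr @ Vr.T)
```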

The solution recalled above in Eq. (6) is easy to use and may
seem to be a natural choice. It emerged as a popular approach in D&A
arguably for these reasons. Nevertheless, this approach has two important
limitations. Firstly, as pointed out by Hegerl et al. (1996), the leading
eigenvectors sometimes fail to capture the signals **x**. Indeed,
the leading eigenvectors capture variability well, but not necessarily the
signal, merely because the corresponding physical processes may differ and be
associated with different patterns. Secondly, by construction, the subspace
spanned by the leading eigenvectors maximizes the amplitude of the noise,
thereby minimizing the ratio of signal to noise and affecting the performance
of the inference on ** β**.

The first objective of this article is to highlight in detail these two
limitations: both will be formalized and discussed immediately below in
Sect. 2. Its second objective is to circumvent these limitations by
building a new estimator of ** β** that performs better than the
conventional estimator ${\widehat{\mathit{\beta}}}_{r}$ of Eq. (6) while
still making the same assumptions and using the same data. Most importantly,
the central assumption here is that only the *r* leading eigenvectors and
eigenvalues of **Σ** are known.

Section 2 highlights the limitations of the conventional estimator ${\widehat{\mathit{\beta}}}_{r}$. Section 3 introduces our proposed estimator ${\widehat{\mathit{\beta}}}_{r}^{*}$. Section 4 evaluates the benefit of our approach based on a realistic test bed, and illustrates it on actual observations of surface temperature. Section 5 concludes the paper.

2 Limitations of the conventional estimator

It is useful to start this discussion by returning for a moment to the case
where **Σ** is assumed to be fully known, i.e. its *n* eigenvectors
are all known perfectly, as well as their eigenvalues.
Under this assumption, let us evaluate the benefit of a given eigenvector
*v*_{k}, $k=\mathrm{1},\mathrm{\dots},n$, for the inference of ** β**. For this purpose,
we propose to compare the variance
$\text{Var}\left(\widehat{\mathit{\beta}}\right)=({\mathbf{x}}^{\prime}{\mathbf{\Sigma}}^{-\mathrm{1}}\mathbf{x}{)}^{-\mathrm{1}}$ of the BLUE estimator of
Eq. (2) obtained by using the transformation matrix $\mathbf{T}=\mathbf{V}{\mathbf{\Delta}}^{-\frac{\mathrm{1}}{\mathrm{2}}}{\mathbf{V}}^{\prime}$, to the variance of the estimator
${\widehat{\mathit{\beta}}}_{-k}$ obtained by using the transformation matrix ${\mathbf{T}}_{-k}={\mathbf{V}}_{-k}{\mathbf{\Delta}}_{-k}^{-\frac{\mathrm{1}}{\mathrm{2}}}{\mathbf{V}}_{-k}^{\prime}$, where ${\mathbf{V}}_{-k}$ and ${\mathbf{\Delta}}_{-k}$ are obtained by removing the *k*th eigenvector and its eigenvalue. The two variances write as follows:

$$\mathrm{Var}\left(\widehat{\beta}\right) = \left(\sum_{i=1}^{n} \mathbf{x}' v_i v_i' \mathbf{x} / \lambda_i\right)^{-1}, \qquad \mathrm{Var}\left(\widehat{\beta}_{-k}\right) = \left(\sum_{i=1,\dots,n,\ i \neq k} \mathbf{x}' v_i v_i' \mathbf{x} / \lambda_i\right)^{-1}. \qquad \text{(7)}$$

The Sherman–Morrison–Woodbury formula is a well-known result in linear algebra stating that the inverse of a rank-one correction of a matrix can be computed as a rank-one correction of the inverse of the original matrix. Applying it to Eq. (7), we obtain the following:

$$\begin{array}{}\text{(8)}& \text{Var}\left({\widehat{\mathit{\beta}}}_{-k}\right)=\text{Var}\left(\widehat{\mathit{\beta}}\right)+{\displaystyle \frac{\text{Var}\left({\widehat{\mathit{\beta}}}_{-k}\right){\mathbf{x}}^{\prime}{\mathit{v}}_{k}{\mathit{v}}_{k}^{\prime}\mathbf{x}\text{Var}\left({\widehat{\mathit{\beta}}}_{-k}\right)}{{\mathit{\lambda}}_{k}+{\mathit{v}}_{k}^{\prime}\mathbf{x}\text{Var}\left({\widehat{\mathit{\beta}}}_{-k}\right){\mathbf{x}}^{\prime}{\mathit{v}}_{k}}}.\end{array}$$

Unsurprisingly, an immediate implication of Eq. (8) is that
removing any eigenvector *v*_{k} results in an increase of the variance
of the estimator. Indeed, the matrix in the second term of the right-hand side of
Eq. (8) is positive semidefinite; it can thus be interpreted as the
cost in variance increase of excluding *v*_{k} or conversely as the benefit
in variance reduction of including *v*_{k}. Moreover, the most valuable
eigenvectors for inferring ** β** in the above-defined sense are those combining
a small eigenvalue (i.e. low noise) with a strong alignment with the signals
**x** (i.e. strong signal).
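The identity in Eq. (8) can be checked numerically by comparing the variance computed from all *n* eigenvectors with the variance computed when eigenvector *k* is excluded. The following sketch is ours, with synthetic data:

```python
import numpy as np

rng = np.random.default_rng(3)
n, p, k = 20, 2, 0  # leave out eigenvector k
x = rng.standard_normal((n, p))
a = rng.standard_normal((n, n))
Sigma = a @ a.T + n * np.eye(n)

lam, V = np.linalg.eigh(Sigma)
lam, V = lam[::-1], V[:, ::-1]          # descending eigenvalues

def var_beta(idx):
    # Variance from Eq. (7): inverse of the sum over included eigenvectors
    M = sum(np.outer(x.T @ V[:, i], x.T @ V[:, i]) / lam[i] for i in idx)
    return np.linalg.inv(M)

full = var_beta(range(n))
minus_k = var_beta([i for i in range(n) if i != k])

# Eq. (8): cost in variance of excluding eigenvector k
xv = x.T @ V[:, k]                      # p-vector x' v_k
correction = (minus_k @ np.outer(xv, xv) @ minus_k) / (lam[k] + xv @ minus_k @ xv)
assert np.allclose(minus_k, full + correction)
```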

A similar conclusion can be reached by using a different approach. Consider this
time the transformation ${\mathbf{T}}^{*}={\mathbf{x}}^{\prime}{\mathbf{\Sigma}}^{-\mathrm{1}}$.
Since **T**^{*} is a *p*×*n* matrix, the data dimension is thus
drastically reduced from *n* to *p* in the transformed Eq. (3).
Furthermore, noting that the noise components **T**^{*}** ν** have
covariance ${\mathbf{T}}^{*}\mathbf{\Sigma}{{\mathbf{T}}^{*}}^{\prime}$, and
applying the GLS estimator to the transformed data
$({\mathbf{T}}^{*}\mathit{y},{\mathbf{T}}^{*}\mathbf{x})$, we obtain an estimator ${\widehat{\mathit{\beta}}}^{*}$ which has
the following expression:

$$\widehat{\beta}^{*} = \left(\mathbf{x}'\mathbf{T}^{*\prime}\left(\mathbf{T}^{*}\mathbf{\Sigma}\mathbf{T}^{*\prime}\right)^{-1}\mathbf{T}^{*}\mathbf{x}\right)^{-1}\left(\mathbf{x}'\mathbf{T}^{*\prime}\left(\mathbf{T}^{*}\mathbf{\Sigma}\mathbf{T}^{*\prime}\right)^{-1}\mathbf{T}^{*} y\right). \qquad \text{(9)}$$

After some algebra, the above expression of ${\widehat{\mathit{\beta}}}^{*}$ simplifies to the following:

$$\widehat{\beta}^{*} = \left(\mathbf{x}'\mathbf{\Sigma}^{-1}\mathbf{x}\right)^{-1}\left(\mathbf{x}'\mathbf{\Sigma}^{-1} y\right), \qquad \text{(10)}$$

which is exactly the expression of the estimator of Eq. (2).
Therefore, remarkably, the BLUE estimator obtained from the full data
(** y**, **x**) can equivalently be obtained from the data reduced to
dimension *p* by the transformation **T**^{*}. The *k*th row of
**T**^{*} defines the direction ${u}_{k}^{*}$ on which the data are
optimally projected:

$$\begin{array}{}\text{(11)}& {u}_{k}^{*}=\sum _{i=\mathrm{1}}^{n}{\displaystyle \frac{{\mathit{v}}_{i}^{\prime}{x}_{k}}{{\mathit{\lambda}}_{i}}}{\mathit{v}}_{i}.\end{array}$$

Consistently with Eq. (8), Eq. (11) shows
that for each $k=\mathrm{1},\mathrm{\dots},p$, the optimal direction ${u}_{k}^{*}$ emphasizes the
eigenvectors that best trade off the smallest possible eigenvalue (i.e. small
noise) with the best possible alignment between the eigenvector and the
signal *x*_{k} (i.e. strong signal). It is interesting to notice that the
foremost importance of small eigenvalues, which is underlined here, is also
quite obvious in the formulation of optimal detection proposed by
North and Stevens (1998). However, the importance of small eigenvalues was lost later on,
as truncation on leading eigenvectors became a popular practice.
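The optimal direction of Eq. (11) is simply the *k*th column of $\mathbf{\Sigma}^{-1}\mathbf{x}$, which the eigenvector expansion makes explicit. A numerical sketch of ours verifying this equivalence:

```python
import numpy as np

rng = np.random.default_rng(4)
n = 25
a = rng.standard_normal((n, n))
Sigma = a @ a.T + n * np.eye(n)  # synthetic SPD covariance
xk = rng.standard_normal(n)      # one signal column (illustrative)

lam, V = np.linalg.eigh(Sigma)

# Eq. (11): optimal direction as an eigenvector expansion,
# weighting each eigenvector by alignment / eigenvalue
u_star = sum((V[:, i] @ xk) / lam[i] * V[:, i] for i in range(n))

# Equivalently, u_k* = Sigma^{-1} x_k
assert np.allclose(u_star, np.linalg.solve(Sigma, xk))
```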

In light of these considerations, the two main limitations of the conventional estimator ${\widehat{\mathit{\beta}}}_{r}$ of Eq. (6), mentioned in Sect. 1, appear more clearly. Firstly, in agreement with Hegerl et al. (1996), there is no guarantee that the leading eigenvectors will align well with the signals at stake. Secondly, the leading eigenvectors are by definition those with the largest eigenvalues. Therefore, a straightforward truncation onto the leading eigenvectors, such as the one used in ${\widehat{\mathit{\beta}}}_{r}$, maximizes the noise instead of minimizing it. While the first limitation may be an important pitfall in some cases, it will not be an issue whenever the signals align well with the leading eigenvectors, which nothing prevents in general. In contrast, the second limitation always manifests whenever ${\widehat{\mathit{\beta}}}_{r}$ is used and is thus arguably a more serious problem of the conventional approach. In the present paper, we thus choose to focus restrictively on this second limitation, and to not address the first one.

In order to illustrate the second issue more concretely, we use surface
temperature data from Ribes and Terray (2012), described in more detail in Section 4. The
signals **x** considered here consist of the responses to
anthropogenic and natural forcings, with *p*=2 and *n*=275. Both
**x** and the covariance Σ are obtained from climate model
simulations. The *r* leading eigenvectors ${\mathbf{V}}_{r}=({\mathit{v}}_{\mathrm{1}},\mathrm{\dots},{\text{v}}_{r})$
are obtained from **Σ** and
are assumed to be known. The proportion of the noise's total variance retained by
projecting the data on **V**_{r} is denoted ${\mathit{\sigma}}_{\text{n}}^{\mathrm{2}}\left(r\right)$
while the proportion of the signal's total variance associated with the same
projection is denoted ${\mathit{\sigma}}_{\text{s}}^{\mathrm{2}}\left(r\right)$. Since the projection matrix
associated with projecting on the subspace **V**_{r} is equal to **V**_{r}**V**_{r}^{′}, we have the following:

$$\sigma_{\mathrm{n}}^{2}(r) = \mathbb{E}\left(\widehat{\mathrm{Var}}\left(\mathbf{V}_r\mathbf{V}_r'\nu\right)\right)\Big/\mathbb{E}\left(\widehat{\mathrm{Var}}\left(\nu\right)\right) = \mathbb{E}\left(\nu'\mathbf{V}_r\mathbf{V}_r'\nu\right)/\mathbb{E}\left(\nu'\nu\right) = \sum_{i=1}^{r}\lambda_i \Big/ \sum_{i=1}^{n}\lambda_i, \qquad \text{(12)}$$

$$\sigma_{\mathrm{s}}^{2}(r) = \widehat{\mathrm{Var}}\left(\mathbf{V}_r\mathbf{V}_r' x_i\right)\Big/\widehat{\mathrm{Var}}\left(x_i\right) = x_i'\mathbf{V}_r\mathbf{V}_r' x_i \,/\, x_i' x_i \qquad \text{for}\ i=1,\dots,p, \qquad \text{(13)}$$

where for any vector ** z**, $\widehat{\text{Var}}\left(\mathit{z}\right)$ denotes the empirical
variance of the components of ** z**. For *r*=20, the projection on **V**_{r}
retains 80 % of the noise's total variance but only 8 % of the signal's
total variance.

Let us now consider the quantities ${\mathit{\sigma}}_{\text{n}}^{\mathrm{2}*}\left(r\right)$ and
${\mathit{\sigma}}_{\text{s}}^{\mathrm{2}*}\left(r\right)$ defined as above, but projecting this time on
the subspace orthogonal to **V**_{r}. By construction,

$$\begin{array}{}\text{(14)}& \begin{array}{l}{\mathit{\sigma}}_{\text{n}}^{\mathrm{2}*}\left(r\right)=\mathrm{1}-{\mathit{\sigma}}_{\text{n}}^{\mathrm{2}}\left(r\right),\\ {\mathit{\sigma}}_{\text{s}}^{\mathrm{2}*}\left(r\right)=\mathrm{1}-{\mathit{\sigma}}_{\text{s}}^{\mathrm{2}}\left(r\right).\end{array}\end{array}$$

As a direct consequence, for *r*=20, the projection on the subspace
orthogonal to **V**_{r} captures 92 % = 100 %–8 % of the total variance
of the signal, while the variance of the projected noise is only
20 % = 100 %–80 % of the total. Therefore, the signal-to-noise ratio of the
projected data (see Fig. 1b) is this time magnified by a factor of 2
compared to the signal-to-noise ratio of the full data, and by a factor of 14
when comparing to the signal-to-noise ratio of the data projected on
**V**_{r}. It can thus be seen that projecting the data on the orthogonal to
**V**_{r} greatly magnifies the signal-to-noise ratio by filtering out a
large fraction of the noise, while retaining most of the signal. These
characteristics arguably make it a much more relevant projection subspace for
the purpose of estimating ** β** than the subspace of the leading
eigenvectors. We further elaborate on this idea in Sect. 3.
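The variance fractions of Eqs. (12)–(14) are inexpensive to compute. The following sketch is ours and uses synthetic data rather than the surface temperature data of Sect. 4:

```python
import numpy as np

rng = np.random.default_rng(5)
n, r = 40, 10
a = rng.standard_normal((n, n))
Sigma = a @ a.T + np.eye(n)      # synthetic SPD covariance

lam, V = np.linalg.eigh(Sigma)
lam, V = lam[::-1], V[:, ::-1]   # descending eigenvalues
Vr = V[:, :r]
xi = rng.standard_normal(n)      # one signal column (illustrative)

# Eqs. (12)-(13): variance fractions retained by projecting on V_r
sigma2_n = lam[:r].sum() / lam.sum()
sigma2_s = (xi @ Vr @ Vr.T @ xi) / (xi @ xi)

# Eq. (14): fractions retained by the orthogonal projection
assert np.isclose(1 - sigma2_n, lam[r:].sum() / lam.sum())
assert 0 <= sigma2_s <= 1
```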

3 An orthogonal estimator

With these preliminary considerations in hand, we return to the situation of
direct interest here: inferring ** β** when only the *r* leading eigenvectors
and eigenvalues of **Σ** are known. We propose the following estimator:

$$\begin{array}{}\text{(15)}& {\widehat{\mathit{\beta}}}_{r}^{*}=({\mathbf{x}}^{\prime}\mathbf{x}-{\mathbf{x}}^{\prime}{\mathbf{V}}_{r}{\mathbf{V}}_{r}^{\prime}\mathbf{x}{)}^{-\mathrm{1}}({\mathbf{x}}^{\prime}\mathit{y}-{\mathbf{x}}^{\prime}{\mathbf{V}}_{r}{\mathbf{V}}_{r}^{\prime}\mathit{y}).\end{array}$$

Like the conventional estimator ${\widehat{\mathit{\beta}}}_{r}$, the proposed estimator
${\widehat{\mathit{\beta}}}_{r}^{*}$ only uses the data (** y**, **x**) and the *r* leading
eigenvectors **V**_{r}. The rationale behind it is detailed hereafter.

Let us denote ${\mathbf{V}}_{r}^{\u27c2}$ an $n\times (n-r)$ matrix consisting of
*n*−*r* orthonormal column vectors spanning the subspace orthogonal to
$({\mathit{v}}_{\mathrm{1}},\mathrm{\dots},{\text{v}}_{r})$. In other words, we have the following by construction:

$$\begin{array}{}\text{(16)}& \begin{array}{l}{\mathbf{V}}_{r}{\mathbf{V}}_{r}^{\prime}+{\mathbf{V}}_{r}^{\u27c2}{{\mathbf{V}}_{r}^{\u27c2}}^{\prime}=\mathbf{I},\\ {{\mathbf{V}}_{r}^{\u27c2}}^{\prime}{\mathbf{V}}_{r}^{\u27c2}=\mathbf{I}.\end{array}\end{array}$$

Note that infinitely many matrices ${\mathbf{V}}_{r}^{\u27c2}$ satisfy
Eq. (16), and that it is straightforward to construct one such matrix
from the *r* leading eigenvectors $({\mathit{v}}_{\mathrm{1}},\mathrm{\dots},{\text{v}}_{r})$, for
instance by using the Gram–Schmidt algorithm. Furthermore, it is also
important to note that, since ${\mathbf{V}}_{r}^{\u27c2}$ is orthogonal to
**V**_{r}, it spans the same subspace as the subspace spanned by the *n*−*r*
eigenvectors $({\text{v}}_{r+\mathrm{1}},\mathrm{\dots},{\mathit{v}}_{n})$ associated with the
smallest eigenvalues, because the latter are also orthogonal to
$({\mathit{v}}_{\mathrm{1}},\mathrm{\dots},{\text{v}}_{r})$ by definition. Therefore, it is not
necessary to have knowledge of these *n*−*r* eigenvectors
$({\text{v}}_{r+\mathrm{1}},\mathrm{\dots},{\mathit{v}}_{n})$ in order to be able to project the data
onto the latter subspace: only the knowledge of
$({\mathit{v}}_{\mathrm{1}},\mathrm{\dots},{\text{v}}_{r})$ is required for that purpose. Thus, with
the knowledge of the *r* leading eigenvectors only, we are able to project
the data onto the orthogonal subspace spanned by ${\mathbf{V}}_{r}^{\u27c2}$
using the transformation matrix ${\mathbf{T}}_{r}^{\u27c2}={\mathbf{V}}_{r}^{\u27c2}{{\mathbf{V}}_{r}^{\u27c2}}^{\prime}=\mathbf{I}-{\mathbf{V}}_{r}{\mathbf{V}}_{r}^{\prime}$, thus
obtaining $({\mathbf{T}}_{r}^{\u27c2}\mathit{y},{\mathbf{T}}_{r}^{\u27c2}\mathbf{x})$. The
estimator proposed in Eq. (15) consists of using the OLS
estimator corresponding to these transformed data:

$$\widehat{\beta}_r^{*} = \left(\mathbf{x}'\mathbf{V}_r^{\perp}\mathbf{V}_r^{\perp\prime}\mathbf{x}\right)^{-1}\left(\mathbf{x}'\mathbf{V}_r^{\perp}\mathbf{V}_r^{\perp\prime} y\right) = \left(\mathbf{x}'\mathbf{x} - \mathbf{x}'\mathbf{V}_r\mathbf{V}_r'\mathbf{x}\right)^{-1}\left(\mathbf{x}' y - \mathbf{x}'\mathbf{V}_r\mathbf{V}_r' y\right). \qquad \text{(17)}$$

It is interesting to note that the orthogonal projection subspace ${\mathbf{V}}_{r}^{\u27c2}$ does not actually need to be obtained to derive
${\widehat{\mathit{\beta}}}_{r}^{*}$ because only ${\mathbf{V}}_{r}^{\u27c2}{{\mathbf{V}}_{r}^{\u27c2}}^{\prime}$ matters for this purpose, and this matrix happens to be
directly known from **V**_{r} through the orthogonality Eq. (16).
This yields the right-most term of Eq. (17),
which corresponds to the expression of ${\widehat{\mathit{\beta}}}_{r}^{*}$ given above in
Eq. (15).
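In code, the proposed estimator indeed requires only **V**_{r}: the projector $\mathbf{I}-{\mathbf{V}}_{r}{\mathbf{V}}_{r}^{\prime}$ is formed directly, and the result coincides with OLS on the projected data. A sketch of ours with synthetic data:

```python
import numpy as np

rng = np.random.default_rng(6)
n, p, r = 40, 2, 10
x = rng.standard_normal((n, p))
y = rng.standard_normal(n)
a = rng.standard_normal((n, n))
Sigma = a @ a.T + np.eye(n)

lam, V = np.linalg.eigh(Sigma)
Vr = V[:, ::-1][:, :r]             # r leading eigenvectors

# Eq. (15): proposed estimator, using only V_r
P = np.eye(n) - Vr @ Vr.T          # projection on the orthogonal subspace
beta_star = np.linalg.solve(x.T @ P @ x, x.T @ P @ y)

# Same result as OLS on the projected data (P y, P x), since P is idempotent
beta_ols = np.linalg.lstsq(P @ x, P @ y, rcond=None)[0]
assert np.allclose(beta_star, beta_ols)
```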

However, the choice of applying the OLS estimator to the projected
data as implied by Eq. (17), rather than applying the GLS
estimator, may seem surprising at first. The rationale for this choice is the
following. Since we only assume knowledge of the *r* leading eigenvectors,
the projection subspace ${\mathbf{V}}_{r}^{\u27c2}$ is thus by construction unknown
from the standpoint of the covariance of the projected noise. In other words,
while the projected noise ${\mathbf{T}}_{r}^{\u27c2}\mathit{\nu}$ in the projected
regression equation remains Gaussian centred, we do not know anything about
its covariance. Thus, we use by default the assumption that
${\mathbf{T}}_{r}^{\u27c2}\mathit{\nu}$ is IID with unknown variance *σ*^{2}, which is a
commonplace assumption used in regression analysis when nothing is known
about the noise's covariance. An immediate implication of this assumption is
that the associated log likelihood is the following:

$$\begin{array}{}\text{(18)}& -\mathrm{2}\mathrm{log}\phantom{\rule{0.125em}{0ex}}\mathrm{\ell}(\mathit{\beta},{\mathit{\sigma}}^{\mathrm{2}})=n\mathrm{log}{\mathit{\sigma}}^{\mathrm{2}}+{\displaystyle \frac{\mathrm{1}}{{\mathit{\sigma}}^{\mathrm{2}}}}(\mathit{y}-\mathbf{x}\mathit{\beta}{)}^{\prime}{\mathbf{V}}_{r}^{\u27c2}{{\mathbf{V}}_{r}^{\u27c2}}^{\prime}(\mathit{y}-\mathbf{x}\mathit{\beta}).\end{array}$$

The complete likelihood of Eq. (18) can be concentrated in
** β** by maximizing out *σ*^{2}, which yields the following:

$$-2\log \ell_c(\beta) = n \log\left\{ (y - \mathbf{x}\beta)' \mathbf{V}_r^{\perp}\mathbf{V}_r^{\perp\prime} (y - \mathbf{x}\beta) \right\} = n \log\left\{ (y - \mathbf{x}\beta)' \left(\mathbf{I} - \mathbf{V}_r\mathbf{V}_r'\right) (y - \mathbf{x}\beta) \right\}. \qquad \text{(19)}$$

Our proposed OLS estimator ${\widehat{\mathit{\beta}}}_{r}^{*}$ results from the maximization of the above likelihood. The above considerations also have some implications with respect to the uncertainty analysis on the proposed estimator ${\widehat{\mathit{\beta}}}_{r}^{*}$. Indeed, in the present context, it is straightforward to obtain confidence intervals around ${\widehat{\mathit{\beta}}}_{r}^{*}$ based on the ratio of likelihoods ${\mathrm{\ell}}_{c}\left(\mathit{\beta}\right)/{\mathrm{\ell}}_{c}\left({\widehat{\mathit{\beta}}}_{r}^{*}\right)$, following a classic approach in statistics that is often used in D&A (e.g. Hannart et al., 2014).
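A likelihood-ratio confidence interval based on the concentrated likelihood of Eq. (19) can be sketched as follows for *p*=1. This is our own illustration with synthetic data; the value 3.84 is the standard 95 % quantile of the χ² distribution with one degree of freedom:

```python
import numpy as np

rng = np.random.default_rng(8)
n, r = 40, 10
x = rng.standard_normal(n)
y = 0.8 * x + 0.5 * rng.standard_normal(n)
a = rng.standard_normal((n, n))
Sigma = a @ a.T + np.eye(n)
Vr = np.linalg.eigh(Sigma)[1][:, ::-1][:, :r]
P = np.eye(n) - Vr @ Vr.T          # projection on the orthogonal subspace

def neg2loglik(beta):
    # Concentrated likelihood of Eq. (19), p = 1 case
    resid = y - x * beta
    return n * np.log(resid @ P @ resid)

beta_hat = (x @ P @ y) / (x @ P @ x)   # maximizer (p = 1 case of Eq. 15)

# 95% interval: betas whose likelihood-ratio statistic stays below 3.84;
# the grid is restricted to +/- 1 around the estimate for simplicity
grid = np.linspace(beta_hat - 1, beta_hat + 1, 2001)
inside = [b for b in grid if neg2loglik(b) - neg2loglik(beta_hat) <= 3.84]
ci = (min(inside), max(inside))
```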

It is important to underline that the latter approach to uncertainty quantification relies on the assumption that the projected noise ${\mathbf{T}}_{r}^{\u27c2}\mathit{\nu}$ is IID. Since the unknown true covariance of ${\mathbf{T}}_{r}^{\u27c2}\mathit{\nu}$ is in fact equal to ${\mathbf{\Sigma}}_{r}^{\u27c2}={\mathbf{V}}_{r}^{\u27c2}{\mathbf{\Delta}}_{r}^{\u27c2}{{\mathbf{V}}_{r}^{\u27c2}}^{\prime}$, this assumption is incorrect and hence the resulting confidence intervals are prone to be incorrect as well. An initial assessment under test bed conditions described in Section 4 suggested that the proposed approach is conservative in the sense that it leads to an overestimation of uncertainty (not shown). However, a more detailed assessment of the performance of this uncertainty quantification procedure is needed, but is beyond the scope of the present paper.

The signals **x** are estimated as the empirical averages of
several ensembles of finite size *m*, which introduces sampling uncertainty.
In order to account for the latter, it is commonplace in D&A to reformulate
the linear regression model of Eq. (1) as an error-in-variables
(EIV) model (Allen and Stott, 2003):

$$\begin{array}{}\text{(20)}& \begin{array}{l}\mathit{y}={\mathbf{x}}^{*}\mathit{\beta}+\mathit{\nu},\\ \mathbf{x}={\mathbf{x}}^{*}+\mathit{\epsilon},\end{array}\end{array}$$

where **x**^{*} is now an unobserved latent variable,
**x** is the ensemble average estimator of the true responses
**x**^{*} and ** ε** is a centred Gaussian noise term with covariance
$\mathbf{\Sigma}/m$, which reflects the averaging of the *m* ensemble members.

It is straightforward to generalize the approach exposed in Sect. 2.3 to
this error-in-variables regression model. Likewise, the data
(** y**, **x**) are projected on the subspace orthogonal to **V**_{r} and the
projected noise terms are assumed to be IID with unknown variance *σ*^{2},
which yields the following log likelihood:

$$-2\log \ell(\beta, \mathbf{x}^{*}, \sigma^2) = n(p+1)\log \sigma^2 + \frac{1}{\sigma^2}(y - \mathbf{x}^{*}\beta)' \mathbf{V}_r^{\perp}\mathbf{V}_r^{\perp\prime} (y - \mathbf{x}^{*}\beta) + \frac{m}{\sigma^2}\mathrm{Tr}\left\{ (\mathbf{x} - \mathbf{x}^{*})' \mathbf{V}_r^{\perp}\mathbf{V}_r^{\perp\prime} (\mathbf{x} - \mathbf{x}^{*}) \right\}, \qquad \text{(21)}$$

which can be concentrated in ** β** by maximizing out **x**^{*} and *σ*^{2}:

$$-2\log \ell_c(\beta) = n(p+1)\log\left\{ \frac{(y - \mathbf{x}\beta)' \left(\mathbf{I} - \mathbf{V}_r\mathbf{V}_r'\right)(y - \mathbf{x}\beta)}{1 + \frac{1}{m}\beta'\beta} \right\}. \qquad \text{(22)}$$

The so-called total least squares (TLS) estimator results from the
maximization of the above concentrated likelihood in ** β** and is
available in closed form:

$$\begin{array}{}\text{(23)}& {\widehat{\mathit{\beta}}}_{r}^{*}=-{\displaystyle \frac{\sqrt{m}}{{z}_{p+\mathrm{1}}}}\phantom{\rule{0.125em}{0ex}}\mathit{z},\end{array}$$

where $\mathit{z}=({z}_{\mathrm{1}},\mathrm{\dots},{z}_{p})^{\prime}$ is a *p*-dimensional vector and ${z}_{p+\mathrm{1}}$ is a
scalar, such that $({z}_{\mathrm{1}},\mathrm{\dots},{z}_{p+\mathrm{1}})^{\prime}$ is the eigenvector associated with the
smallest eigenvalue of the positive definite matrix $\mathbf{S}=[\sqrt{m}\,\mathbf{x}\;\;\mathit{y}]^{\prime}\,(\mathbf{I}-{\mathbf{V}}_{r}{\mathbf{V}}_{r}^{\prime})\,[\sqrt{m}\,\mathbf{x}\;\;\mathit{y}]$ of size *p*+1. The estimator of Eq. (23)
is a generalization of the OLS estimator of Eq. (17); the latter is indeed
a special case of the former for $m=+\mathrm{\infty}$. Furthermore,
in the particular case *p*=1 which is often referred to as the Deming
regression, the estimator of Eq. (23) takes the following
expression:

$$\begin{array}{}\text{(24)}& {\widehat{\mathit{\beta}}}_{r}^{*}={\displaystyle \frac{\sqrt{(m{S}_{xx}-{S}_{yy}{)}^{\mathrm{2}}+\mathrm{4}m{S}_{xy}^{\mathrm{2}}}-(m{S}_{xx}-{S}_{yy})}{\mathrm{2}{S}_{xy}}},\end{array}$$

where ${S}_{xx}={x}^{\prime}(\mathbf{I}-{\mathbf{V}}_{r}{{\mathbf{V}}_{r}}^{\prime})x$, ${S}_{yy}={\mathit{y}}^{\prime}(\mathbf{I}-{\mathbf{V}}_{r}{{\mathbf{V}}_{r}}^{\prime})\mathit{y}$ and
${S}_{xy}={x}^{\prime}(\mathbf{I}-{\mathbf{V}}_{r}{{\mathbf{V}}_{r}}^{\prime})\mathit{y}$. Next, following the same approach as in Sect. 2.3, it would be
tempting to derive confidence intervals around ${\widehat{\mathit{\beta}}}_{r}^{*}$ based on
the ratio of likelihoods ${\mathrm{\ell}}_{c}\left(\mathit{\beta}\right)/{\mathrm{\ell}}_{c}\left({\widehat{\mathit{\beta}}}_{r}^{*}\right)$. While
this approach performs reasonably well in the context of EIV regression with
known variance (Hannart et al., 2014), it is known to be problematic in the present
context of EIV regression with unknown variance (Casella and Berger, 2008; Sect. 12.2).
In this case, more suitable alternatives are available; e.g. an estimator of
the variance of ${\widehat{\mathit{\beta}}}_{r}^{*}$ for *p*=1 is given by Fuller (1987,
Sect. 1.3) and has the following expression:

$$\begin{array}{}\text{(25)}& \widehat{V}\left({\widehat{\mathit{\beta}}}_{r}^{*}\right)={\displaystyle \frac{\mathrm{1}}{n-r}}\left({\widehat{\mathit{\beta}}}_{r}^{*}{\displaystyle \frac{{S}_{xx}}{{S}_{xy}}}-\mathrm{1}\right)\left(m{\widehat{\mathit{\beta}}}_{r}^{*}{\displaystyle \frac{{S}_{xx}}{{S}_{xy}}}+{\widehat{\mathit{\beta}}}_{r}^{*\mathrm{2}}\right),\end{array}$$

from which confidence intervals can be obtained. When *p* > 1, the above
variance estimator is applied successively to each component *i* of
$\widehat{\beta}_{r}^{*}$ ($i=1,\dots,p$) by replacing *y* with $y-\sum_{k\ne i}\widehat{\beta}_{r,k}^{*}\,x_{k}$ in Eq. (25). It is worth
noting that various approaches are available in the literature to tackle the
general problem of deriving confidence intervals for errors-in-variables
regression estimators.

Summarizing, the solution proposed here to deal with errors in
variables consists of using Eq. (23) to derive an estimate
of ** β** and Eq. (25) to derive a confidence interval.
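As a concrete illustration, the *p*=1 case of Eqs. (24) and (25) can be sketched in a few lines of NumPy. This is an illustrative sketch, not a reference implementation; the function name and argument layout are my own, with `Vr` holding the *r* leading eigenvectors and `m` the ensemble size.

```python
import numpy as np

def deming_regression(x, y, Vr, m):
    """Sketch of the p = 1 estimator of Eq. (24) and of Fuller's
    variance estimator of Eq. (25). x, y are length-n vectors; Vr is
    the n x r matrix of the r leading eigenvectors of the noise
    covariance; m is the ensemble size."""
    n, r = len(x), Vr.shape[1]
    # Projection on the orthogonal complement of the r leading eigenvectors.
    P = np.eye(n) - Vr @ Vr.T
    Sxx, Syy, Sxy = x @ P @ x, y @ P @ y, x @ P @ y
    d = m * Sxx - Syy
    beta = (np.sqrt(d ** 2 + 4.0 * m * Sxy ** 2) - d) / (2.0 * Sxy)  # Eq. (24)
    ratio = beta * Sxx / Sxy
    var = (ratio - 1.0) * (m * ratio + beta ** 2) / (n - r)          # Eq. (25)
    return beta, var
```

A quick sanity check: for noiseless data $y=\beta x$, the estimate recovers $\beta$ exactly for any *m*, and the variance estimate collapses to zero.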

The motivation of this subsection is merely to provide a more in-depth mathematical grounding for the proposed estimator, beyond the general considerations of Sect. 2.1. It can safely be skipped without affecting the understanding of the remainder of this article, insofar as it does not provide any additional results or properties regarding the estimator itself.

The idea here is that, in addition to the general considerations of
Sect. 2.1, the proposed estimator $\widehat{\beta}_{r}^{*}$ can also be
justified by a continuity extension argument. For any given
**x** and *y*, let us consider the BLUE estimator of Eq. (2)
as a function of the known covariance:

$$f:\mathcal{S}_{n}^{+*}(\mathbb{R})\to\mathbb{R}^{p},\qquad f(\mathbf{A})=(\mathbf{x}'\mathbf{A}^{-1}\mathbf{x})^{-1}(\mathbf{x}'\mathbf{A}^{-1}y),\qquad\text{(26)}$$

so that we have $\widehat{\beta}=f(\mathbf{\Sigma})$.

In the present context, we assumed that **Σ** is unknown but that
${\mathbf{\Sigma}}_{r}={\mathbf{V}}_{r}{\mathbf{\Delta}}_{r}{\mathbf{V}}_{r}'$ is known. While
it is tempting to apply *f* to **Σ**_{r},
unfortunately **Σ**_{r} is not positive definite, so
*f* is not defined at **Σ**_{r}. Hence, an estimate of ** β** cannot be obtained by
straightforwardly applying the function *f*. However, note that

$$\mathcal{S}_{n}^{+}(\mathbb{R})=\overline{\mathcal{S}_{n}^{+*}(\mathbb{R})},\qquad\text{(27)}$$

where for any subset *E*, the notation $\overline{E}$ used above in Eq. (27)
denotes the *closure* of *E*. Equation (27) thus implies that $\mathcal{S}_{n}^{+*}(\mathbb{R})$ is dense
in $\mathcal{S}_{n}^{+}(\mathbb{R})$, which means that for any positive
semi-definite matrix **A** there exists a sequence of positive definite
matrices converging to **A**. Equation (27) therefore
suggests the possibility of defining a function $\overline{f}$ which
continuously extends *f* to the set of positive semi-definite matrices. For
this purpose, let us consider the matrix **A**+*ε***I**
with *ε*>0. It is immediate that for any $\mathbf{A}\in\mathcal{S}_{n}^{+}(\mathbb{R})$
and for any arbitrarily small but positive
*ε*, we have $\mathbf{A}+\epsilon\mathbf{I}\in\mathcal{S}_{n}^{+*}(\mathbb{R})$. Taking advantage of this remark, we define
$\overline{f}$ by $\overline{f}(\mathbf{A})=\lim_{\epsilon\to 0^{+}}f(\mathbf{A}+\epsilon\mathbf{I})$. With this definition, $\overline{f}$ can
be obtained in closed form:

$$\overline{f}:\mathcal{S}_{n}^{+}(\mathbb{R})\to\mathbb{R}^{p},\qquad \overline{f}(\mathbf{A})=(\mathbf{x}'\mathbf{x}-\mathbf{x}'\mathbf{V}_{\mathrm{A}}\mathbf{V}_{\mathrm{A}}'\mathbf{x})^{-1}(\mathbf{x}'y-\mathbf{x}'\mathbf{V}_{\mathrm{A}}\mathbf{V}_{\mathrm{A}}'y).\qquad\text{(28)}$$

The proof of Eq. (28) is detailed in Appendix A. The
proposed estimator ${\widehat{\mathit{\beta}}}_{r}^{*}$ of Eq. (15) is
therefore obtained by applying the extended function $\stackrel{\mathrm{\u203e}}{f}$ defined
above, to the known, positive semi-definite matrix **Σ**_{r}:

$$\widehat{\beta}_{r}^{*}=\overline{f}(\mathbf{\Sigma}_{r})=(\mathbf{x}'\mathbf{x}-\mathbf{x}'\mathbf{V}_{r}\mathbf{V}_{r}'\mathbf{x})^{-1}(\mathbf{x}'y-\mathbf{x}'\mathbf{V}_{r}\mathbf{V}_{r}'y).\qquad\text{(29)}$$
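The limit defining $\overline{f}$ can also be checked numerically. The sketch below is illustrative (a random rank-*r* matrix stands in for **Σ**_{r}, and all sizes are arbitrary): it evaluates the BLUE function *f* of Eq. (26) at ${\mathbf{\Sigma}}_{r}+\epsilon\mathbf{I}$ for shrinking *ε* and compares the result with the closed form of Eq. (29).

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, r = 20, 2, 5
x = rng.standard_normal((n, p))
y = rng.standard_normal(n)

# Rank-r positive semi-definite matrix Sigma_r = Vr Delta_r Vr' (the known part).
Vr, _ = np.linalg.qr(rng.standard_normal((n, r)))
Sigma_r = Vr @ np.diag(np.linspace(5.0, 1.0, r)) @ Vr.T

def f(A):
    """BLUE estimator of Eq. (26); requires A positive definite."""
    Ai = np.linalg.inv(A)
    return np.linalg.solve(x.T @ Ai @ x, x.T @ Ai @ y)

# Closed form of the continuous extension evaluated at Sigma_r, i.e. Eq. (29).
P = np.eye(n) - Vr @ Vr.T
beta_star = np.linalg.solve(x.T @ P @ x, x.T @ P @ y)

# f(Sigma_r + eps * I) approaches beta_star as eps -> 0+.
errors = [np.linalg.norm(f(Sigma_r + eps * np.eye(n)) - beta_star)
          for eps in (1e-2, 1e-5, 1e-8)]
```

The discrepancies shrink roughly linearly with *ε*, consistent with the perturbation term $\mathbf{B}_{\epsilon}$ appearing in the proof of Appendix A.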

4 Simulations and illustration

We illustrate the method by applying it to surface temperature data and
climate model simulations over the 1901–2000 period and over the entire
surface of the Earth. For this purpose, we use the data described and studied
in Ribes and Terray (2012). The temperature observations are based on the HadCRUT3 merged
land–sea temperature data set (Brohan et al., 2006). The simulated temperature data
used to estimate the response patterns are provided by results from the CMIP5
archive arising from simulations performed with the CNRM-CM5 model of
Météo France under NAT and ANT experiments. For both experiments, the
size of the ensembles is *m*=6. However, estimates of internal
variability are based on intra-ensemble variability from 10 CMIP5 models
(CNRM-CM5, CanESM2, HadGEM2-ES, GISS-E2-R, GISS-E2-H, CSIRO-Mk3-6-0,
IPSL-CM5A-LR, BCC-CSM1-1, NorESM1-M, FGOALS-g2) under NAT and ANT experiments
as well as pre-industrial simulations from both the CMIP3 and CMIP5 archives,
leading to a large ensemble of 374 members. Standard pre-processing is
applied to all observed and simulated data. Data are first interpolated on
the 5^{∘} × 5^{∘} observational grid and then the spatio-temporal
observational mask is applied. The dimension of the data set is then further
reduced by computing decadal means and projecting the resulting spatial
patterns onto spherical harmonics with resolution T4 to obtain vectors of
size *n*=275 (see Ribes et al., 2012 for more details about ensembles and
pre-processing). The ANT and NAT individual responses (*p*=2) are estimated by
averaging the corresponding two ensembles to obtain the matrix of regressors
**x**. The covariance matrix **Σ** is estimated as the
empirical covariance of the large ensemble of control runs.

This section evaluates the performance of the estimators described above, by
applying them to simulated values of ** y** obtained from the linear regression
equation $y=\mathbf{x}\beta+\nu$ assumed by the model of Eq. (1),
where the noise ** ν** is simulated from a multivariate normal distribution
with mean zero and covariance **Σ**.

The use of simulated rather than real data aims at verifying that our
proposed estimator performs correctly, and at comparing its performance with
that of the conventional procedure, a goal which requires the actual values of
** β** to be known. A sample of independent realizations of ** y** was
generated for this purpose.

Figure 2 shows scatter plots of one realization of the data
(**x**,** y**) under this test bed, for the raw data and for the
projected data.

The conventional estimator ${\widehat{\beta}}_{r}$ and the proposed estimator
${\widehat{\beta}}_{r}^{*}$ of Eq. (15) were both derived based on
the *r* leading eigenvectors estimated from the *N* control runs. Then, these
two estimators were derived again, this time using the true values of the *r*
leading eigenvectors, for comparison. Finally, the GLS estimator of Eq. (1)
was derived based on the true value of the covariance
**Σ**. The performance of each of the five estimators of ** β** thus obtained was
evaluated based on the average mean squared error (MSE)
$\sum_{j=1}^{J}(\widehat{\beta}_{j}-\beta)'(\widehat{\beta}_{j}-\beta)/J$, where
$j=1,\dots,J$ denotes the simulation number and *J* is the total number of
simulations.

When comparing the performance of ${\widehat{\mathit{\beta}}}_{r}^{*}$ using estimated vs. actual
values of the *r* leading eigenvectors, a slight benefit of using
actual values is found. This benefit increases with *r*, as eigenvectors with
lower eigenvalues tend to be more difficult to estimate. In contrast, when
comparing the performance of ${\widehat{\mathit{\beta}}}_{r}$ using estimated vs. actual
values of the *r* leading eigenvectors, a slight benefit of using the
estimated values is found. This superiority of the estimated, and thus
incorrect, values may seem surprising at first, but is in fact natural and in
line with our main point. Indeed, it can be explained by the fact that the
estimated leading eigenvectors ${\widehat{\mathbf{V}}}_{r}$ match to a large
extent with **V**_{r} but also span to some extent the true orthogonal
subspace ${\mathbf{V}}_{r}^{\u27c2}$, which, according to our main point here, is
much more suited for estimating ** β**. Unsurprisingly, the estimator based
on the estimated leading eigenvectors ${\widehat{\mathbf{V}}}_{r}$ thus performs
better than its counterpart based on actual values.
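The qualitative effect discussed above can be reproduced in a toy Monte Carlo experiment. The sketch below is illustrative (all sizes, seeds and names are my own): it builds a noise covariance whose variance is concentrated in its *r* leading eigenvectors, then compares the MSE of a GLS estimator restricted to those leading eigenvectors with that of an OLS estimator on their orthogonal complement, in the spirit of the conventional and proposed estimators, respectively.

```python
import numpy as np

rng = np.random.default_rng(1)
n, r, J = 100, 10, 200
beta_true = 1.0

# Noise covariance whose r leading eigenvalues dominate the remainder.
eigvals = np.concatenate([np.linspace(100.0, 50.0, r), np.full(n - r, 1.0)])
V, _ = np.linalg.qr(rng.standard_normal((n, n)))
Vr, Dr = V[:, :r], eigvals[:r]

x = rng.standard_normal(n)
P = np.eye(n) - Vr @ Vr.T                 # complement of the leading modes
mse_conv = mse_prop = 0.0
for _ in range(J):
    noise = V @ (np.sqrt(eigvals) * rng.standard_normal(n))
    y = beta_true * x + noise
    # "Conventional"-style: GLS restricted to the r leading eigenvectors.
    xr, yr = Vr.T @ x, Vr.T @ y
    b_conv = (xr / Dr) @ yr / ((xr / Dr) @ xr)
    # "Proposed"-style: OLS on the orthogonal complement of the leading modes.
    b_prop = (x @ P @ y) / (x @ P @ x)
    mse_conv += (b_conv - beta_true) ** 2 / J
    mse_prop += (b_prop - beta_true) ** 2 / J
```

In this setting the complement-based estimate is far more accurate, because the discarded leading modes carry most of the noise variance while both subspaces carry comparable amounts of signal.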

The method is finally illustrated by applying it to actual observations of
surface temperature ** y**, with the same regressors **x** as described
above.
5 Conclusions

We have introduced a new estimator of the vector ** β** of linear regression
coefficients, adapted to the context in which only the *r* leading
eigenvectors of the covariance associated with internal variability are
assumed to be known.
When applied to a simulation test bed that is realistic with respect to D&A applications,
we find that the proposed estimator outperforms the conventional estimator by several
orders of magnitude for small values of *r*. Our proposal is thus likely to significantly
affect D&A results by decreasing the estimated level of uncertainty.

Substantial further work is needed to evaluate the performance of the
proposed uncertainty quantification, in particular in an EIV context. Such an
evaluation was beyond the scope of the present work. Its primary focus was to
demonstrate that the choice of the leading eigenvectors as a projection
subspace is a vastly suboptimal one for D&A purposes; and that projecting on
a subspace which is orthogonal to the *r* leading eigenvectors yields a
significant improvement.

Data availability


The data used for illustration throughout this article were first described and studied by Ribes and Terray (2012). As mentioned in the latter study, they can be accessed at the following URL: http://www.cnrm-game.fr/spip.php?article23&lang=en (last access: 19 December 2018).

Appendix A

Let $\mathbf{A}\in\mathcal{S}_{n}^{+}(\mathbb{R})$
be a non-invertible positive semi-definite matrix. Denote by **Δ**_{A}
the diagonal matrix of non-zero eigenvalues of **A**, by **V**_{A}
the matrix of eigenvectors associated with **Δ**_{A}, and by
${\mathbf{V}}_{\mathrm{A}}^{\perp}$ the matrix of eigenvectors associated with
the eigenvalue zero. For *ε*>0, we have the following:

$$\begin{aligned}(\mathbf{A}+\epsilon\mathbf{I})^{-1}&=(\mathbf{V}_{\mathrm{A}}\mathbf{\Delta}_{\mathrm{A}}\mathbf{V}_{\mathrm{A}}'+\epsilon\mathbf{I})^{-1}\\&=(\mathbf{V}_{\mathrm{A}}\mathbf{\Delta}_{\mathrm{A}}\mathbf{V}_{\mathrm{A}}'+\epsilon\mathbf{V}_{\mathrm{A}}^{\perp}{\mathbf{V}_{\mathrm{A}}^{\perp}}'+\epsilon\mathbf{V}_{\mathrm{A}}\mathbf{V}_{\mathrm{A}}')^{-1}\\&=\left(\mathbf{V}_{\mathrm{A}}(\mathbf{\Delta}_{\mathrm{A}}+\epsilon\mathbf{I})\mathbf{V}_{\mathrm{A}}'+\epsilon\mathbf{V}_{\mathrm{A}}^{\perp}{\mathbf{V}_{\mathrm{A}}^{\perp}}'\right)^{-1}\\&=\mathbf{V}_{\mathrm{A}}(\mathbf{\Delta}_{\mathrm{A}}+\epsilon\mathbf{I})^{-1}\mathbf{V}_{\mathrm{A}}'+\frac{1}{\epsilon}\mathbf{V}_{\mathrm{A}}^{\perp}{\mathbf{V}_{\mathrm{A}}^{\perp}}'.\end{aligned}\qquad\text{(A1)}$$
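The inversion identity of Eq. (A1) is easy to verify numerically. The sketch below is illustrative (dimensions and eigenvalues chosen arbitrarily): it builds a small rank-deficient positive semi-definite matrix and compares both sides of the identity.

```python
import numpy as np

rng = np.random.default_rng(2)
n, k = 6, 3
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
VA, VAperp = Q[:, :k], Q[:, k:]   # eigenvectors of A / of its null space
DA = np.diag([3.0, 2.0, 1.0])
A = VA @ DA @ VA.T                # rank-k positive semi-definite matrix

eps = 1e-3
lhs = np.linalg.inv(A + eps * np.eye(n))
rhs = VA @ np.linalg.inv(DA + eps * np.eye(k)) @ VA.T + VAperp @ VAperp.T / eps
gap = np.linalg.norm(lhs - rhs)   # negligible up to round-off
```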

Therefore,

$$\begin{aligned}f(\mathbf{A}+\epsilon\mathbf{I})&=\left(\mathbf{x}'\mathbf{V}_{\mathrm{A}}(\mathbf{\Delta}_{\mathrm{A}}+\epsilon\mathbf{I})^{-1}\mathbf{V}_{\mathrm{A}}'\mathbf{x}+\frac{1}{\epsilon}\mathbf{x}'\mathbf{V}_{\mathrm{A}}^{\perp}{\mathbf{V}_{\mathrm{A}}^{\perp}}'\mathbf{x}\right)^{-1}\left(\mathbf{x}'\mathbf{V}_{\mathrm{A}}(\mathbf{\Delta}_{\mathrm{A}}+\epsilon\mathbf{I})^{-1}\mathbf{V}_{\mathrm{A}}'y+\frac{1}{\epsilon}\mathbf{x}'\mathbf{V}_{\mathrm{A}}^{\perp}{\mathbf{V}_{\mathrm{A}}^{\perp}}'y\right)\\&=\left(\mathbf{x}'\mathbf{V}_{\mathrm{A}}\mathbf{B}_{\epsilon}\mathbf{V}_{\mathrm{A}}'\mathbf{x}+\mathbf{x}'\mathbf{V}_{\mathrm{A}}^{\perp}{\mathbf{V}_{\mathrm{A}}^{\perp}}'\mathbf{x}\right)^{-1}\left(\mathbf{x}'\mathbf{V}_{\mathrm{A}}\mathbf{B}_{\epsilon}\mathbf{V}_{\mathrm{A}}'y+\mathbf{x}'\mathbf{V}_{\mathrm{A}}^{\perp}{\mathbf{V}_{\mathrm{A}}^{\perp}}'y\right),\end{aligned}\qquad\text{(A2)}$$

with ${\mathbf{B}}_{\mathit{\epsilon}}=\mathit{\epsilon}({\mathbf{\Delta}}_{\mathrm{A}}+\mathit{\epsilon}\mathbf{I}{)}^{-\mathrm{1}}$. We have ${lim}_{\mathit{\epsilon}\to {\mathrm{0}}^{+}}\phantom{\rule{0.125em}{0ex}}\phantom{\rule{0.125em}{0ex}}{\mathbf{B}}_{\mathit{\epsilon}}=\mathrm{0}$, hence,

$$\begin{aligned}\lim_{\epsilon\to 0^{+}}f(\mathbf{A}+\epsilon\mathbf{I})&=\left(\mathbf{x}'\mathbf{V}_{\mathrm{A}}^{\perp}{\mathbf{V}_{\mathrm{A}}^{\perp}}'\mathbf{x}\right)^{-1}\left(\mathbf{x}'\mathbf{V}_{\mathrm{A}}^{\perp}{\mathbf{V}_{\mathrm{A}}^{\perp}}'y\right)\\&=(\mathbf{x}'\mathbf{x}-\mathbf{x}'\mathbf{V}_{\mathrm{A}}\mathbf{V}_{\mathrm{A}}'\mathbf{x})^{-1}(\mathbf{x}'y-\mathbf{x}'\mathbf{V}_{\mathrm{A}}\mathbf{V}_{\mathrm{A}}'y),\end{aligned}\qquad\text{(A3)}$$

which proves Eq. (28).

Competing interests


The author declares that there is no conflict of interest.

Acknowledgements


This work was supported by the ANR grant DADA and by the LEFE grant MESDA. I
thank Dáithí Stone, Philippe Naveau and two anonymous reviewers for very
useful comments and discussions that helped considerably improve this
article.

Edited by: Christopher Paciorek

Reviewed by: two anonymous referees

References

Allen, M. and Stott, P.: Estimating signal amplitudes in optimal fingerprinting, Part I: theory, Clim. Dynam., 21, 477–491, 2003.

Allen, M. and Tett, S.: Checking for model consistency in optimal fingerprinting, Clim. Dynam., 15, 419–434, 1999.

Amemiya, T.: Advanced Econometrics, Harvard University Press, 1985.

Bell, T. L.: Theory of optimal weighting of data to detect climate change, J. Atmos. Sci., 43, 1694–1710, 1986.

Brohan, P., Kennedy, J., Harris, I., Tett, S., and Jones, P.: Uncertainty estimates in regional and global observed temperature changes: a new data set from 1850, J. Geophys. Res., 111, D12106, https://doi.org/10.1029/2005JD006548, 2006.

Casella, G. and Berger, R. L.: Statistical Inference, 2nd edn., Duxbury, 2008.

Fuller, W. A.: Measurement Error Models, John Wiley, New York, 1987.

Hannart, A.: Integrated Optimal Fingerprinting: Method description and illustration, J. Climate, 29, 1977–1998, 2016.

Hannart, A., Ribes, A., and Naveau, P.: Optimal fingerprinting under multiple sources of uncertainty, Geophys. Res. Lett., 41, 1261–1268, https://doi.org/10.1002/2013GL058653, 2014.

Hasselmann, K.: On the signal-to-noise problem in atmospheric response studies, in: Meteorology of tropical oceans, edited by: Shaw, D. B., 251–259, Royal Meteorological Society, London, 1979.

Hegerl, G. C. and North, G. R.: Comparison of statistically optimal methods for detecting anthropogenic climate change, J. Climate, 10, 1125–1133, 1997.

Hegerl, G. and Zwiers, F.: Use of models in detection and attribution of climate change, WIREs Clim. Change, 2, 570–591, https://doi.org/10.1002/wcc.121, 2011.

Hegerl, G. C., von Storch, H., Hasselmann, K., Santer, B. D., Cubasch, U., and Jones, P. D.: Detecting greenhouse gas induced climate change with an optimal fingerprint method, J. Climate, 9, 2281–2306, 1996.

Hegerl, G. C., Zwiers, F. W., Braconnot, P., Gillett, N. P., Luo, Y., Marengo Orsini, J. A., Nicholls, N., Penner, J. E., and Stott, P. A.: Understanding and Attributing Climate Change, in: Climate Change 2007: The Physical Science Basis. Contribution of Working Group I to the Fourth Assessment Report of the Intergovernmental Panel on Climate Change, edited by: Solomon, S., Qin, D., Manning, M., Chen, Z., Marquis, M., Averyt, K. B., Tignor, M., and Miller, H. L., Cambridge University Press, Cambridge, United Kingdom and New York, NY, USA, 2007.

Lott, F. C., Stott, P. A., Mitchell, D. M., Christidis, N., Gillett, N. P., Haimberger, L., Perlwitz, J., and Thorne, P. W.: Models versus radiosondes in the free atmosphere: A new detection and attribution analysis of temperature, J. Geophys. Res.-Atmos., 118, 2609–2619, https://doi.org/10.1002/jgrd.50255, 2013.

Min, S., Zhang, X., and Zwiers, F. W.: Human-Induced Arctic Moistening, Science, 320, 518–520, 2008.

Naderahmadian, Y., Tinati, M. A., and Beheshti, S.: Generalized adaptive weighted recursive least squares dictionary learning, Signal Process., 118, 89–96, 2015.

North, G. R. and Stevens, M. J.: Detecting climate signals in the surface temperature record, J. Climate, 11, 563–577, 1998.

North, G. R., Bell, T. L., Cahalan, R. F., and Moeng, F. J.: Sampling errors in the estimation of empirical orthogonal functions, Mon. Weather Rev., 110, 699–706, 1982.

Poloni, F. and Sbrana, G.: Feasible generalized least squares estimation of multivariate GARCH(1,1) models, J. Multivar. Anal., 129, 151–159, 2014.

Ribes, A. and Terray, L.: Regularised Optimal Fingerprint for attribution. Part II: application to global near-surface temperature based on CMIP5 simulations, Clim. Dynam., 41, 2837–2853, https://doi.org/10.1007/s00382-013-1736-6, 2012.

Ribes, A., Planton, S., and Terray, L.: Regularised Optimal Fingerprint for attribution. Part I: method, properties and idealised analysis, Clim. Dynam., 41, 2817–2836, https://doi.org/10.1007/s00382-013-1735-7, 2012.

Zhang, X., Zwiers, F. W., Hegerl, G. C., Lambert, F. H., Gillett, N. P., Solomon, S., Stott, P. A., and Nozawa, T.: Detection of human influence on twentieth-century precipitation trends, Nature, 448, 461–465, 2007.

Short summary

In climate change attribution studies, one often seeks to maximize a signal-to-noise ratio, where the "signal" is the anthropogenic response and the "noise" is climate variability. A solution commonly used in D&A studies thus far consists of projecting the signal on the subspace spanned by the leading eigenvectors of climate variability. Here I show that this approach is vastly suboptimal – in fact, it leads instead to maximizing the noise-to-signal ratio. I then describe an improved solution.
