An improved projection of climate observations for detection and attribution

An important goal of climate research is to determine the causal contribution of human activity to observed changes in the climate system. Methodologically speaking, most climatic causal studies to date have formulated attribution as a linear regression inference problem. Under this formulation, the inference is often obtained by using the generalized least squares (GLS) estimator after projecting the data on the $r$ leading eigenvectors of the covariance associated with internal variability, which are evaluated from numerical climate models. In this paper, we revisit the problem of obtaining a GLS estimator adapted to this particular situation, in which only the leading eigenvectors of the noise's covariance are assumed to be known. After noting that the eigenvectors associated with the lowest eigenvalues are in general more valuable for inference purposes, we introduce an alternative estimator. Our proposed estimator is shown to outperform the conventional estimator in a simulation test bed representing the 20th century temperature evolution.


Context
An important goal of climate research is to determine the causes of past global warming in general and the responsibility of human activity in particular; this question has thus emerged as a research topic known as detection and attribution (D&A). From a methodological standpoint, D&A studies are usually based on linear regression methods, often referred to in this particular context as optimal fingerprinting, whereby an observed climate change is regarded as a linear combination of several externally forced signals added to internal climate variability (Hasselmann, 1979; Bell, 1986; North et al., 1982; Allen and Tett, 1999; Hegerl and Zwiers, 2011). On the one hand, the latter signals consist of spatial, temporal or space-time patterns of response to external forcings as anticipated by one or several climate models. Internal climate variability, on the other hand, is usually represented by a centred multivariate Gaussian noise. Its covariance $\Sigma$, which is also estimated from model simulations, thus describes the detailed space-time features of internal variability. In particular, its eigenvectors can often be interpreted as the modes of variability, whose relative magnitudes are reflected by the associated eigenvalues.
Denoting $y$ the $n$-dimensional vector of observed climate change, $x = (x_1, \ldots, x_p)$ the $n \times p$ matrix concatenating the $p$ externally forced signals and $\nu$ the internal climate variability noise, the regression equation is as follows:

$y = x\beta + \nu$.  (1)

The results of the inference on the vector of regression coefficients $\beta$, and the magnitude of its confidence intervals, determine whether the external signals "are present in the observations" (Hegerl and Zwiers, 2011) and whether or not the observed change is attributable to each forcing.
Under the assumption that $\Sigma$ is known, the above inference problem can be solved using the standard generalized least squares (GLS) setting (e.g. Amemiya, 1985). Under this framework, the best linear unbiased estimator (BLUE) and its variance are given by the following:

$\hat\beta = (x^\top \Sigma^{-1} x)^{-1} x^\top \Sigma^{-1} y, \qquad \mathrm{Var}(\hat\beta) = (x^\top \Sigma^{-1} x)^{-1}$.  (2)

Equivalently, the BLUE can be obtained by applying the ordinary least squares (OLS) estimator to the transformed regression equation

$Ty = Tx\beta + T\nu$,  (3)

where the $n \times n$ transformation matrix $T$ is such that the components of the transformed noise $T\nu$ are independent and identically distributed (IID). This condition yields $T \Sigma T^\top = I$ and an admissible choice for $T$ is thus as follows:

$T = \Lambda^{-1/2} V^\top$,  (4)

where $\Lambda = \mathrm{diag}(\lambda_1, \ldots, \lambda_n)$ with $\lambda_1 \geq \ldots \geq \lambda_n > 0$ is the matrix of eigenvalues of $\Sigma$ and $V = (v_1, \ldots, v_n)$ is its matrix of column eigenvectors, i.e. $\Sigma = V \Lambda V^\top$. In Eq. (3), the data are thus projected on the eigenvectors of the covariance matrix, scaled by the inverse square root of their eigenvalues.
In most applications, the covariance $\Sigma$ is not known and Eq. (2) cannot be implemented. This has given rise to many theoretical developments (e.g. Poloni and Sbrana, 2014; Naderahmadian et al., 2015) which often rely on further assumptions regarding the covariance $\Sigma$, in order to allow for its estimation simultaneously with $\beta$. In the applicative context of D&A considered here, the situation may be viewed as intermediate between known and unknown, because we have access to an ensemble of unforced climate model simulations. This ensemble can be used to extract valuable information about $\Sigma$. However, the resulting knowledge of $\Sigma$ is imperfect for two reasons. First, the size of the ensemble is often very small, typically a few dozen, because of the high cost of numerical experiments requiring intensive computational resources. For the range of dimensions $n$ usually considered, typically a few hundred or thousand, the empirical covariance $S$ may thus be non-invertible. Second, climate model simulations are known to inadequately represent the modes of internal variability at the smaller scales. Those smaller-scale modes have lower magnitude and are expected to be associated with eigenvectors with low eigenvalues. Furthermore, the latter eigenvectors with low eigenvalues are also poorly sampled by construction and therefore are difficult to estimate reliably (North et al., 1982). For these two reasons, it is often assumed that only a small number $r$ of leading eigenvectors of $\Sigma$, denoted $(v_1, \ldots, v_r)$ with $r$ typically on the order of 10, can be estimated from the control runs with enough reliability. Accordingly, the remaining $n - r$ eigenvectors and eigenvalues are assumed to be unknown.
Under this assumption, the inference on $\beta$ must therefore be obtained based on these $r$ leading eigenvectors only. The conventional approach presently used in D&A studies to tackle this problem follows from a simple and straightforward idea: adapting Eq. (4) to this situation by restricting the image of the transformation matrix $T$ to the subspace generated by the $r$ leading eigenvectors, thus leading to a truncated transformation having the following expression:

$T_r = \Lambda_r^{-1/2} V_r^\top$,  (5)

where $\Lambda_r = \mathrm{diag}(\lambda_1, \ldots, \lambda_r)$ and $V_r = (v_1, \ldots, v_r)$. A new estimator thus follows by similarly applying the OLS estimator to the transformed data $(T_r x, T_r y)$:

$\hat\beta_r = (x^\top \Sigma_r^{+} x)^{-1} x^\top \Sigma_r^{+} y$,  (6)

where $\Sigma_r^{+} = V_r \Lambda_r^{-1} V_r^\top$ is the pseudo-inverse of $\Sigma_r = V_r \Lambda_r V_r^\top$. Finally, from a mere terminology standpoint, applying the truncated transformation $T_r$ to the data is often referred to in D&A as "projecting" the data onto the subspace spanned by the leading eigenvectors. However, $T_r$ is in fact not a projection from a mathematical standpoint insofar as $T_r^2 \neq T_r$; hence we prefer to use the word "transformation".
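The conventional truncated estimator can be sketched in the same way. The eigenstructure below is again a synthetic assumption; the snippet merely checks that OLS on the truncated data equals GLS with the pseudo-inverse covariance:

```python
import numpy as np

rng = np.random.default_rng(1)
n, p, r = 60, 2, 10

# Illustrative covariance with a known, ordered eigenstructure.
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
lam = np.linspace(10.0, 0.1, n)               # lambda_1 >= ... >= lambda_n > 0
Sigma = Q @ np.diag(lam) @ Q.T
x = rng.standard_normal((n, p))
y = x @ np.ones(p) + rng.multivariate_normal(np.zeros(n), Sigma)

# Truncated transformation of Eq. (5): T_r = Lambda_r^{-1/2} V_r'.
Vr, lam_r = Q[:, :r], lam[:r]
Tr = np.diag(lam_r ** -0.5) @ Vr.T

# Conventional estimator of Eq. (6): OLS on (T_r x, T_r y), or equivalently
# GLS with the pseudo-inverse V_r Lambda_r^{-1} V_r'.
beta_trunc = np.linalg.lstsq(Tr @ x, Tr @ y, rcond=None)[0]
Sigma_r_pinv = Vr @ np.diag(1.0 / lam_r) @ Vr.T
beta_pinv = np.linalg.solve(x.T @ Sigma_r_pinv @ x, x.T @ Sigma_r_pinv @ y)
assert np.allclose(beta_trunc, beta_pinv)
```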

Objectives
The solution recalled above in Eq. (6) is easy to use and may seem a natural choice; it arguably emerged as a popular approach in D&A for these reasons. Nevertheless, this approach has two important limitations. Firstly, as pointed out by Hegerl et al. (1996), the leading eigenvectors sometimes fail to capture the signals $x$. Indeed, the leading eigenvectors capture variability well, but not necessarily the signal, simply because the corresponding physical processes may differ and be associated with different patterns. Secondly, by construction, the subspace spanned by the leading eigenvectors maximizes the amplitude of the noise, thereby minimizing the signal-to-noise ratio and affecting the performance of the inference on $\beta$.
The first objective of this article is to highlight these two limitations in detail: both will be formalized and discussed in Sect. 2. Its second objective is to circumvent these limitations by building a new estimator of $\beta$ that performs better than the conventional estimator $\hat\beta_r$ of Eq. (6), while still making the same assumptions and using the same data. Most importantly, the central assumption here is that only the $r$ leading eigenvectors of $\Sigma$ can be estimated with enough accuracy from the control runs. Furthermore, we assume that $r$ is given and is small compared to $n$ (e.g. $r$ is on the order of 10), and that $N$ control runs are available to estimate the $r$ leading eigenvectors, with $N \geq r$.
Section 2 highlights the limitations of the conventional estimator β r . Section 3 introduces our proposed estimator β * r . Section 4 evaluates the benefit of our approach based on a realistic test bed, and illustrates it on actual observations of surface temperature. Section 5 concludes the paper.

General considerations
It is useful to start this discussion by returning for a moment to the case where $\Sigma$ is assumed to be fully known, i.e. its $n$ eigenvectors are all known perfectly, as well as their eigenvalues. Under this assumption, let us evaluate the benefit of a given eigenvector $v_k$, $k = 1, \ldots, n$, for the inference of $\beta$. For this purpose, we propose to compare the variance $\mathrm{Var}(\hat\beta) = (x^\top \Sigma^{-1} x)^{-1}$ of the BLUE estimator of Eq. (2), obtained by using the transformation matrix $T = \Lambda^{-1/2} V^\top$, to the variance of the estimator $\hat\beta_{-k}$ obtained by using the transformation matrix $T_{-k} = \Lambda_{-k}^{-1/2} V_{-k}^\top$, where $V_{-k}$ and $\Lambda_{-k}$ exclude the $k$th eigenvector and the $k$th eigenvalue respectively. From Eq. (2), these variances can be rewritten after some algebra:

$\mathrm{Var}(\hat\beta)^{-1} = \sum_{i=1}^{n} \lambda_i^{-1} (x^\top v_i)(x^\top v_i)^\top, \qquad \mathrm{Var}(\hat\beta_{-k})^{-1} = \mathrm{Var}(\hat\beta)^{-1} - \lambda_k^{-1} (x^\top v_k)(x^\top v_k)^\top$.  (7)

The Sherman-Morrison-Woodbury formula is a well-known result in linear algebra, which shows that the inverse of a rank-one correction of a matrix can be conveniently computed as a rank-one correction of the inverse of the original matrix. By applying the Sherman-Morrison-Woodbury formula to Eq. (7), we obtain the following:

$\mathrm{Var}(\hat\beta_{-k}) = \mathrm{Var}(\hat\beta) + \dfrac{\mathrm{Var}(\hat\beta)\,(x^\top v_k)(x^\top v_k)^\top\,\mathrm{Var}(\hat\beta)}{\lambda_k - (x^\top v_k)^\top \mathrm{Var}(\hat\beta)\,(x^\top v_k)}$.  (8)

Unsurprisingly, an immediate implication of Eq. (8) is that removing any eigenvector $v_k$ results in an increase of the variance of the estimator. Indeed, the matrix in the second term of the right-hand side of Eq. (8) is positive semi-definite; it can thus be interpreted as the cost in variance increase of excluding $v_k$, or conversely as the benefit in variance reduction of including $v_k$. Moreover, the most valuable eigenvectors for inferring $\beta$ in the above-defined sense are those combining a small eigenvalue (low $\lambda_k$) and a good alignment with the signals $x$ (high $x^\top v_k$). Accordingly, for any eigenvector $v_k$, its benefit is a decreasing function of $\lambda_k$, everything else being constant.
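This variance bookkeeping can be verified numerically. The sketch below, with a synthetic eigenstructure and synthetic signals, checks the rank-one update and that dropping an eigenvector never decreases the variance:

```python
import numpy as np

rng = np.random.default_rng(2)
n, p = 30, 2
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
lam = np.linspace(5.0, 0.05, n)
x = rng.standard_normal((n, p))

# Var(beta_hat)^{-1} is a sum of rank-one terms, one per eigenvector (Eq. 7).
terms = [np.outer(x.T @ Q[:, i], x.T @ Q[:, i]) / lam[i] for i in range(n)]
Var_full = np.linalg.inv(sum(terms))

k = n - 1                                  # eigenvector with the smallest eigenvalue
Var_minus_k = np.linalg.inv(sum(terms) - terms[k])

# Rank-one update of Eq. (8) via Sherman-Morrison-Woodbury.
u = x.T @ Q[:, k]                          # the p-vector x' v_k
smw = Var_full + (Var_full @ np.outer(u, u) @ Var_full) / (lam[k] - u @ Var_full @ u)
assert np.allclose(Var_minus_k, smw)

# Removing any eigenvector increases the variance: the difference is PSD.
assert np.all(np.linalg.eigvalsh(Var_minus_k - Var_full) >= -1e-10)
```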
A similar conclusion can be reached by using a different approach. Consider this time the transformation $T^* = x^\top \Sigma^{-1}$. Since $T^*$ is a $p \times n$ matrix, the data dimension is thus drastically reduced from $n$ to $p$ in the transformed Eq. (3). Furthermore, noting that the noise components $T^* \nu$ have covariance $T^* \Sigma T^{*\top}$, and applying the GLS estimator to the transformed data $(T^* y, T^* x)$, we obtain an estimator $\hat\beta^*$ which has the following expression:

$\hat\beta^* = \big(x^\top T^{*\top} (T^* \Sigma T^{*\top})^{-1} T^* x\big)^{-1} x^\top T^{*\top} (T^* \Sigma T^{*\top})^{-1} T^* y$.  (9)

After some algebra, the above expression of $\hat\beta^*$ simplifies to the following:

$\hat\beta^* = (x^\top \Sigma^{-1} x)^{-1} x^\top \Sigma^{-1} y$,  (10)

which is exactly the expression of the estimator of Eq. (2). Therefore, remarkably, the BLUE estimator obtained from the full data $(y, x)$ is entirely recovered from the transformed data $(T^* y, T^* x)$ when using the subspace spanned by $T^*$. The latter subspace can thus be considered optimal, in the sense that it reduces the data dimension to its bare minimum $p$ while still preserving all the available information for inferring $\beta$. The $p$ directions of the subspace spanned by $T^*$ are denoted $(u_1^*, \ldots, u_p^*)$. They have the following expression:

$u_k^* = \Sigma^{-1} x_k = \sum_{i=1}^{n} \lambda_i^{-1} (v_i^\top x_k)\, v_i, \qquad k = 1, \ldots, p$.  (11)

Consistently with Eq. (8), Eq. (11) shows that for each $k = 1, \ldots, p$, the optimal direction $u_k^*$ emphasizes the eigenvectors that best trade off the smallest possible eigenvalue (i.e. small noise) with the best possible alignment between the eigenvector and the signal $x_k$ (i.e. strong signal). It is interesting to notice that the foremost importance of small eigenvalues, which is underlined here, is also quite obvious in the formulation of optimal detection proposed by North and Stevens (1998). However, the importance of small eigenvalues was lost later on, as truncation on leading eigenvectors became a popular practice.
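A small numerical check of this dimension-reduction argument, under an assumed synthetic covariance: reducing the data to $p$ dimensions with $T^*$ and then applying GLS recovers the full-data BLUE exactly:

```python
import numpy as np

rng = np.random.default_rng(3)
n, p = 40, 2
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
lam = np.linspace(8.0, 0.02, n)
Sigma = Q @ np.diag(lam) @ Q.T
Si = np.linalg.inv(Sigma)
x = rng.standard_normal((n, p))
y = x @ np.ones(p) + rng.multivariate_normal(np.zeros(n), Sigma)

# Full-data BLUE of Eq. (2).
beta_blue = np.linalg.solve(x.T @ Si @ x, x.T @ Si @ y)

# Reduce to p dimensions with T* = x' Sigma^{-1}, then apply GLS (Eq. 9):
# the projected noise T* nu has covariance T* Sigma T*' = x' Sigma^{-1} x.
Tstar = x.T @ Si
Ci = np.linalg.inv(Tstar @ Sigma @ Tstar.T)
xs, ys = Tstar @ x, Tstar @ y
beta_star = np.linalg.solve(xs.T @ Ci @ xs, xs.T @ Ci @ ys)

# Eq. (10): the p-dimensional reduction loses nothing for inferring beta.
assert np.allclose(beta_blue, beta_star)
```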
In light of these considerations, the two main limitations of the conventional estimator $\hat\beta_r$ of Eq. (6), mentioned in Sect. 1, appear more clearly. Firstly, in agreement with Hegerl et al. (1996), there is no guarantee that the leading eigenvectors will align well with the signals at stake. Secondly, the leading eigenvectors are by definition those with the largest eigenvalues. Therefore, a straightforward truncation onto the leading eigenvectors, such as the one used in $\hat\beta_r$, maximizes the noise instead of minimizing it. While the first limitation may be an important pitfall in some cases, it will not be an issue whenever the signals happen to align well with the leading eigenvectors, which nothing rules out in general. In contrast, the second limitation manifests itself whenever $\hat\beta_r$ is used and is thus arguably a more serious problem with the conventional approach. In the present paper, we therefore choose to focus restrictively on this second limitation, and not to address the first one.

Illustration
In order to illustrate the second issue more concretely, we use surface temperature data from Ribes and Terray (2013), described in more detail in Sect. 4. The signals $x$ considered here consist of the responses to anthropogenic and natural forcings, with $p = 2$ and $n = 275$. Both $x$ and the covariance $\Sigma$ are obtained from climate model simulations. The $r$ leading eigenvectors $V_r = (v_1, \ldots, v_r)$ are obtained from $\Sigma$ and are assumed to be known. The proportion of the noise's total variance retained by projecting the data on $V_r$ is denoted $\sigma_n^2(r)$, while the proportion of the signal's total variance associated with the same projection is denoted $\sigma_s^2(r)$. Since the projection matrix associated with projecting on the subspace spanned by $V_r$ is equal to $V_r V_r^\top$, we have the following:

$\sigma_n^2(r) = \sum_{k=1}^{r} \lambda_k \Big/ \sum_{k=1}^{n} \lambda_k, \qquad \sigma_s^2(r) = \mathrm{Var}(V_r V_r^\top x_1) / \mathrm{Var}(x_1)$,  (12)

where for any vector $z$, $\mathrm{Var}(z)$ denotes the empirical variance of $z$. Figure 1a plots $\sigma_n^2(r)$ and $\sigma_s^2(r)$ computed from Eq. (12) applied to the above-mentioned data. For simplicity, $\sigma_s^2(r)$ is shown only for the anthropogenic signal $x_1$. As expected from the expression of $\sigma_n^2(r)$ given in Eq. (12), it can be seen from this plot that $\sigma_n^2(r)$ increases with $r$ much faster than $r/n$, since by definition the leading eigenvectors concentrate most of the noise's variance. In contrast, $\sigma_s^2(r)$ increases with $r$ roughly like $r/n$: this shows that, in this example, the signal projects nearly uniformly on each eigenvector. For instance, when $r = 20$, $V_r$ involves 7 % of the $n$ eigenvectors, and accordingly, projecting on $V_r$ retains 8 % of the signal's total variance. In contrast, as much as 55 % of the noise's total variance is retained by projecting on $V_r$. Consequently, the signal-to-noise ratio of the projected data (see Fig. 1b) is reduced by a factor of 7 compared to the signal-to-noise ratio of the full data. Hence, it is fair to say that, in this case, the projection on the leading eigenvectors is greatly suboptimal, insofar as it drastically reduces the signal-to-noise ratio.
Let us now consider the quantities $\sigma_s^{2*}(r)$ and $\sigma_n^{2*}(r)$ defined as above, but projecting this time on the subspace orthogonal to $V_r$. By construction,

$\sigma_s^{2*}(r) = 1 - \sigma_s^2(r)$,  (13)

$\sigma_n^{2*}(r) = 1 - \sigma_n^2(r)$.  (14)

As a direct consequence, for $r = 20$, the projection on the subspace orthogonal to $V_r$ captures 92 % = 100 % - 8 % of the total variance of the signal, while the variance of the projected noise is only 45 % = 100 % - 55 % of the total. Therefore, the signal-to-noise ratio of the projected data (see Fig. 1b) is this time magnified by a factor of 2 compared to the signal-to-noise ratio of the full data, and by a factor of 14 when comparing to the signal-to-noise ratio of the data projected on $V_r$. It can thus be seen that projecting the data on the orthogonal complement of $V_r$ greatly magnifies the signal-to-noise ratio by filtering out a large fraction of the noise while retaining most of the signal. These characteristics arguably make it a much more relevant projection subspace for the purpose of estimating $\beta$ than the subspace of the leading eigenvectors. We further elaborate on this idea in Sect. 3.
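These complementary variance fractions can be computed directly. The sketch below uses an assumed Zipf-like spectrum (so that the leading eigenvectors dominate the noise, as in the illustration above) and a random signal spread over all eigenvectors:

```python
import numpy as np

rng = np.random.default_rng(4)
n, r = 200, 20
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
lam = 1.0 / np.arange(1, n + 1)            # Zipf-like spectrum: leaders dominate
x1 = rng.standard_normal(n)                # a signal spread over all eigenvectors

P = Q[:, :r] @ Q[:, :r].T                  # projector on the r leading eigenvectors

def var(z):                                # empirical variance (mean of squares,
    return np.mean(z ** 2)                 # the data being centred)

# Eq. (12): noise and signal variance fractions retained by the projection.
sigma2_n = lam[:r].sum() / lam.sum()
sigma2_s = var(P @ x1) / var(x1)

# Complementary fractions for the orthogonal subspace (Eqs. 13-14).
sigma2_n_perp = 1.0 - sigma2_n
sigma2_s_perp = var(x1 - P @ x1) / var(x1)

# The leading subspace concentrates noise, but not signal.
assert sigma2_n > sigma2_s
assert abs(sigma2_s + sigma2_s_perp - 1.0) < 1e-12
```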

Description
With these preliminary considerations in hand, we return to the situation of direct interest here: inferring $\beta$ when only the $r$ leading eigenvectors are available. In an attempt to improve the conventional estimator $\hat\beta_r$, we propose the following new estimator:

$\hat\beta_r^* = \big(x^\top (I - V_r V_r^\top)\, x\big)^{-1} x^\top (I - V_r V_r^\top)\, y$.  (15)

Like the conventional estimator $\hat\beta_r$, the proposed estimator $\hat\beta_r^*$ only uses the data $(y, x)$ and the $r$ leading eigenvectors $V_r = (v_1, \ldots, v_r)$. Its formulation stems from the above considerations regarding the suboptimality of the subspace associated with the transformation $T_r$ of Eq. (5), spanned by the leading eigenvectors, as we now highlight.
Let us denote $V_r^\perp$ a $n \times (n - r)$ matrix consisting of $n - r$ orthonormal column vectors spanning the subspace orthogonal to $(v_1, \ldots, v_r)$. In other words, we have the following by construction:

$V_r^\top V_r^\perp = 0, \qquad V_r^{\perp\top} V_r^\perp = I_{n-r}, \qquad V_r^\perp V_r^{\perp\top} = I_n - V_r V_r^\top$.  (16)

Note that infinitely many matrices $V_r^\perp$ satisfy Eq. (16), and that it is straightforward to construct one such matrix from the $r$ leading eigenvectors $(v_1, \ldots, v_r)$, for instance by using the Gram-Schmidt algorithm. Furthermore, it is also important to note that, since $V_r^\perp$ is orthogonal to $V_r$, it spans the same subspace as the one spanned by the $n - r$ eigenvectors $(v_{r+1}, \ldots, v_n)$ associated with the smallest eigenvalues, because the latter are also orthogonal to $(v_1, \ldots, v_r)$ by definition. Therefore, it is not necessary to know these $n - r$ eigenvectors $(v_{r+1}, \ldots, v_n)$ in order to project the data onto that subspace: knowledge of $(v_1, \ldots, v_r)$ alone is sufficient. Thus, with the knowledge of the $r$ leading eigenvectors only, we are able to project the data onto the orthogonal subspace spanned by $V_r^\perp$ using the transformation matrix $T_r^\perp = V_r^\perp V_r^{\perp\top} = I - V_r V_r^\top$. The estimator proposed in Eq. (15) consists of applying the OLS estimator to these transformed data:

$\hat\beta_r^* = \big(x^\top V_r^\perp V_r^{\perp\top} x\big)^{-1} x^\top V_r^\perp V_r^{\perp\top} y = \big(x^\top (I - V_r V_r^\top)\, x\big)^{-1} x^\top (I - V_r V_r^\top)\, y$.  (17)

It is interesting to note that the orthogonal basis $V_r^\perp$ does not actually need to be computed to derive $\hat\beta_r^*$, because only $V_r^\perp V_r^{\perp\top}$ matters for this purpose, and this matrix is directly known from $V_r$ through the orthogonality relations of Eq. (16). This yields the right-most term of Eq. (17), which corresponds to the expression of $\hat\beta_r^*$ given above in Eq. (15).
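The point that an explicit basis of the orthogonal complement is never needed can be checked numerically. In the sketch below (synthetic data), one route builds an orthonormal basis of the complement from $V_r$ alone, while the other uses only $I - V_r V_r^\top$; both give the same estimate:

```python
import numpy as np

rng = np.random.default_rng(5)
n, p, r = 80, 2, 10
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
Vr = Q[:, :r]                              # the r leading eigenvectors (illustrative)
x = rng.standard_normal((n, p))
y = x @ np.ones(p) + rng.standard_normal(n)

# Route 1: construct an explicit orthonormal basis V_perp of the complement
# from V_r alone, via the eigenvectors of the projector I - V_r V_r'.
M = np.eye(n) - Vr @ Vr.T
w, U = np.linalg.eigh(M)                   # eigenvalues: r zeros, then n - r ones
Vperp = U[:, r:]
assert np.allclose(Vr.T @ Vperp, 0.0, atol=1e-10)
beta_basis = np.linalg.lstsq(Vperp.T @ x, Vperp.T @ y, rcond=None)[0]

# Route 2: Eq. (15) directly, using only V_perp V_perp' = I - V_r V_r'.
beta_direct = np.linalg.solve(x.T @ M @ x, x.T @ M @ y)
assert np.allclose(beta_basis, beta_direct)
```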
However, the choice of applying the OLS estimator to the projected data as implied by Eq. (17), rather than applying the GLS estimator, may seem surprising at first. The rationale for this choice is the following. Since we only assume knowledge of the $r$ leading eigenvectors, the covariance of the noise within the projection subspace spanned by $V_r^\perp$ is by construction unknown. In other words, while the projected noise $T_r^\perp \nu$ in the projected regression equation remains Gaussian and centred, we do not know anything about its covariance. Thus, we use by default the assumption that $T_r^\perp \nu$ is IID with unknown variance $\sigma^2$, which is a commonplace assumption in regression analysis when nothing is known about the noise's covariance. An immediate implication of this assumption is that the associated log likelihood is the following:

$-2 \log \ell(\beta, \sigma^2) = n \log \sigma^2 + \sigma^{-2} (y - x\beta)^\top T_r^\perp (y - x\beta) + \text{const}$.  (18)

The complete likelihood of Eq. (18) can be concentrated in $\beta$ by maximizing out $\sigma^2$ to obtain the concentrated likelihood $\ell_c(\beta)$:

$-2 \log \ell_c(\beta) = n \log \big[(y - x\beta)^\top T_r^\perp (y - x\beta)\big] + \text{const}$.  (19)

Our proposed OLS estimator $\hat\beta_r^*$ results from the maximization of the above likelihood. The above considerations also have some implications with respect to the uncertainty analysis of the proposed estimator $\hat\beta_r^*$. Indeed, in the present context, it is straightforward to obtain confidence intervals around $\hat\beta_r^*$ based on the ratio of likelihoods $\ell_c(\beta)/\ell_c(\hat\beta_r^*)$, following a classic approach in statistics that is often used in D&A (e.g. Hannart et al., 2014).
It is important to underline that the latter approach to uncertainty quantification relies on the assumption that the projected noise $T_r^\perp \nu$ is IID. Since the unknown true covariance of $T_r^\perp \nu$ is in fact equal to $(I - V_r V_r^\top)\,\Sigma\,(I - V_r V_r^\top)$, this assumption is incorrect and hence the resulting confidence intervals are prone to be incorrect as well. An initial assessment under the test bed conditions described in Sect. 4 suggested that the proposed approach is conservative, in the sense that it leads to an overestimation of uncertainty (not shown). However, a more detailed assessment of the performance of this uncertainty quantification procedure is needed, and is beyond the scope of the present paper.

Extension to total least squares
The signals $x$ are estimated as the empirical averages of several ensembles of finite size $m$, which introduces sampling uncertainty. In order to account for the latter, it is commonplace in D&A to reformulate the linear regression model of Eq. (1) as an error-in-variables (EIV) model (Allen and Stott, 2003):

$y = x^* \beta + \nu, \qquad x = x^* + \varepsilon$,  (20)

where $x^*$ is now an unobserved latent variable, $x$ is the ensemble average estimator of the true responses $x^*$ and $\varepsilon$ is a Gaussian centred noise term with covariance $\Sigma/m$. It is straightforward to generalize the approach exposed in Sect. 2.3 to this error-in-variables regression model. Likewise, the data $(y, x)$, the latent variable $x^*$, and the noise terms $\nu$ and $\varepsilon$ are projected on the orthogonal subspace using $V_r^\perp$. The projected residuals are then also assumed IID with unknown variance $\sigma^2$. This yields the following complete likelihood:

$-2 \log \ell(\beta, x^*, \sigma^2) = n(p+1) \log \sigma^2 + \sigma^{-2} \big[(y - x^*\beta)^\top T_r^\perp (y - x^*\beta) + m \,\mathrm{tr}\big((x - x^*)^\top T_r^\perp (x - x^*)\big)\big]$,  (21)

which can be concentrated in $\beta$ by maximizing out $\sigma^2$ and $x^*$ to obtain the concentrated likelihood $\ell_c(\beta)$:

$-2 \log \ell_c(\beta) = n(p+1) \log \dfrac{(y - x\beta)^\top T_r^\perp (y - x\beta)}{1 + \beta^\top \beta / m} + \text{const}$.  (22)

The so-called total least squares (TLS) estimator results from the maximization of the above concentrated likelihood in $\beta$, and is available under a closed-form expression:

$\hat\beta_r^* = -\sqrt{m}\,(z_1, \ldots, z_p)^\top / z_{p+1}$,  (23)

where $z = (z_1, \ldots, z_p)$ is a $p$-dimensional vector and $z_{p+1}$ is a scalar, such that $(z_1, \ldots, z_{p+1})$ is the eigenvector associated with the smallest eigenvalue of the positive semi-definite matrix $[\sqrt{m}\,x \;\; y]^\top T_r^\perp\, [\sqrt{m}\,x \;\; y]$ of size $p + 1$. The estimator of Eq. (23) is a generalization of the OLS estimator of Eq. (17); the latter is indeed the special case of the former obtained for $m \to \infty$. Furthermore, in the particular case $p = 1$, which is often referred to as Deming regression, the estimator of Eq. (23) takes the following expression:

$\hat\beta_r^* = \dfrac{S_{yy} - m S_{xx} + \sqrt{(S_{yy} - m S_{xx})^2 + 4 m S_{xy}^2}}{2 S_{xy}}$,  (24)

where $S_{xx} = x^\top (I - V_r V_r^\top) x$, $S_{yy} = y^\top (I - V_r V_r^\top) y$ and $S_{xy} = x^\top (I - V_r V_r^\top) y$. Next, following the same approach as in Sect. 2.3, it would be tempting to derive confidence intervals around $\hat\beta_r^*$ based on the ratio of likelihoods $\ell_c(\beta)/\ell_c(\hat\beta_r^*)$. While this approach performs reasonably well in the context of EIV regression with known variance (Hannart et al., 2014), it is known to be problematic in the present context of EIV regression with unknown variance (Casella and Berger, 2008, Sect. 12.2). In this case, more suitable alternatives are available; e.g. an estimator of the variance of $\hat\beta_r^*$ for $p = 1$ is given by Fuller (1987, Sect. 1.3), hereafter referred to as Eq. (25), from which confidence intervals can be obtained. When $p > 1$, this variance estimator is applied successively to each component $i$ of $\hat\beta_r^*$ ($i = 1, \ldots, p$) by replacing $y$ with $y - \sum_{k \neq i} \hat\beta_{r,k}^* x_k$ in Eq. (25). It is worth noting that various approaches are available in the literature to tackle the general problem of deriving confidence intervals for $\beta$ under the present EIV regression model with unknown variance. While it would be worthwhile to investigate and compare their performance in a D&A context, such an investigation is beyond the scope of this paper.
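For $p = 1$, the equivalence between the smallest-eigenvector route of the TLS estimator and the closed-form Deming expression can be checked on synthetic data. All dimensions, noise levels and the true coefficient below are assumptions for illustration; the rescaling of $x$ by the square root of the ensemble size equalizes the two error variances, which is what makes plain TLS applicable:

```python
import numpy as np

rng = np.random.default_rng(6)
n, r, m = 150, 10, 6
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
Vr = Q[:, :r]
M = np.eye(n) - Vr @ Vr.T                  # projector on the orthogonal complement
sigma, beta_true = 0.5, 1.2
x_star = rng.standard_normal(n)
y = beta_true * x_star + sigma * rng.standard_normal(n)
x = x_star + (sigma / np.sqrt(m)) * rng.standard_normal(n)

# Deming closed form (error-variance ratio m, p = 1).
Sxx, Syy, Sxy = x @ M @ x, y @ M @ y, x @ M @ y
beta_deming = (Syy - m * Sxx + np.sqrt((Syy - m * Sxx) ** 2 + 4 * m * Sxy ** 2)) / (2 * Sxy)

# Eigen route: rescale x by sqrt(m) so both noise terms have variance sigma^2,
# then take the eigenvector of the smallest eigenvalue of [x~ y]' M [x~ y].
Z = np.column_stack([np.sqrt(m) * x, y])
w, U = np.linalg.eigh(Z.T @ M @ Z)
z = U[:, 0]                                # eigenvector of the smallest eigenvalue
beta_tls = -np.sqrt(m) * z[0] / z[1]

assert np.isclose(beta_deming, beta_tls)
```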
Summarizing, the proposed solution here to deal with the case of error in variables thus consists of using Eq. (23) to derive an estimate of β and Eq. (25) to derive a confidence interval.

Continuity considerations
The motivation of this subsection is merely to provide a more in-depth mathematical grounding to the proposed estimator, beyond the general considerations of Sect. 2.1. This subsection can safely be skipped without affecting the understanding of the remainder of this article, in so far as it does not provide any additional results or properties regarding the estimator itself.
The idea here is that, in addition to the general considerations of Sect. 2.1, the proposed estimator $\hat\beta_r^*$ can also be justified by a continuity extension argument. For any given $x$ and $y$, let us consider the BLUE estimator of Eq. (2) as a function of the known covariance $\Sigma$. For this purpose, we define the function $f$ on the set of positive definite matrices of dimension $n$, denoted $S_n^{+*}(\mathbb{R})$, as follows:

$f(\Sigma) = (x^\top \Sigma^{-1} x)^{-1} x^\top \Sigma^{-1} y$,  (26)

so that we have $\hat\beta = f(\Sigma)$.
In the present context, we assumed that $\Sigma$ is unknown but that $\Sigma_r = V_r \Lambda_r V_r^\top$ is known. While it is tempting to apply $f$ at $\Sigma_r$, unfortunately $\Sigma_r$ is not positive definite, so that $f$ is not defined at $\Sigma_r$. Hence, an estimate of $\beta$ cannot be obtained by straightforwardly applying the function $f$ at $\Sigma_r$. Nevertheless, $\Sigma_r$ belongs to the set of positive semi-definite matrices, denoted $S_n^+(\mathbb{R})$, which has the following interesting topological property:

$\overline{S_n^{+*}(\mathbb{R})} = S_n^+(\mathbb{R})$,  (27)

where for any subset $E$, the notation $\overline{E}$ used above in Eq. (27) denotes the closure of $E$. Equation (27) thus implies that $S_n^{+*}(\mathbb{R})$ is dense in $S_n^+(\mathbb{R})$, which means that for any positive semi-definite matrix $A$ there exists a sequence of positive definite matrices converging to $A$. Equation (27) therefore suggests the possibility of defining a function $\bar f$ which continuously extends $f$ to the set of positive semi-definite matrices. For this purpose, let us consider the matrix $A + \varepsilon I$ with $\varepsilon > 0$. It is immediate to show that for any $A \in S_n^+(\mathbb{R})$ and for any arbitrarily small but positive $\varepsilon$, we have $A + \varepsilon I \in S_n^{+*}(\mathbb{R})$. Taking advantage of this remark, we define $\bar f$ by $\bar f(A) = \lim_{\varepsilon \to 0^+} f(A + \varepsilon I)$. With this definition, $\bar f$ can be obtained under a closed-form expression:

$\bar f(A) = \big(x^\top (I - V_A V_A^\top)\, x\big)^{-1} x^\top (I - V_A V_A^\top)\, y$,  (28)

where $V_A$ denotes the matrix of eigenvectors of $A$ associated with its non-zero eigenvalues. The proof of Eq. (28) is detailed in Appendix A. The proposed estimator $\hat\beta_r^*$ of Eq. (15) is therefore obtained by applying the extended function $\bar f$ defined above to the known, positive semi-definite matrix $\Sigma_r$:

$\hat\beta_r^* = \bar f(\Sigma_r)$.  (29)

Data
We illustrate the method by applying it to surface temperature data and climate model simulations over the 1901-2000 period and over the entire surface of the Earth. For this purpose, we use the data described and studied in Ribes and Terray (2013). The temperature observations are based on the HadCRUT3 merged land-sea temperature data set (Brohan et al., 2006). The simulated temperature data used to estimate the response patterns are provided by results from the CMIP5 archive, arising from simulations performed with the CNRM-CM5 model of Météo-France under NAT and ANT experiments. For both experiments, the size of the ensembles is $m = 6$. Estimates of internal variability are based on intra-ensemble variability from 10 CMIP5 models (CNRM-CM5, CanESM2, HadGEM2-ES, GISS-E2-R, GISS-E2-H, CSIRO-Mk3-6-0, IPSL-CM5A-LR, BCC-CSM1-1, NorESM1-M, FGOALS-g2) under NAT and ANT experiments, as well as on pre-industrial simulations from both the CMIP3 and CMIP5 archives, leading to a large ensemble of 374 members. Standard pre-processing is applied to all observed and simulated data. Data are first interpolated on the 5° × 5° observational grid, and the spatio-temporal observational mask is then applied. The dimension of the data set is further reduced by computing decadal means and projecting the resulting spatial patterns onto spherical harmonics with resolution T4, to obtain vectors of size $n = 275$ (see Ribes and Terray, 2013, for more details about ensembles and pre-processing). The ANT and NAT individual responses ($p = 2$) are estimated by averaging the corresponding two ensembles to obtain the matrix of regressors $x$. The covariance matrix $\Sigma$ is estimated as the empirical covariance of the large ensemble of control runs.

Performance on simulated data
This section evaluates the performance of the estimators described above by applying them to simulated values of $y$ obtained from the linear regression equation $y = x\beta + \nu$ of Eq. (1), where the noise $\nu$ is simulated from a multivariate Gaussian distribution with covariance $\Sigma$ and where $\beta = (1, 1)^\top$ is used. This setting thus assumes that $x$ is known exactly; it therefore does not include the EIV situation prevailing in Eq. (20) and in Sect. 3.3, where $x$ is measured with error.
The use of simulated rather than real data aims at verifying that our proposed estimator performs correctly, and at comparing its performance with the conventional procedure, a goal which requires the actual value of $\beta$ to be known. A sample of $N = 30$ control runs was also simulated from a multivariate Gaussian distribution with covariance $\Sigma$, and the $r$ leading eigenvectors and eigenvalues were estimated from these control runs. Figure 2 shows scatter plots of one realization of the data under this test bed, for the raw data $(x, y)$ as well as for the transformed data $(Tx, Ty)$, when using the three main transformations considered above: the optimal transformation $T = \Lambda^{-1/2} V^\top$, which assumes a fully known covariance $\Sigma$; the truncated transformation $T_r = \Lambda_r^{-1/2} V_r^\top$ on the $r$ leading eigenvectors, with $r = 20$; and the proposed projection $T_r^\perp = V_r^\perp V_r^{\perp\top}$ on the orthogonal complement of the $r$ leading eigenvectors, again with $r = 20$. Figure 2 allows the behaviour of these transformations to be visualized. Firstly, the benefit of the optimal transformation $T = \Lambda^{-1/2} V^\top$ appears clearly, as it greatly enhances the signal. Secondly, it is also apparent that the standard truncated transformation using the $r$ leading eigenvectors fails to enhance the signal, actually leading to a cloud of points which is noisier than the original raw data. In contrast, the proposed transformation on the orthogonal complement of the $r$ leading eigenvectors accurately captures part of the signal enhancement produced by the optimal transformation.
The conventional estimator $\hat\beta_r$ and the proposed estimator $\hat\beta_r^*$ of Eq. (15) were both derived based on the $r$ leading eigenvectors estimated from the $N$ control runs. Then, these two estimators were derived again, this time using the true values of the $r$ leading eigenvectors, for comparison. Finally, the GLS estimator of Eq. (2) was derived based on the true value of the covariance $\Sigma$. The performance of each of the five estimators of $\beta$ thus obtained was evaluated based on the average mean squared error (MSE) $\sum_{j=1}^{J} (\hat\beta_j - \beta)^\top (\hat\beta_j - \beta)/J$, where $j = 1, \ldots, J$ denotes the simulation number and $J = 10^4$. These calculations were repeated for several values of $r$ ranging from 5 to 30. Results are plotted in Fig. 3. They show an important gap in MSE between the conventional approach and the proposed estimator, with the former exceeding the latter by a factor of 1000 for $r = 5$, decreasing to 80 for $r = 30$. This gap suggests a substantial benefit of using $\hat\beta_r^*$ and strongly emphasizes the relevance of this approach compared to the conventional one.
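A reduced-size version of this Monte Carlo comparison can be sketched as follows. The spectrum, dimensions and number of replicates below are assumptions, much smaller than in the actual test bed, but the ranking of the two estimators is the same:

```python
import numpy as np

rng = np.random.default_rng(8)
n, p, r, J = 100, 2, 10, 200
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
lam = np.geomspace(100.0, 0.01, n)         # strongly decaying spectrum
Sigma_half = Q @ np.diag(np.sqrt(lam))     # square root of the covariance
Vr, lam_r = Q[:, :r], lam[:r]
x = rng.standard_normal((n, p))
beta = np.ones(p)

Tr = np.diag(lam_r ** -0.5) @ Vr.T         # conventional truncation, Eq. (5)
M = np.eye(n) - Vr @ Vr.T                  # proposed orthogonal projection

mse_conv = mse_prop = 0.0
for _ in range(J):
    y = x @ beta + Sigma_half @ rng.standard_normal(n)
    b_conv = np.linalg.lstsq(Tr @ x, Tr @ y, rcond=None)[0]
    b_prop = np.linalg.solve(x.T @ M @ x, x.T @ M @ y)
    mse_conv += np.sum((b_conv - beta) ** 2) / J
    mse_prop += np.sum((b_prop - beta) ** 2) / J

assert mse_prop < mse_conv                 # the proposed estimator wins clearly
```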
When comparing the performance of $\hat\beta_r^*$ using estimated vs. actual values of the $r$ leading eigenvectors, a slight benefit of using actual values is found. This benefit increases with $r$, as eigenvectors with lower eigenvalues tend to be more difficult to estimate. In contrast, when comparing the performance of $\hat\beta_r$ using estimated vs. actual values of the $r$ leading eigenvectors, a slight benefit of using the estimated values is found. This superiority of the estimated, and thus incorrect, values may seem surprising at first, but is in fact natural and in line with our main point. Indeed, it can be explained by the fact that the estimated leading eigenvectors $\hat V_r$ match $V_r$ to a large extent, but also span to some extent the true orthogonal subspace $V_r^\perp$, which, according to our main point here, is much better suited for estimating $\beta$. The conventional estimator based on estimated eigenvectors thus performs better than its counterpart based on actual values.

Illustration on real data
The method is finally illustrated by applying it to actual observations of surface temperature $y$, with the same regressors $x$ and covariance $\Sigma$ as described above. The observed temperature data $y$ are based on the HadCRUT3 merged land-sea temperature data set (Brohan et al., 2006). The method is applied with $r = 20$. Further, we use a number of control runs $N$ that is small compared to $n$, which is a common situation, even though we have access under the present test bed to 374 control runs. Thus, $N = 30$ control runs are randomly selected from the 374 available, and the eigenvectors are estimated from these 30 runs. For the inference of $\beta$ and its confidence intervals, we use the solution described in Sect. 3.2, which consists of using Eq. (23) to derive an estimate of $\beta$ and Eq. (25) to derive a confidence interval. Results are shown in Fig. 4. For the scaling factor corresponding to anthropogenic forcings, both methods yield an estimate which is close to 1 and significantly positive. However, the level of uncertainty obtained with the proposed estimator is 3 times smaller than when using the conventional method. An even wider uncertainty gap is observed for the coefficient corresponding to natural forcings. In this case, the confidence interval includes zero under the conventional approach, but it does not under the proposed orthogonal approach. However, as already stated in Sect. 3, a caveat of the latter result is that the uncertainty quantification procedure used here has not been evaluated in detail, and thus we do not know for sure that the obtained uncertainty estimate is well calibrated.

Conclusions
We have introduced a new estimator of the vector $\beta$ of linear regression coefficients, adapted to the context where only the $r$ leading eigenvectors of the noise's covariance $\Sigma$ are known, an assumption often relevant in climate change attribution studies. General considerations have first shown that, when the covariance is fully known, its most relevant eigenvectors for projection are those trading off the smallest possible eigenvalue with the best possible alignment to the signal. The optimal directions are thus in general associated with the smallest eigenvalues, not with the largest ones. Our proposed estimator builds upon this finding and is based on projecting the data on a subspace which is orthogonal to the $r$ leading eigenvectors. When applied to a simulation test bed that is realistic with respect to D&A applications, the proposed estimator outperforms the conventional estimator by several orders of magnitude for small values of $r$. Our proposal is thus likely to significantly affect D&A results by decreasing the estimated level of uncertainty.
Substantial further work is needed to evaluate the performance of the proposed uncertainty quantification, in particular in an EIV context. Such an evaluation was beyond the scope of the present work, whose primary focus was to demonstrate that the choice of the leading eigenvectors as a projection subspace is a vastly suboptimal one for D&A purposes, and that projecting on a subspace which is orthogonal to the $r$ leading eigenvectors yields a significant improvement.

Data availability. The data used for illustration throughout this article were first described and studied by Ribes and Terray (2013). As mentioned in the latter study, they can be accessed at the following URL: http://www.cnrm-game.fr/spip.php?article23&lang=en (last access: 19 December 2018).

A. Hannart: Improved projection for detection and attribution
Appendix A: Proof of Eq. (28)

Let $A \in S_n^+(\mathbb{R})$ be a non-invertible positive semi-definite matrix. Denote $\Lambda_A$ the diagonal matrix of non-zero eigenvalues of $A$, $V_A$ the matrix of eigenvectors associated with $\Lambda_A$ and $V_A^\perp$ the matrix of eigenvectors associated with the eigenvalue zero. For $\varepsilon > 0$, we have the following:

$(A + \varepsilon I)^{-1} = V_A (\Lambda_A + \varepsilon I)^{-1} V_A^\top + \varepsilon^{-1} V_A^\perp V_A^{\perp\top}$.  (A1)

Therefore,

$\varepsilon (A + \varepsilon I)^{-1} = V_A B_\varepsilon V_A^\top + V_A^\perp V_A^{\perp\top}$,  (A2)

with $B_\varepsilon = \varepsilon (\Lambda_A + \varepsilon I)^{-1}$. We have $\lim_{\varepsilon \to 0^+} B_\varepsilon = 0$, hence

$\lim_{\varepsilon \to 0^+} \varepsilon (A + \varepsilon I)^{-1} = V_A^\perp V_A^{\perp\top} = I - V_A V_A^\top$.  (A3)

Since the factor $\varepsilon$ cancels between the two occurrences of $(A + \varepsilon I)^{-1}$ in $f(A + \varepsilon I) = \big(x^\top (A + \varepsilon I)^{-1} x\big)^{-1} x^\top (A + \varepsilon I)^{-1} y$, taking the limit $\varepsilon \to 0^+$ yields $\bar f(A) = \big(x^\top (I - V_A V_A^\top) x\big)^{-1} x^\top (I - V_A V_A^\top) y$, which proves Eq. (28).
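The key matrix limit used in the proof can be verified numerically on a synthetic rank-deficient matrix (dimensions and eigenvalues below are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(9)
n, k = 20, 5
VA, _ = np.linalg.qr(rng.standard_normal((n, k)))
lam_A = np.linspace(3.0, 1.0, k)
A = VA @ np.diag(lam_A) @ VA.T             # rank-k positive semi-definite matrix

# eps (A + eps I)^{-1} = V_A B_eps V_A' + (I - V_A V_A'), with B_eps -> 0,
# so the limit is the projector on the kernel of A (Eqs. A2-A3).
proj = np.eye(n) - VA @ VA.T
for eps in (1e-3, 1e-6):
    lhs = eps * np.linalg.inv(A + eps * np.eye(n))
    assert np.linalg.norm(lhs - proj) <= eps * n   # the error shrinks like eps
```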