There is considerable demand for accurate air quality information in human
health analyses. The sparsity of ground monitoring stations across the United
States motivates the need for advanced statistical models to predict air
quality metrics, such as PM

Particulate matter (PM) in the atmosphere poses a dangerous public health
risk worldwide with effects ranging from reduced vision to respiratory and
cardiovascular problems

Over the past decade, demand has grown rapidly for spatial models that provide
estimates of air quality as inputs to human health analyses and that help
assess advances in air quality. Due to the spatial sparsity and varying monitoring
schedules of fine particulate monitoring stations, several issues, which rely
on surface PM

Of interest here is an evaluation of the effectiveness of employing columnar
measurements of AOT obtained using passive satellite remote sensing in
predicting PM

The VIIRS data used for this analysis were obtained from the NOAA
Comprehensive Large Array-data Stewardship System (CLASS)

see

AOT satellite measurements have been shown to be a proxy for surface
PM

One of the main advantages of estimating ground-level PM

More recently, researchers have used Light Detection And Ranging (lidar)
measurements, an active remote sensing technique, to provide additional
insights into the relationship between the vertically resolved aerosol
extinction coefficient and PM

Extensive missing data in both data sources pose a second challenge, and the
data are not necessarily missing at random

Correlation between the data sources has been shown to be stronger, leading to
improved out-of-sample prediction, when modeling daily data rather than data
aggregated to monthly or yearly scales

Van Donkelaar et al. (2012) improved upon the prediction of daily ground-level
PM

Our contribution is to address some of the foregoing modeling challenges in
predicting daily PM

For model comparison, we evaluate our model against competing submodels nested
within it, in terms of fit (in-sample) and prediction (out-of-sample) of
PM

CMAQ data can be obtained
at

The remainder of this paper is structured as follows. In Sect.

We obtain PM

The EPA's AQS PM

FRM PM

The 764 monitoring station locations of PM

The VIIRS AOT data product used for this study is
the validated stage 2 level maturity EDR AOT
available from the National Oceanic and Atmospheric Administration (NOAA) Comprehensive Large Array-data Stewardship System (CLASS)

The VIIRS aerosol EDRs are
available at

Three swaths covering the continental United States on an illustrative summer day.

The VIIRS satellite orbits the earth roughly once per hour, producing swaths
of AOT data for each orbit. The overpass occurs at roughly 14:30 LT
during daylight saving time and 13:30 LT during standard time.
On each day, the conterminous United States is observed by the compilation of
three or four orbits of the satellite, depending on the orbiting pattern. As seen in
Fig.

While polar-orbiting satellites produce one measurement of AOT per day across
the entire globe, a large number of areal grid cell observations are flagged
and removed through a quality assurance protocol

Figure

AOT on 3 July 2013.

PM

The model for daily PM

We propose a hierarchical model for PM

Let P

Here,

Van Donkelaar et al. (2012) proposed a priori adjustments to AOT that rely on
additional sources of computer model output to try to account for the
variability in the vertical profile of AOT. Here, the processes

If AOT is missing at grid cell

The spatial processes,

The hierarchical model is fitted within the Bayesian framework, enabling
convenient Gibbs sampler loops for imputation. That is, we update the missing
AOTs given the parameters and update the parameters given the missing and
observed AOTs. We assign prior distributions to all model parameters. When
possible, conjugate, non-informative priors are used. For
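The alternation described above (update the missing values given the parameters, then update the parameters given the completed data) can be illustrated with a deliberately simplified sketch. The toy model below is an assumption made only for illustration: independent normal observations with known variance and a conjugate normal prior on the mean, rather than the spatial hierarchical model of this paper. The function name `gibbs_impute` and all numeric settings are hypothetical.

```python
import random
import statistics


def gibbs_impute(observed, n_missing, n_iter=2000, burn_in=500,
                 sigma2=1.0, prior_var=100.0, seed=0):
    """Toy Gibbs sampler with imputation for a normal mean (known variance).

    Assumed model (illustration only): y_i ~ N(mu, sigma2), mu ~ N(0, prior_var).
    """
    rng = random.Random(seed)
    n = len(observed) + n_missing
    mu = statistics.fmean(observed)  # start the chain at the observed mean
    draws = []
    for it in range(n_iter):
        # Step 1: update the missing values given the current parameter.
        missing = [rng.gauss(mu, sigma2 ** 0.5) for _ in range(n_missing)]
        # Step 2: update the parameter given observed and imputed values
        # (conjugate normal full conditional).
        total = sum(observed) + sum(missing)
        post_var = 1.0 / (n / sigma2 + 1.0 / prior_var)
        post_mean = post_var * total / sigma2
        mu = rng.gauss(post_mean, post_var ** 0.5)
        if it >= burn_in:
            draws.append(mu)
    return draws


observed = [0.8, 1.1, 1.3, 0.9, 1.4, 1.0]
draws = gibbs_impute(observed, n_missing=3)
posterior_mean = statistics.fmean(draws)
```

In this toy setting the imputed values carry no extra information, so the posterior for the mean stays centered near the observed average; in a spatial model the imputation step would instead borrow strength from neighboring observed cells.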

The diagonal elements of the

The model is applied to data from 510 monitoring
stations located across the conterminous United States that have consecutive daily
PM

The
meteorological data are reported by the EPA at the AQS monitoring stations and
can be obtained at

We compare our proposed Eq. (

Submodel (S1) is an autoregressive model with only one spatially varying
coefficient. Here, the day-specific intercept is global and a
spatially varying coefficient for AOT is employed to capture the
spatially varying relationship between PM

Submodel (S2) is non-autoregressive. That is, both the intercept and the AOT
coefficient are spatially varying and day-specific, but there are no temporal
dynamics for PM

To assess the significance of AOT in our proposed model, we consider a
submodel (S3) without AOT. This model still has a spatial random effect to
capture the spatial heterogeneity in PM

We obtain inference for all model parameters in the Bayesian framework using
Markov chain Monte Carlo (MCMC) with a hybrid Metropolis-within-Gibbs
algorithm. Additional details regarding the sampling algorithm are included
in Appendix

We present the results comparing our proposed model (Eq.

Daily MSE for in-sample locations for 18 June–9 July 2013
under the competing models (Eqs.

Daily MAD for in-sample locations for 18 June–9 July 2013
under the competing models (Eqs.

We predict PM

Daily MAPD for the out-of-sample locations under the competing models.

Daily average CRPS for the out-of-sample locations under the competing models.
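As a rough guide to how comparison scores of this kind can be computed, the sketch below implements a mean squared error, a mean absolute deviation, and a sample-based continuous ranked probability score (CRPS). The exact definitions used in this paper are not reproduced here, so these are assumptions: MAD is taken to be the mean absolute difference between observed and predicted values, and CRPS is estimated from posterior predictive draws with the standard identity CRPS = E|X - y| - 0.5 E|X - X'|. All function names are hypothetical.

```python
import statistics


def mse(obs, pred):
    """Mean squared error between observations and point predictions."""
    return statistics.fmean((o - p) ** 2 for o, p in zip(obs, pred))


def mean_abs_dev(obs, pred):
    """Mean absolute deviation between observations and point predictions."""
    return statistics.fmean(abs(o - p) for o, p in zip(obs, pred))


def crps_from_samples(samples, y):
    """Sample-based CRPS for one observation y.

    Uses CRPS = E|X - y| - 0.5 * E|X - X'|, with both expectations
    replaced by averages over the predictive draws.
    """
    m = len(samples)
    term1 = statistics.fmean(abs(x - y) for x in samples)
    term2 = sum(abs(a - b) for a in samples for b in samples) / (m * m)
    return term1 - 0.5 * term2


# Hypothetical values for three stations on one day.
obs = [10.0, 12.0, 8.0]
pred = [11.0, 10.0, 8.0]
day_mse = mse(obs, pred)
day_mad = mean_abs_dev(obs, pred)

# Hypothetical predictive draws for a single station.
crps_one = crps_from_samples([9.0, 10.0, 11.0], 10.0)
```

A daily average CRPS would then be the mean of `crps_from_samples` over the out-of-sample stations for that day. Smaller values indicate better predictive performance for all three scores.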

Focusing on Eq. (

Spatial maps of the posterior mean estimates for

There do not appear to be any obvious temporal patterns across consecutive
days in either of the spatial processes; any such dependence is likely
captured by the autoregression. Also, the estimates of

Mean and 90 % credible interval by day for the (left) global
intercept,

Posterior mean estimates of the coefficients,

The proportions of AOT coefficient estimates by day (left) and across space
(right) that are significantly different from 0 based on the 90 %
credible interval are given in Fig.

The proportion of

The proportion of

The proportion of pseudo-estimates by day (left) and across space (right) that are significantly different from 0.
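The proportion summaries above can be obtained from posterior draws along the following lines. This is a minimal sketch under stated assumptions: a coefficient is treated as significantly different from 0 when its equal-tailed 90 % credible interval excludes 0, and the draws are simulated here rather than taken from the fitted model. The helper names `quantile`, `excludes_zero`, and `proportion_significant` are hypothetical.

```python
import random


def quantile(sorted_vals, q):
    """Quantile of pre-sorted values using linear interpolation."""
    pos = q * (len(sorted_vals) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    frac = pos - lo
    return sorted_vals[lo] * (1 - frac) + sorted_vals[hi] * frac


def excludes_zero(draws, level=0.90):
    """True if the equal-tailed credible interval does not contain 0."""
    s = sorted(draws)
    alpha = (1 - level) / 2
    lower, upper = quantile(s, alpha), quantile(s, 1 - alpha)
    return lower > 0 or upper < 0


def proportion_significant(draws_by_site, level=0.90):
    """Share of coefficients whose credible interval excludes 0."""
    flags = [excludes_zero(d, level) for d in draws_by_site]
    return sum(flags) / len(flags)


# Simulated posterior draws for four hypothetical sites on one day.
rng = random.Random(1)
draws_by_site = [
    [rng.gauss(2.0, 0.5) for _ in range(1000)],   # clearly positive
    [rng.gauss(0.0, 0.5) for _ in range(1000)],   # centered at zero
    [rng.gauss(-2.0, 0.5) for _ in range(1000)],  # clearly negative
    [rng.gauss(0.1, 0.5) for _ in range(1000)],   # close to zero
]
prop = proportion_significant(draws_by_site)
```

Applying `proportion_significant` to the sites within a day gives a by-day proportion; applying it to the days at a fixed site gives the across-space version.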

Equation (

Posterior estimates and 90 % credible intervals for the remaining model
parameters are given in Table

The posterior mean and 90 % credible interval for parameters.

Mean and standard deviation estimates of the posterior predictive
distribution of PM

Mean and standard deviation estimates of the posterior predictive
distribution of PM

Illustratively, Figs.

A hierarchical autoregressive model with daily spatially varying coefficients
was employed to jointly model consecutive day average PM

Our analyses show a small in-sample improvement from incorporating AOT into our
autoregressive model for daily PM

We considered further modeling efforts involving multiple sources of
PM

The significance of AOT in out-of-sample prediction may be suppressed by the
fact that AOT is a vertically and spatially integrated measure while
PM

Due to the quality assurance protocol for VIIRS AOT, a large share of the AOT
data is missing, which may also contribute to the ineffectiveness of AOT in
our model in terms of out-of-sample prediction. Improvement in AOT retrieval
algorithms, particularly with regard to surface reflectivity, and improved
cloud screening may lead to lower levels of missingness in the future.
The GOES-R Advanced Baseline Imager (ABI) scheduled to launch in March of 2016
will provide an AOD data product at 2 km

Normal Q–Q plots of the log-transformation of PM

Normal Q–Q plots of the log-transformed PM

Normal Q–Q plots of the square root-transformed AOT on 4 consecutive days.

Full posterior inference is obtained using a hybrid Metropolis-within-Gibbs
sampler. Let

The full conditional distributions are given in detail below.

where

For

where

where

where

For

where

where

where

where

Sampling

where

Sampling

The autoregressive parameter,

Recall that we must impute AOT on the fly for grid cells that contain monitoring stations but are missing AOT.
When AOT is observed at grid cell

where

If grid cell

If grid cell

The work of the first author was supported in part by funding through the
US EPA's Office of Research and Development using EPA contract
EP-13-D-000257. The authors thank James Szykman of the National Exposure
Research Laboratory, Environmental Science Division, Landscape
Characterization Branch (US EPA) for providing the VIIRS satellite aerosol
optical thickness (AOT) data product used in this analysis and valuable
discussion that greatly improved this manuscript.