## Appendix A. Factor Model Parameter Priors and Updates

Parameters are updated using a Markov chain Monte Carlo (MCMC) scheme. Denote the full conditional distribution of a parameter $\theta$ by $[\theta]$.

**Observation Means ($\mu_p$):** For each pollutant $p$, assume independent prior distributions $\mu_p \sim N(m_0, s_0^2)$.

Let $Y^p = (Y_1^p, Y_2^p, \ldots, Y_T^p)'$ be the $NT \times 1$ vector of pollutant $p$ for all locations and time points, and let $\Lambda^p = (\Lambda_1^p, \ldots, \Lambda_m^p)$ be the $N \times mN$ matrix of factor loadings. With a slight abuse of notation, let $\Lambda^p$ also denote the $NT \times mNT$ block-diagonal matrix whose $T$ diagonal blocks each equal the $N \times mN$ matrix $\Lambda^p$. Let $\delta = (\delta_1, \ldots, \delta_T)'$ be the $mNT \times 1$ vector of all factors across all time points. Then the conditional distribution can be rewritten as $Y^p \mid \mu_p, \delta, \Lambda^p, \sigma_p^2 \sim N(\mu_p 1_{NT} + \Lambda^p \delta,\ \sigma_p^2 I_{NT})$.

The resulting full conditional is then $[\mu_p] \sim N\left(\left(m_0/s_0^2 + 1_{NT}'(Y^p - \Lambda^p\delta)/\sigma_p^2\right)\left(1/s_0^2 + NT/\sigma_p^2\right)^{-1},\ \left(1/s_0^2 + NT/\sigma_p^2\right)^{-1}\right)$.
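The Normal-Normal update above can be sketched in NumPy as follows. This is a minimal illustration, not the authors' code: the function names are hypothetical, and it assumes $Y^p$ and the current product $\Lambda^p\delta$ are stored as flat length-$NT$ arrays in the same time-major order.

```python
import numpy as np


def mu_p_full_conditional(y_p, lam_delta, sigma2_p, m0, s0_sq):
    """Mean and variance of the Normal full conditional [mu_p].

    y_p       : (NT,) stacked observations Y^p (time-major: Y_1^p, ..., Y_T^p)
    lam_delta : (NT,) current value of Lambda^p @ delta
    """
    nt = y_p.size
    post_var = 1.0 / (1.0 / s0_sq + nt / sigma2_p)
    # 1_NT' (Y^p - Lambda^p delta) is simply the sum of the residuals.
    post_mean = (m0 / s0_sq + np.sum(y_p - lam_delta) / sigma2_p) * post_var
    return post_mean, post_var


def sample_mu_p(y_p, lam_delta, sigma2_p, m0, s0_sq, rng):
    """One Gibbs draw of mu_p from its full conditional."""
    mean, var = mu_p_full_conditional(y_p, lam_delta, sigma2_p, m0, s0_sq)
    return rng.normal(mean, np.sqrt(var))
```

Within a sweep, `sample_mu_p` would be called once per pollutant with the current values of $\delta$, $\Lambda^p$ and $\sigma_p^2$.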

**Observation Variances ($\sigma_p^2$):** For each pollutant $p$, assume independent prior distributions $\sigma_p^2 \sim IG(n_\sigma/2,\ n_\sigma s_\sigma/2)$.

Using the notation defined above, the resulting full conditional is $[\sigma_p^2] \sim IG\left((NT + n_\sigma)/2,\ \left((Y^p - \mu_p 1_{NT} - \Lambda^p\delta)'(Y^p - \mu_p 1_{NT} - \Lambda^p\delta) + n_\sigma s_\sigma\right)/2\right)$.
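A minimal sketch of this inverse-gamma update, assuming the shape/scale parameterization $IG(a, b)$ with density proportional to $x^{-a-1}e^{-b/x}$ (the one implied by the conjugate form above). NumPy has no inverse-gamma sampler, so the sketch draws a Gamma variate with rate $b$ and inverts it; the function names are hypothetical.

```python
import numpy as np


def sigma2_p_full_conditional(y_p, mu_p, lam_delta, n_sigma, s_sigma):
    """Shape and scale of the inverse-gamma full conditional [sigma_p^2]."""
    resid = y_p - mu_p - lam_delta  # Y^p - mu_p 1_NT - Lambda^p delta
    shape = (y_p.size + n_sigma) / 2.0
    scale = (resid @ resid + n_sigma * s_sigma) / 2.0
    return shape, scale


def sample_inverse_gamma(shape, scale, rng):
    """Draw from IG(shape, scale).

    If X ~ Gamma(shape, rate=scale) then 1/X ~ IG(shape, scale); NumPy's gamma
    sampler takes scale = 1/rate as its second argument.
    """
    return 1.0 / rng.gamma(shape, 1.0 / scale)
```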

**Factor Variances ($\tau_f^2$):** For each factor $f$, assume independent prior distributions $\tau_f^2 \sim IG(n_\tau/2,\ n_\tau s_\tau/2)$.

With a slight abuse of notation, let $\Gamma_f$ also denote the $NT \times NT$ block-diagonal matrix whose $T$ diagonal blocks each equal the $N \times N$ diagonal matrix of evolution coefficients $\Gamma_f$. Let $\delta_f = (\delta_{f,1}, \ldots, \delta_{f,T})'$ be the $NT \times 1$ vector of factor $f$ for $t = 1, \ldots, T$, and let $\delta_f^* = (\delta_{f,0}, \ldots, \delta_{f,T-1})'$ be the $NT \times 1$ vector of factor $f$ for $t = 0, \ldots, T-1$.

The resulting full conditional is $[\tau_f^2] \sim IG\left((NT + n_\tau)/2,\ \left((\delta_f - \Gamma_f\delta_f^*)'(\delta_f - \Gamma_f\delta_f^*) + n_\tau s_\tau\right)/2\right)$.
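Because $\Gamma_f$ is block diagonal with diagonal blocks, the product $\Gamma_f\delta_f^*$ never needs to be formed as an $NT \times NT$ matrix: each lagged value is just scaled by its location's coefficient. The sketch below assumes the factor is stored as a hypothetical $(T+1) \times N$ array whose row $t$ holds $\delta_{f,t}$ for $t = 0, \ldots, T$; the function name is illustrative only.

```python
import numpy as np


def tau2_f_full_conditional(delta_f, gamma_f, n_tau, s_tau):
    """Shape and scale of the inverse-gamma full conditional [tau_f^2].

    delta_f : (T+1, N) array; row t holds delta_{f,t}, t = 0, ..., T
    gamma_f : (N,) evolution coefficients, the diagonal of Gamma_f
    """
    current = delta_f[1:]   # delta_f:   t = 1, ..., T
    lagged = delta_f[:-1]   # delta_f^*: t = 0, ..., T-1
    # Block-diagonal Gamma_f times delta_f^* reduces to scaling each location's
    # lagged value by its own coefficient, one time point (row) at a time.
    resid = current - gamma_f * lagged
    shape = (resid.size + n_tau) / 2.0   # resid.size == N * T
    scale = (np.sum(resid ** 2) + n_tau * s_tau) / 2.0
    return shape, scale
```

A draw then follows from the same inverse-gamma trick as for $\sigma_p^2$ (invert a Gamma variate with rate equal to `scale`).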

**Spatially Varying Factor Loadings ($\lambda_f^{p,s}$):** For each factor-pollutant pair, assume a Gaussian process prior for the factor loadings, $\mathrm{diag}(\Lambda_f^p) \sim N(0,\ s^2_{\Lambda_{f,p}} R(\varphi_{\Lambda_{f,p}}))$. Let $\Sigma_p$ be the $mN \times mN$ block-diagonal matrix formed from the $m$ covariance matrices associated with the $m$ factor loadings for pollutant $p$.

Let $\delta_{\mathit{diag},t} = (\mathrm{diag}(\delta_{1,t}), \ldots, \mathrm{diag}(\delta_{m,t}))$ be the $N \times mN$ matrix of the spatial factors at time $t$, let $\delta_{\mathit{diag}}$ be the $NT \times mN$ matrix formed by stacking $\delta_{\mathit{diag},t}$ on top of each other for $t = 1, \ldots, T$, and let $\Lambda_{\mathit{vec}}^p = (\lambda_1^p(s_1), \ldots, \lambda_1^p(s_N), \ldots, \lambda_m^p(s_1), \ldots, \lambda_m^p(s_N))'$ be the $mN \times 1$ vector of factor loadings at all locations for pollutant $p$. Then the conditional distribution can be rewritten as $Y^p \mid \mu_p, \delta, \Lambda^p, \sigma_p^2 \sim N(\mu_p 1_{NT} + \delta_{\mathit{diag}}\Lambda_{\mathit{vec}}^p,\ \sigma_p^2 I_{NT})$.

Using standard multivariate Normal theory, the resulting full conditional is $[\Lambda_{\mathit{vec}}^p] \sim N(\mu_{\Lambda^p}, \Sigma_{\Lambda^p})$, where $\Sigma_{\Lambda^p} = \left(\Sigma_p^{-1} + \delta_{\mathit{diag}}'\delta_{\mathit{diag}}/\sigma_p^2\right)^{-1}$ and $\mu_{\Lambda^p} = \Sigma_{\Lambda^p}\,\delta_{\mathit{diag}}'(Y^p - \mu_p 1_{NT})/\sigma_p^2$.
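The construction of $\delta_{\mathit{diag}}$ and the resulting Gaussian update can be sketched as below. This assumes a hypothetical storage layout in which the factors at $t = 1, \ldots, T$ sit in a $(T, m, N)$ array and $Y^p$ is time-major; explicit inverses are used only to mirror the formulas (a Cholesky-based solve would be the more stable choice in practice).

```python
import numpy as np


def build_delta_diag(delta):
    """Stack delta_{diag,t} = (diag(delta_{1,t}), ..., diag(delta_{m,t})) over t = 1..T.

    delta : (T, m, N) array; delta[t-1, f-1] holds delta_{f,t} at the N locations
    Returns the (N*T, m*N) matrix delta_diag.
    """
    T, m, _ = delta.shape
    blocks = [np.hstack([np.diag(delta[t, f]) for f in range(m)]) for t in range(T)]
    return np.vstack(blocks)


def loadings_full_conditional(y_p, mu_p, delta_diag, sigma2_p, prior_cov):
    """Mean and covariance of [Lambda_vec^p]; prior_cov is the block-diagonal Sigma_p."""
    precision = np.linalg.inv(prior_cov) + delta_diag.T @ delta_diag / sigma2_p
    post_cov = np.linalg.inv(precision)
    post_mean = post_cov @ (delta_diag.T @ (y_p - mu_p)) / sigma2_p
    return post_mean, post_cov
```

A draw is then `rng.multivariate_normal(post_mean, post_cov)` for a `numpy.random.Generator` `rng`. Column $fN + i$ of `build_delta_diag` multiplies $\lambda_{f+1}^p(s_{i+1})$, matching the ordering of $\Lambda_{\mathit{vec}}^p$.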

**Spatially Constant Factor Loadings ($\lambda_f^p$):** For each pollutant $p$, assume independent prior distributions $\lambda^p = (\lambda_1^p, \ldots, \lambda_m^p)' \sim N(0,\ \sigma^2_{\lambda^p} I_m)$.

Let $\delta_{\mathit{mat}} = (\delta_1, \ldots, \delta_m)$ be the $NT \times m$ matrix of factors across time.

Then the resulting full conditional is $[\lambda^p] \sim N(\mu_{\lambda^p}, \Sigma_{\lambda^p})$, where $\Sigma_{\lambda^p} = \left(I_m/\sigma^2_{\lambda^p} + \delta_{\mathit{mat}}'\delta_{\mathit{mat}}/\sigma_p^2\right)^{-1}$ and $\mu_{\lambda^p} = \Sigma_{\lambda^p}\,\delta_{\mathit{mat}}'(Y^p - \mu_p 1_{NT})/\sigma_p^2$.
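With spatially constant loadings the update is an ordinary Bayesian linear regression of $Y^p - \mu_p 1_{NT}$ on the $m$ factor columns. A minimal sketch, assuming the same hypothetical $(T, m, N)$ factor array as above; the names are illustrative.

```python
import numpy as np


def build_delta_mat(delta):
    """(T, m, N) factor array -> (N*T, m) matrix whose column f is delta_f (time-major)."""
    T, m, N = delta.shape
    return delta.transpose(0, 2, 1).reshape(T * N, m)


def constant_loadings_full_conditional(y_p, mu_p, delta_mat, sigma2_p, sigma2_lambda):
    """Mean and covariance of [lambda^p] under the N(0, sigma2_lambda I_m) prior."""
    m = delta_mat.shape[1]
    precision = np.eye(m) / sigma2_lambda + delta_mat.T @ delta_mat / sigma2_p
    post_cov = np.linalg.inv(precision)
    post_mean = post_cov @ (delta_mat.T @ (y_p - mu_p)) / sigma2_p
    return post_mean, post_cov
```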

**Factor Evolution Coefficients ($\gamma_f^s$):** To ensure stationarity in time, the factor evolution coefficients are restricted to the interval $(-1, 1)$, so we assume a truncated Gaussian process prior, $\gamma_f \sim TN_{(-1,1)}(0_N, \Sigma_{\Gamma_f})$, where $\Sigma_{\Gamma_f} = s^2_{\Gamma_f} R(\varphi_{\Gamma_f})$.

Let $\gamma_f = \mathrm{diag}(\Gamma_f)$ be the $N \times 1$ vector of evolution coefficients, and let $\delta^*_{f,\mathit{diag}}$ be the $NT \times N$ matrix formed by stacking the $T$ diagonal $N \times N$ matrices $\mathrm{diag}(\delta_{f,t})$ for $t = 0, \ldots, T-1$.

Then the resulting full conditional is $[\gamma_f] \sim TN_{(-1,1)}(\mu_{\gamma_f}, \Sigma_{\gamma_f})$, where $\Sigma_{\gamma_f} = \left(\Sigma_{\Gamma_f}^{-1} + \delta^{*\prime}_{f,\mathit{diag}}\delta^*_{f,\mathit{diag}}/\tau_f^2\right)^{-1}$ and $\mu_{\gamma_f} = \Sigma_{\gamma_f}\,\delta^{*\prime}_{f,\mathit{diag}}\delta_f/\tau_f^2$.
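Since $\delta^*_{f,\mathit{diag}}$ stacks diagonal matrices, $\delta^{*\prime}_{f,\mathit{diag}}\delta^*_{f,\mathit{diag}}$ is diagonal and $\delta^{*\prime}_{f,\mathit{diag}}\delta_f$ is a per-location sum of lagged-times-current products. The sketch below computes the Normal kernel that way and then draws from the truncated distribution with a naive rejection sampler. The rejection step is an assumption for illustration only (the text does not say how the truncated draw is made); it becomes inefficient when much of the kernel's mass lies outside $(-1,1)^N$, where a coordinate-wise truncated-normal Gibbs step is a common alternative. The $(T+1) \times N$ storage layout and function names are hypothetical.

```python
import numpy as np


def gamma_f_full_conditional(delta_f, tau2_f, prior_cov):
    """Mean and covariance of the (untruncated) Normal kernel of [gamma_f].

    delta_f   : (T+1, N) array; row t holds delta_{f,t}, t = 0, ..., T
    prior_cov : (N, N) prior covariance Sigma_{Gamma_f}
    """
    lagged = delta_f[:-1]   # t = 0, ..., T-1 (diagonals of delta*_{f,diag})
    current = delta_f[1:]   # t = 1, ..., T   (delta_f)
    # delta*' delta* is diagonal: per-site sum of squared lagged factors.
    dtd = np.diag(np.sum(lagged ** 2, axis=0))
    # delta*' delta_f: per-site sum of lagged times current factors.
    dty = np.sum(lagged * current, axis=0)
    post_cov = np.linalg.inv(np.linalg.inv(prior_cov) + dtd / tau2_f)
    post_mean = post_cov @ dty / tau2_f
    return post_mean, post_cov


def sample_truncated_mvn(mean, cov, rng, lower=-1.0, upper=1.0, max_tries=10_000):
    """Naive rejection sampler for N(mean, cov) truncated to the box (lower, upper)^N."""
    for _ in range(max_tries):
        draw = rng.multivariate_normal(mean, cov)
        if np.all((draw > lower) & (draw < upper)):
            return draw
    raise RuntimeError("no draw accepted; a coordinate-wise Gibbs sampler may be needed")
```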

**Factors ($\delta_t$):** Assume that $\delta_{f,0} \sim N(0_N,\ s^2_{\delta_f} R(\varphi_{\delta_f}))$ is a realization from a Gaussian process for each factor $f$. Let $m_0 = 0_{mN}$ (here the initial state mean, not the prior mean of $\mu_p$) and let $C_0$ be the $mN \times mN$ block-diagonal covariance matrix of the collection of all factors at time 0, $\delta_0 = (\delta_{1,0}, \ldots, \delta_{m,0})'$. Let $\Sigma_\epsilon$ and $\Sigma_w$ be the diagonal covariance matrices associated with $\epsilon_t$ and $w_t$, respectively.

The factors are updated through a Forward Filtering Backwards Sampling (FFBS) algorithm [21,22]:

**Forward Filtering:** For $t = 1, \ldots, T$, compute $m_t = a_t + A_t(Y_t - \tilde{Y}_t)$ and $C_t = R_t - A_t Q_t A_t'$, where $a_t = \Gamma m_{t-1}$, $A_t = R_t \Lambda' Q_t^{-1}$, $Q_t = \Lambda R_t \Lambda' + \Sigma_\epsilon$, $R_t = \Gamma C_{t-1} \Gamma' + \Sigma_w$, and $\tilde{Y}_t = \mu + \Lambda a_t$. Then sample $\delta_T \sim N(m_T, C_T)$.

**Backwards Sampling:** For $t = T-1, \ldots, 0$, sample $\delta_t \sim N(\tilde{a}_t, \tilde{C}_t)$, where $\tilde{a}_t = m_t + B_t(\delta_{t+1} - a_{t+1})$, $\tilde{C}_t = C_t - B_t R_{t+1} B_t'$, and $B_t = C_t \Gamma' R_{t+1}^{-1}$.
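The two FFBS passes can be sketched directly from the recursions above. This is a minimal NumPy illustration under stated assumptions, not the authors' implementation: `mu`, `lam` and `gam` stand for the stacked $\mu$, $\Lambda$ and $\Gamma$ of the full state-space model, observations are complete at every $t$, and the function name is hypothetical. Covariances are symmetrized to absorb round-off before each Gaussian draw.

```python
import numpy as np


def ffbs(y, mu, lam, gam, sigma_eps, sigma_w, m0, c0, rng):
    """Forward filtering, backwards sampling for
        Y_t     = mu + lam @ delta_t + eps_t,   eps_t ~ N(0, sigma_eps)
        delta_t = gam @ delta_{t-1} + w_t,      w_t   ~ N(0, sigma_w)
    y : (T, q) array; row t-1 holds Y_t for t = 1, ..., T.
    Returns a (T+1, k) array holding one joint draw of delta_0, ..., delta_T.
    """
    T, k = y.shape[0], m0.size
    a = np.zeros((T + 1, k))
    R = np.zeros((T + 1, k, k))
    m = np.zeros((T + 1, k))
    C = np.zeros((T + 1, k, k))
    m[0], C[0] = m0, c0

    # Forward filtering: Kalman recursions for m_t and C_t.
    for t in range(1, T + 1):
        a[t] = gam @ m[t - 1]
        R[t] = gam @ C[t - 1] @ gam.T + sigma_w
        Q = lam @ R[t] @ lam.T + sigma_eps
        A = R[t] @ lam.T @ np.linalg.inv(Q)
        m[t] = a[t] + A @ (y[t - 1] - (mu + lam @ a[t]))
        C_t = R[t] - A @ Q @ A.T
        C[t] = 0.5 * (C_t + C_t.T)

    # Backwards sampling: draw delta_T, then delta_t | delta_{t+1} down to t = 0.
    delta = np.zeros((T + 1, k))
    delta[T] = rng.multivariate_normal(m[T], C[T])
    for t in range(T - 1, -1, -1):
        B = C[t] @ gam.T @ np.linalg.inv(R[t + 1])
        a_tilde = m[t] + B @ (delta[t + 1] - a[t + 1])
        C_tilde = C[t] - B @ R[t + 1] @ B.T
        delta[t] = rng.multivariate_normal(a_tilde, 0.5 * (C_tilde + C_tilde.T))
    return delta
```

When the observation noise is tiny, the filtered means track the observations closely, which gives a quick sanity check on the recursions.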

When data are unobserved at a particular time $t$, we assume they are missing at random. Following the reasoning of [23], no additional information is incorporated into the posterior, and the factors are updated by setting $m_t = a_t$ (and, correspondingly, $C_t = R_t$) in the FFBS algorithm.
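A single forward-filtering step with this missing-data rule might look as follows. In this sketch the filtering covariance is carried forward as $C_t = R_t$ alongside $m_t = a_t$, which is what skipping the observation update implies. The function name and the convention of passing `None` for an unobserved time point are assumptions for illustration.

```python
import numpy as np


def forward_step(m_prev, c_prev, y_t, mu, lam, gam, sigma_eps, sigma_w):
    """One forward-filtering step; pass y_t=None when nothing is observed at time t.

    Returns (a_t, R_t, m_t, C_t).
    """
    a_t = gam @ m_prev
    r_t = gam @ c_prev @ gam.T + sigma_w
    if y_t is None:
        # Missing at random: no update, so the one-step-ahead prior is carried forward.
        return a_t, r_t, a_t, r_t
    q_t = lam @ r_t @ lam.T + sigma_eps
    gain = r_t @ lam.T @ np.linalg.inv(q_t)   # A_t in the notation above
    m_t = a_t + gain @ (y_t - (mu + lam @ a_t))
    c_t = r_t - gain @ q_t @ gain.T
    return a_t, r_t, m_t, c_t
```

The backwards-sampling pass is unchanged, since it only uses the stored $m_t$, $C_t$, $a_{t+1}$ and $R_{t+1}$.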

**Spatial Parameters ($s_i^2, \varphi_i$):** There are $mP$ parameters $s^2_{\Lambda_{f,p}}$, $m$ parameters $s^2_{\Gamma_f}$, and $m$ parameters $s^2_{\delta_f}$ describing the variance of the spatial processes. Denote these as $s_i^2$, with priors $IG(n_i/2,\ n_i s_i/2)$. Similarly, the range parameters $\varphi_i$ are assigned priors $IG(a_i, b_i)$.