# An Alternative to PCA for Estimating Dominant Patterns of Climate Variability and Extremes, with Application to U.S. and China Seasonal Rainfall

## Abstract


## 1. Introduction

## 2. Principal Component Analysis

#### 2.1. PCA Standard Derivation

#### 2.2. PCA Alternative Derivation

#### 2.2.1. Use of the Multivariate Normal Distribution

#### 2.2.2. Derivation

#### 2.3. Two-Dimensional Example

## 3. Directional Component Analysis

#### 3.1. Scaling

#### 3.2. Two-Dimensional Example

#### 3.3. DCA Main Derivation

- The given level of total rainfall anomaly $a$ does not appear in the definition of ${g}_{1}$: because of the normalisation to unit length, the first DCA spatial pattern is the same whatever level of rainfall anomaly is specified. The first DCA spatial pattern is a property of the covariance matrix (and hence of the original data) only.
- Equation (10) above gives a set of solutions based on different values for $\lambda $. The different solutions are simply different scalings of the same vector $Cr$.
- Each value of $\lambda $ corresponds to a solution for a different value of $a$.
- Given any value of $\lambda $ we can calculate $g$ (using $g=\lambda Cr$), and given $g$ we can calculate $a$ (using $a={g}^{T}r$) and ${M}^{2}$ (using ${M}^{2}=-{g}^{T}{C}^{-1}g$), giving one complete solution to the constrained maximisation problem, for a given value of $a$.
- For instance, ${g}_{1}$, which corresponds to a value of $\lambda =\frac{1}{\left|Cr\right|}$, solves the constrained maximisation problem for a given value of total rainfall anomaly of $a={g}_{1}^{T}r$, and the Mahalanobis consistency achieved at the maximum in that case is $-{g}_{1}^{T}{C}^{-1}{g}_{1}$.
- Other values of $\lambda $ give solutions with different values for the total rainfall anomaly and the Mahalanobis consistency: the larger the value of $\lambda $, the larger the total rainfall anomaly, the lower the Mahalanobis consistency (and the lower the likelihood).
- All the different solutions can be derived from the unit length solution ${g}_{1}$ just by rescaling.
- The extent to which the DCA pattern achieves its goal of finding a pattern with a greater likelihood than PCA for the same rainfall anomaly can be measured by comparing the ratio $a/M$ between the first PCA and DCA spatial patterns. Because $a$ and $M$ are both proportional to $\lambda $, the $\lambda $’s cancel and this ratio does not depend on $\lambda $. DCA would be expected to have a higher value for this ratio, and indeed DCA can also be derived by maximising this ratio. We will evaluate this ratio for all our examples below.
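The scaling argument in the list above can be checked numerically. The following is a minimal sketch in plain Python, not the author's code: the 2×2 covariance matrix `C` is an assumed illustrative value (weak negative correlation), `r` is taken to be a vector of ones, and the helper names are hypothetical. The ratio is computed with the Mahalanobis distance $d=\sqrt{{g}^{T}{C}^{-1}g}$, which is an assumption about how $M$ enters the ratio, since the text defines ${M}^{2}$ with a minus sign.

```python
import math

# Assumed illustrative covariance matrix (not from the paper's data).
C = [[2.0, -0.3],
     [-0.3, 1.0]]
r = [1.0, 1.0]  # summing vector, so the total anomaly is a = g^T r


def dot(u, v):
    return u[0] * v[0] + u[1] * v[1]


def matvec(A, v):
    """Product of a 2x2 matrix with a 2-vector."""
    return [A[0][0] * v[0] + A[0][1] * v[1],
            A[1][0] * v[0] + A[1][1] * v[1]]


def inv2(A):
    """Inverse of a 2x2 matrix in closed form."""
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    return [[A[1][1] / det, -A[0][1] / det],
            [-A[1][0] / det, A[0][0] / det]]


def total_and_distance(g):
    """Return (a, d): total anomaly g^T r and Mahalanobis distance."""
    a = dot(g, r)
    d = math.sqrt(dot(g, matvec(inv2(C), g)))
    return a, d


Cr = matvec(C, r)
lam_unit = 1.0 / math.sqrt(dot(Cr, Cr))  # lambda = 1/|Cr| gives unit-length g_1

# Different lambdas rescale the same vector Cr; a/d should not change.
ratios = []
for lam in (lam_unit, 0.5, 2.0):
    g = [lam * x for x in Cr]
    a, d = total_and_distance(g)
    ratios.append(a / d)
    print(f"lambda={lam:.4f}  a={a:.4f}  d={d:.4f}  a/d={a / d:.4f}")
ratio_dca = ratios[0]

# First PCA pattern: leading eigenvector of the symmetric 2x2 matrix C
# (closed form; assumes the off-diagonal element is non-zero).
half_trace = 0.5 * (C[0][0] + C[1][1])
half_diff = 0.5 * (C[0][0] - C[1][1])
top_eig = half_trace + math.sqrt(half_diff ** 2 + C[0][1] ** 2)
e1 = [C[0][1], top_eig - C[0][0]]
norm = math.sqrt(dot(e1, e1))
e1 = [x / norm for x in e1]
if dot(e1, r) < 0:  # fix the arbitrary sign so the total anomaly is positive
    e1 = [-x for x in e1]

a_pca, d_pca = total_and_distance(e1)
ratio_pca = a_pca / d_pca
print(f"PCA a/d={ratio_pca:.4f}  DCA a/d={ratio_dca:.4f}")
```

In this toy setting the ratio is identical for every $\lambda $ and is larger for the DCA direction than for the first PCA direction, which is the behaviour described above.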

#### 3.4. Derivation of the Second DCA Pattern

#### 3.5. Regression-Based Derivation

#### 3.6. Properties of DCA

- Since PCA is designed to maximise explained variance, the explained variance of the first DCA pattern will be less than or equal to the explained variance of the first PCA pattern. The explained variance will only be equal to that of the first PCA pattern in the degenerate case that the first PCA spatial pattern equals the vector r, in which case the first DCA spatial pattern will also equal r.
- Since the first DCA spatial pattern is designed to maximise total rainfall anomaly (for a given value of likelihood) the total rainfall anomaly ${g}_{1}^{T}r$ for the first DCA spatial pattern will be greater than or equal to that of the first PCA spatial pattern. In Figure 2 this corresponds to the DCA vector reaching further into the region of large total rainfall anomaly in the top right hand corner of the diagram. Once again it will only be equal in the degenerate case. This property can be shown by comparing the definition of DCA with the second definition of PCA given above. It is also proven more carefully in the supporting information, Sections S4 and S5.
- If the first DCA spatial pattern is scaled to have the same total rainfall anomaly as the first PCA spatial pattern, it will have a higher or equal likelihood, equal only in the degenerate case. This property can be shown from the definitions of PCA and DCA, and is also proven in the supporting information, Section S6.
- If the first DCA spatial pattern is scaled to have the same likelihood as the first PCA spatial pattern (which is how the arrows are scaled in Figure 2) it will have a greater or equal value for the total rainfall anomaly, equal only in the degenerate case. This property follows from the definitions, but is also proven carefully in the supporting information, Section S7.
- In the non-degenerate case, the first DCA spatial pattern can be scaled to the in-between case where it has both more rainfall and a higher likelihood than the first PCA spatial pattern. This is the most interesting property of DCA in comparison with PCA, and is the property which suggests that DCA is the better method for identifying spatial extremes (defined here as extremes in the total anomaly summed across the pattern). It is proven in the supporting information, Section S8.
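The properties listed above can be illustrated with a small two-dimensional check. This is a sketch under stated assumptions rather than a reproduction of the paper's calculations: the covariance matrix `C` is an assumed illustrative value, `r` is a vector of ones, the explained variance of a unit pattern $u$ is taken to be ${u}^{T}Cu$, likelihood is compared through the Mahalanobis distance (smaller distance meaning higher likelihood), and all function names are hypothetical.

```python
import math

C = [[2.0, -0.3],
     [-0.3, 1.0]]   # assumed illustrative covariance matrix
r = [1.0, 1.0]      # summing vector


def dot(u, v):
    return u[0] * v[0] + u[1] * v[1]


def matvec(A, v):
    return [A[0][0] * v[0] + A[0][1] * v[1],
            A[1][0] * v[0] + A[1][1] * v[1]]


def unit(v):
    n = math.sqrt(dot(v, v))
    return [x / n for x in v]


def scale(v, s):
    return [s * x for x in v]


det = C[0][0] * C[1][1] - C[0][1] * C[1][0]
C_inv = [[C[1][1] / det, -C[0][1] / det],
         [-C[1][0] / det, C[0][0] / det]]


def total(g):
    """Total anomaly g^T r."""
    return dot(g, r)


def mdist(g):
    """Mahalanobis distance sqrt(g^T C^-1 g); smaller means more likely."""
    return math.sqrt(dot(g, matvec(C_inv, g)))


def variance_along(u):
    """Variance captured by a unit pattern u (assumed definition u^T C u)."""
    return dot(u, matvec(C, u))


# First DCA pattern: Cr normalised to unit length.
g1 = unit(matvec(C, r))

# First PCA pattern: leading eigenvector of C (closed form for 2x2,
# assuming a non-zero off-diagonal element), with sign chosen so total > 0.
top_eig = 0.5 * (C[0][0] + C[1][1]) + math.sqrt(
    (0.5 * (C[0][0] - C[1][1])) ** 2 + C[0][1] ** 2)
e1 = unit([C[0][1], top_eig - C[0][0]])
if total(e1) < 0:
    e1 = scale(e1, -1.0)

# Scalings of g1 that match PCA in total anomaly or in Mahalanobis distance.
s_total = total(e1) / total(g1)
s_like = mdist(e1) / mdist(g1)
s_mid = 0.5 * (s_total + s_like)   # one possible in-between scaling

g_same_total = scale(g1, s_total)
g_same_like = scale(g1, s_like)
g_mid = scale(g1, s_mid)

print("explained variance  PCA, DCA:", variance_along(e1), variance_along(g1))
print("unit-vector totals  PCA, DCA:", total(e1), total(g1))
print("equal total:      distance PCA, DCA:", mdist(e1), mdist(g_same_total))
print("equal likelihood: total PCA, DCA:", total(e1), total(g_same_like))
print("in-between: total, distance:", total(g_mid), mdist(g_mid))
```

With these assumed numbers the DCA pattern has lower explained variance than PCA, a larger total anomaly at unit length, a smaller Mahalanobis distance at equal total anomaly, a larger total at equal distance, and an in-between scaling that beats PCA on both measures.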

## 4. Application of DCA to Observed U.S. Winter Rainfall

#### 4.1. Discussion of Pattern Structure

#### 4.2. Statistics of the Unit Vector Patterns

#### 4.3. Scaled Patterns

#### 4.4. Equal Total Rainfall Anomaly Scaling

#### 4.5. Equal Likelihood Scaling

#### 4.6. Intermediate Scaling

#### 4.7. Equal Total Rainfall Scaling at Larger Amplitude

## 5. Further Examples

#### 5.1. U.S. Summer Rainfall

#### 5.2. China Winter Rainfall

#### 5.3. China Summer Rainfall

## 6. Discussion

- A target spatial domain and timescale would be identified (in our example: the continental U.S. for a seasonal timescale).
- A target return period would be identified (such as a 200-year return period).
- Standard methodologies from extreme value theory could be used to estimate the total rainfall anomaly or total drought index over the domain at that return period.
- Given this total rainfall anomaly, the first DCA spatial pattern could be scaled to give exactly that rainfall amount. It is an appropriate pattern to represent possible rainfall extremes at that return period, since it has a higher likelihood than any other pattern with that total rainfall anomaly (by definition of DCA).
- The DCA spatial pattern so derived could then be used to drive impact models.
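The scaling step in the workflow above can be sketched as follows. This is an illustration under assumptions, not an implementation from the paper: the covariance matrix `C` and the target total `a_target` are placeholder values (in practice the target would come from an extreme value analysis at the chosen return period), `r` is a vector of ones, and the two-location setting is used only so that the alternatives with the same total can be scanned along a line.

```python
import math

C = [[2.0, -0.3],
     [-0.3, 1.0]]   # assumed illustrative covariance of the anomalies
r = [1.0, 1.0]      # summing vector
a_target = 3.0      # hypothetical total anomaly at the target return period


def dot(u, v):
    return u[0] * v[0] + u[1] * v[1]


def matvec(A, v):
    return [A[0][0] * v[0] + A[0][1] * v[1],
            A[1][0] * v[0] + A[1][1] * v[1]]


det = C[0][0] * C[1][1] - C[0][1] * C[1][0]
C_inv = [[C[1][1] / det, -C[0][1] / det],
         [-C[1][0] / det, C[0][0] / det]]


def mdist2(g):
    """Squared Mahalanobis distance g^T C^-1 g (smaller means more likely)."""
    return dot(g, matvec(C_inv, g))


# First DCA pattern at unit length, then rescaled to hit the target total.
Cr = matvec(C, r)
norm = math.sqrt(dot(Cr, Cr))
g1 = [x / norm for x in Cr]
g_target = [x * a_target / dot(g1, r) for x in g1]
d2_target = mdist2(g_target)

# Scan other two-location patterns (t, a_target - t) with the same total
# and record the smallest squared Mahalanobis distance among them.
min_alt_d2 = min(
    mdist2([k / 100.0, a_target - k / 100.0]) for k in range(-500, 801)
)

print("scaled DCA pattern:", g_target, "total:", sum(g_target))
print("squared distance, DCA:", d2_target, "best scanned alternative:", min_alt_d2)
```

In this toy case none of the scanned patterns with the same total has a smaller Mahalanobis distance than the scaled DCA pattern, consistent with the definition of DCA used in the list above.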

## Supplementary Materials

## Funding

## Acknowledgments

## Conflicts of Interest


**Figure 1.** The first PCA spatial pattern for U.S. winter rainfall anomalies. This pattern maximises explained variance.

**Figure 2.** PCA and DCA spatial patterns in a space with two dimensions. The axes are the two dimensions, which might be, for instance, rainfall anomaly amounts at two locations. The diagonal lines then show lines of constant total rainfall anomaly. Assuming that the two variables follow a bivariate normal distribution with a weak negative correlation, the ellipse shows a contour of constant likelihood (probability density) or constant Mahalanobis consistency, with higher likelihoods (higher Mahalanobis consistency, lower Mahalanobis distance) inside the ellipse. Each point in this two-dimensional space represents a spatial pattern, consisting of a single realisation from the bivariate normal, made up of rainfall anomaly values at the two locations. The tip of the double arrow gives the rainfall anomaly values for the first PCA spatial pattern, while the tip of the dotted arrow gives rainfall anomaly values for a scaled version of the first DCA spatial pattern. In this case the two patterns are scaled to have the same likelihood, as we can see from the fact that they both just touch the ellipse. The PCA spatial pattern has larger explained variance, while the DCA spatial pattern (which is the point on the ellipse with the greatest total rainfall anomaly, by definition) captures a greater total rainfall anomaly, and so is more relevant for understanding extreme rainfall totals.

**Figure 3.** The first DCA spatial pattern for U.S. winter rainfall anomalies. With appropriate scaling this pattern is both more likely and represents a greater total rainfall anomaly than any given scaling of the first PCA spatial pattern. As a result it is more appropriate for understanding possible extremes in total rainfall anomalies than the first PCA spatial pattern.

**Figure 4.** The total rainfall anomaly and Mahalanobis distances for the PCA and DCA patterns shown in Figures 1 and 3, and for various scalings of these patterns. The large blue circle “P” corresponds to the first PCA pattern. The large red circle “D” corresponds to the first DCA pattern. The blue line corresponds to possible scalings of the first PCA pattern. The red line corresponds to possible scalings of the first DCA pattern. The interesting properties of DCA arise from the fact that the red line is above the blue line. The small red circle labelled “1” corresponds to a scaling of the first DCA pattern that reduces the rainfall anomaly to the same level as that of the first PCA pattern. The small red circle labelled “2” corresponds to a scaling of the first DCA pattern that reduces the Mahalanobis distance to the same as that of the first PCA pattern. The small red circle labelled “3” is an example of a scaling of the first DCA pattern in between the scalings used for 1 and 2. The pattern corresponding to point 3 has both a larger rainfall amount and a lower Mahalanobis distance (and hence higher likelihood) than the first PCA pattern.

**Figure 5.** The first PCA spatial pattern for U.S. summer rainfall anomalies. This pattern maximises explained variance.

**Figure 6.** The first DCA spatial pattern for U.S. summer rainfall anomalies. With appropriate scaling this pattern is both more likely and represents a greater total rainfall anomaly than any given scaling of the first PCA spatial pattern. As a result it is more appropriate for understanding possible extremes in total rainfall anomalies than the first PCA spatial pattern.

**Figure 7.** PCA and DCA first spatial patterns for winter and summer rainfall anomalies in China. Panels (**a**,**b**) show winter PCA and DCA respectively. Panels (**c**,**d**) show summer PCA and DCA respectively.

© 2020 by the author. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Jewson, S. An Alternative to PCA for Estimating Dominant Patterns of Climate Variability and Extremes, with Application to U.S. and China Seasonal Rainfall. *Atmosphere* **2020**, *11*, 354.
https://doi.org/10.3390/atmos11040354
