---
class: center, .toc[[✧](#toc)]
## How many EOFs do we need?
Example of reconstructing the total signal at one location
---
class: left, .toc[[✧](#toc)]
## How to practically calculate EOFs?
Do not use unknown “black box” code. Write your own: it takes only a few lines (in Matlab). Really. The not-so-efficient way, assuming your data matrix `D` has one time series per column, is:
```ruby
X = detrend(D,'constant');     % anomalies: remove the mean of each column
C = ctranspose(X)*X/(N-1);     % covariance matrix (N = number of time steps)
[U,G] = eig(C);                % eigenvectors U and eigenvalue matrix G
A = X*U;                       % principal components
```
The first line removes the mean of each column; the second line calculates the covariance matrix `$\mathbf{C}$` (with `N` the number of time steps, i.e. the number of rows of `X`); the third line calculates the eigenvectors `$\mathbf{U}$` and the eigenvalue matrix `$\boldsymbol{\Gamma}$` of `$\mathbf{C}$`; the fourth line calculates the principal component matrix `$\mathbf{A}$`. WATCH OUT: EIG sorts the eigenvalues in increasing order!
You're done, in 4 lines!
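Because EIG returns the eigenvalues in increasing order, you will probably want to reorder the modes; a minimal sketch reusing the variables above:
```ruby
[g,idx] = sort(diag(G),'descend');   % eigenvalues as a vector, largest first
G = diag(g);                         % reordered eigenvalue matrix
U = U(:,idx);                        % reorder the eigenvectors accordingly
A = A(:,idx);                        % and the principal components
```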
--
But that's not great because the covariance matrix can become very large, and computing ALL the eigenvalues and eigenvectors can take a very long time and crash your computer.
---
class: left, .toc[[✧](#toc)]
## How to practically calculate EOFs?
A slightly better way is to compute just a few eigenvalues and eigenvectors:
```ruby
X = detrend(D,'constant');     % anomalies: remove the mean of each column
C = ctranspose(X)*X/(N-1);     % covariance matrix
[U,G] = eigs(C);               % only the 6 largest eigenvalues and eigenvectors
A = X*U;                       % principal components of the retained modes
```
Note how the function EIGS is used instead of EIG, computing by default only the 6 largest eigenvalues. Here, you still compute the covariance matrix, which can be very large. EIGS sorts the eigenvalues in decreasing order!
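If you want a different number of modes, you can request it explicitly; a minimal sketch (the choice of `K = 10` is arbitrary):
```ruby
K = 10;              % number of leading modes to compute
[U,G] = eigs(C,K);   % the K largest-magnitude eigenvalues and eigenvectors
A = X*U;             % the corresponding principal components
```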
---
class: left, .toc[[✧](#toc)]
## How to practically calculate EOFs?
The best way (but others may disagree) is as follows:
```ruby
X = detrend(D,'constant');     % anomalies: remove the mean of each column
[P,L,U] = svd(X,'econ');       % economy-size SVD of the data matrix
A = P*L;                       % principal components
G = ctranspose(L)*L/(N-1);     % eigenvalue matrix from the singular values
C = U*G*ctranspose(U);         % covariance matrix (only if you really need it)
```
The last step computes the covariance matrix, but do this only if you really need it! This method computes only a number of eigenvalues and eigenvectors equal to the minimum of the number of time steps and the number of data points, which is likely more than enough! This method sorts the eigenvalues in decreasing order!
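If what you are after is the fraction of variance explained by each mode, it follows directly from the singular values; a minimal sketch reusing the variables above:
```ruby
g = diag(L).^2/(N-1);   % eigenvalues recovered from the singular values
expvar = g/sum(g);      % fraction of total variance explained by each mode
```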
---
name: svd
class: center, middle, .toc[[✧](#toc)]
# 4. Singular Value Decomposition
---
class: left, .toc[[✧](#toc)]
## SVD analysis for studying coupling
Let's say you have two field variables $\mathbf{X}$ and $\mathbf{Y}$ at $P$ and $M$ locations, respectively, simultaneously observed at $N$ times.
There exist multiple ways to study the coupling between two fields. One of them is called the *Singular Value Decomposition Analysis*, or SVD, sometimes called *Maximum Covariance Analysis*.
SVD analysis provides a decomposition of both fields in a way that maximizes the (cross-)covariance between the two fields, just like an EOF analysis provides a decomposition that maximizes the (auto-co)variance of a single field. In fact, EOF analysis is a particular case of SVD analysis when `$\mathbf{Y} = \mathbf{X}$`.
---
class: left, .toc[[✧](#toc)]
Let's call the $X$ data the "left field" (e.g. SST), which is `$N \times P$`
`$$\mathbf{X} = \left[\mathbf{X}_1,\, \mathbf{X}_2,\, \cdots ,\,\mathbf{X}_P\right]
=\left[\matrix{X_1(1) & X_2(1) & \dots & X_P(1)\cr X_1(2) & X_2(2) & \cdots & X_P(2)\cr \vdots & \vdots & \cdots & \vdots \cr X_1(N) & X_2(N) & \cdots & X_P(N)\cr}\right]$$`
and let's call the $Y$ data the "right field" (e.g. MSLP), which is `$N \times M$`
`$$\mathbf{Y} = \left[\mathbf{Y}_1,\, \mathbf{Y}_2,\, \cdots ,\,\mathbf{Y}_M\right]
=\left[\matrix{Y_1(1) & Y_2(1) & \dots & Y_M(1)\cr Y_1(2) & Y_2(2) & \cdots & Y_M(2)\cr \vdots & \vdots & \cdots & \vdots \cr Y_1(N) & Y_2(N) & \cdots & Y_M(N)\cr}\right]$$`
Their cross-covariance matrix is `$P \times M$` (not symmetric):
`$$\widehat{\mathbf{C}}_{XY} = \frac{1}{N-1}\mathbf{X}^\mathrm{T}\mathbf{Y}, \quad c_{kj} = \frac{1}{N-1}\sum_{n=1}^{n=N} X_k(n)Y_j(n)$$`
---
class: left, .toc[[✧](#toc)]
## SVD
Assuming `$M < P$`, the cross-covariance matrix can be decomposed by a singular value decomposition:
`$$
\begin{eqnarray}
\widehat{\mathbf{C}}_{XY} & = & \mathbf{U}\mathbf{\Lambda}\mathbf{V}^\mathrm{T} = \sum_{k=1}^{M} \lambda_k \mathbf{U}_k\mathbf{V}_k^\mathrm{T} \cr
\mathbf{\Lambda} & = & \mathrm{diag}(\lambda_1,\,\lambda_2,\,\ldots,\,\lambda_M), \quad \lambda_1 \geq \lambda_2 \geq \cdots \geq \lambda_M > 0 \cr
\end{eqnarray}
$$`
$M$ orthonormal *left singular vectors* for $X$, and $M$ orthonormal *right singular vectors* for $Y$, associated with $M$ *singular values*.
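In Matlab, a minimal sketch of this decomposition could look as follows (the variable names `Xa`, `Ya`, and `Cxy` are mine, and `N` is the number of common time steps):
```ruby
Xa = detrend(X,'constant');       % anomalies of the left field (N x P)
Ya = detrend(Y,'constant');       % anomalies of the right field (N x M)
Cxy = ctranspose(Xa)*Ya/(N-1);    % P x M cross-covariance matrix
[U,S,V] = svd(Cxy,'econ');        % left/right singular vectors and singular values
```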
---
class: left, .toc[[✧](#toc)]
## SVD : Interpretation
An SVD analysis decomposes the cross-covariance into modes, the first of which captures the largest fraction of cross-covariance, `$\lambda^2_1/\sum_k \lambda^2_k$`, with a spatial structure given by the pair of singular vectors `$\mathbf{U}_1$` and `$\mathbf{V}_1$`.
The second mode captures the second largest fraction of cross-covariance, `$\lambda^2_2/\sum_k\lambda^2_k$`, with the spatial structure of the second pair of vectors `$\mathbf{U}_2$` and `$\mathbf{V}_2$`, these being orthogonal to the vectors of the first mode, and so forth.
The `$n$`-th ratio `$\lambda^2_n/\sum_k\lambda_k^2$` is called the *squared covariance fraction* of mode `$n$` and is usually expressed as a percentage. It is interpreted as the fraction of the total squared cross-covariance captured by each coupled mode.
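Continuing the sketch above, the squared covariance fraction of each mode is obtained directly from the singular values:
```ruby
lambda = diag(S);                 % singular values
scf = lambda.^2/sum(lambda.^2);   % squared covariance fraction of each mode
```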
---
class: left, .toc[[✧](#toc)]
## SVD : Principal Components
Associated with the singular vectors are the *Principal Components* time series, obtained by multiplying the data and their singular vectors.
`$\mathbf{A} = \mathbf{X}\mathbf{U}$` contains the principal components for the left field and
`$\mathbf{B} = \mathbf{Y}\mathbf{V}$` contains the principal components for the right field.
The left and right data matrices can now be written as sums of the $M$ modes:
`$$
\mathbf{X} = \mathbf{A}\mathbf{U}^\mathrm{T} = \sum_{k=1}^M \mathbf{A}_k \mathbf{U}_k^\mathrm{T}
$$`
`$$
\mathbf{Y} = \mathbf{B}\mathbf{V}^\mathrm{T} = \sum_{k=1}^M \mathbf{B}_k \mathbf{V}_k^\mathrm{T}
$$`
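Continuing the same sketch, the principal components and, for instance, a reconstruction retaining the first two coupled modes could be computed as:
```ruby
A = Xa*U;                              % PCs of the left field (N x M)
B = Ya*V;                              % PCs of the right field (N x M)
K = 2;                                 % number of coupled modes to retain
Xrec = A(:,1:K)*ctranspose(U(:,1:K));  % truncated reconstruction of the left field
Yrec = B(:,1:K)*ctranspose(V(:,1:K));  % truncated reconstruction of the right field
```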
---
class: left, .toc[[✧](#toc)]
## SVD : Principal Components
It can be shown that
`$$\mathbf{A}^\mathrm{T}\mathbf{B} = (N-1)\,\mathbf{\Lambda}$$`
that is
`$\mathbf{A}_k^\mathrm{T}\mathbf{B}_k = \sum_{n=1}^N A_k(n)B_k(n) = (N-1)\,\lambda_k$`
for all $k$ but
`$\mathbf{A}_k^\mathrm{T}\mathbf{B}_j = \sum_{n=1}^N A_k(n)B_j(n) = 0$` for `$k \neq j$`.
This means that the PC time series of the left and right fields for mode $n$ covary, but the PCs of different modes do not.
The strength of the coupling between the fields is usually measured by the correlation coefficient
`$$r_n = \frac{\mathbf{A}_n^\mathrm{T}\mathbf{B}_n }{\sqrt{\mathbf{A}_n^\mathrm{T}\mathbf{A}_n \mathbf{B}_n^\mathrm{T}\mathbf{B}_n}}$$`
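In the same sketch, the coupling strength of mode `$n$` could be computed as:
```ruby
n = 1;                                       % mode of interest
rn = ctranspose(A(:,n))*B(:,n) / ...
     sqrt((ctranspose(A(:,n))*A(:,n))*(ctranspose(B(:,n))*B(:,n)));
```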
---
class: left, .toc[[✧](#toc)]
## Example: Monthly data in the Tropical Pacific
Here, we consider the following de-seasoned monthly data:
.left-column[SST from NOAA Optimum Interpolation (OI) SST V2
]
.right-column[10-m Zonal winds from NCEP-DOE Reanalysis 2
]
---
class: left, .toc[[✧](#toc)]
## Example: Monthly data in the Tropical Pacific
SVD1 showing rescaled singular vectors (shading), homogeneous correlation maps (contours), and normalized PC time series.
.center[]
---
class: left, .toc[[✧](#toc)]
## Example: Monthly data in the Tropical Pacific
SVD2
.center[]
---
class: left, .toc[[✧](#toc)]
## Example: Monthly data in the Tropical Pacific
SVD3
.center[]
---
name: eofsvd
class: center, middle, .toc[[✧](#toc)]
# 5. EOF analysis by SVD
---
class: left, .toc[[✧](#toc)]
Why is calculating the SVD of a data matrix the same as calculating the eigendecomposition of its associated auto-covariance matrix?
The data matrix `$\mathbf{X}$` of dimension `$N \times P$` can be factored as
`$$
\mathbf{X} = \mathbf{P} \mathbf{L} \mathbf{Q}^\mathrm{T}
$$`
where `$\mathbf{P}$` is `$N \times N$`, `$\mathbf{L}$` is `$N \times P$`, and `$\mathbf{Q}$` is `$P \times P$`.
--
Now, the associated *sample auto-covariance matrix* is
`$$
\begin{split}
\widehat{\mathbf{C}}_{XX} & = \mathbf{X}^\mathrm{H} \mathbf{X}/(N-1) \\
& = (\mathbf{P} \mathbf{L} \mathbf{Q}^\mathrm{H})^\mathrm{H} (\mathbf{P} \mathbf{L} \mathbf{Q}^\mathrm{H}) /(N-1) \\
& = \mathbf{Q} \mathbf{L}^\mathrm{H} \mathbf{L} \mathbf{Q}^\mathrm{H}/(N-1).
\end{split}
$$`
--
Alternatively, `$\widehat{\mathbf{C}}_{XX}$` has an eigendecomposition, i.e.
`$$
\widehat{\mathbf{C}}_{XX} = \mathbf{U} \mathbf{\Gamma} \mathbf{U}^\mathrm{T}
$$`
where `$\mathbf{U}$` and `$\mathbf{\Gamma}$` are both `$P \times P$`.
---
class: left, .toc[[✧](#toc)]
Since we have
`$$
\widehat{\mathbf{C}}_{XX} = \mathbf{U} \mathbf{\Gamma} \mathbf{U}^\mathrm{T} = \mathbf{Q} \mathbf{L}^\mathrm{H} \mathbf{L} \mathbf{Q}^\mathrm{H}/(N-1),
$$`
by identification we have:
`$$
\mathbf{U} \equiv \mathbf{Q} \quad \text{and}\quad \mathbf{\Gamma} \equiv \mathbf{L}^\mathrm{H} \mathbf{L}/(N-1)
$$`
--
As a consequence, for an EOF analysis there is actually no need to compute the auto-covariance matrix to obtain the eigenvectors and eigenvalues. It suffices to compute the SVD of the original data matrix:
`$$
\mathbf{X} = \mathbf{P} \mathbf{L} \mathbf{Q}^\mathrm{T} = \mathbf{A}\mathbf{U}^\mathrm{T}
$$`
The PC matrix is `$\mathbf{A} = \mathbf{P} \mathbf{L}$` and the eigenvalue matrix is `$\mathbf{\Gamma} = \mathbf{L}^\mathrm{H} \mathbf{L}/(N-1)$`.
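A quick numerical check on synthetic data illustrates this equivalence; a minimal sketch (the variable names are mine):
```ruby
N = 100; P = 20;
X = detrend(randn(N,P),'constant');          % synthetic anomaly data matrix
[Pm,L,Q] = svd(X,'econ');                    % SVD of the data matrix
G1 = ctranspose(L)*L/(N-1);                  % eigenvalues from the singular values
[U,G2] = eig(ctranspose(X)*X/(N-1));         % eigendecomposition of the covariance
max(abs(sort(diag(G1)) - sort(diag(G2))))    % agreement up to machine precision
```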
---
class: middle, center, .toc[[✧](#toc)]
# THE END
Thank you!
email: selipot@rsmas.miami.edu