Principal Component Analysis

PCA aims to find orthogonal axes, known as principal components (PCs), in the m-dimensional space onto which the data points are projected, such that the projected coordinates are linearly uncorrelated. The first PC captures the largest variance in the data, and the second PC captures the second-largest variance along a direction orthogonal to the first.

Below is a 2-D scatter plot with a line that depicts a PC being fitted to a 2-D data matrix.

../_images/pca_image.gif

Making sense of PCA by fitting on a 2-D dataset (source)

Calculation of PCA

We begin with the standard PCA formulation used in the Quadratic Discriminant Function (QDF), following the introduction in [Mae10].

Let \(x_i\) be the \(i^{th}\) vector in our target dataset of \(N\) samples, and let \(\mu = \frac{1}{N}\sum_{i}x_i\) be the mean. \(\phi\) is the unit vector representing the direction that maximizes the variance of the distribution.

Our goal is to find the direction that maximizes the variance of the distribution, i.e. to solve for \(\phi\) under the constraint \(|| \phi ||=1\). Writing \((a, b)\) for the inner product and introducing a Lagrange multiplier \(\lambda\), this can be formulated as finding the stationary point of:

\[\begin{split}J &= \sum_{i} (x_i - \mu, \phi)^2 - \lambda (||\phi||^2-1) \\ &= \sum_{i}\Big\{ \sum_{j}(x_{ij} - \mu_j)\phi_j \Big\}^2 - \lambda \Big(\sum_{j}\phi_j^2-1\Big)\end{split}\]
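To make the objective concrete, the sketch below evaluates the first term of \(J\), \(\sum_{i}(x_i - \mu, \phi)^2\), for unit vectors \(\phi\) at different angles on a small synthetic 2-D dataset and picks the angle with the largest value. The dataset, the helper name `projected_variance_sum`, and the 1-degree grid are assumptions made for illustration only; a brute-force scan like this is not how PCA is computed in practice.

```python
import numpy as np

def projected_variance_sum(X, phi):
    """Sum over samples of the squared projection (x_i - mu, phi)^2."""
    mu = X.mean(axis=0)
    return float(np.sum(((X - mu) @ phi) ** 2))

# Hypothetical 2-D dataset stretched along the diagonal (assumption for illustration).
rng = np.random.default_rng(0)
t = rng.normal(size=200)
X = np.column_stack([t, t + 0.1 * rng.normal(size=200)])

# Scan unit vectors phi = (cos a, sin a); ||phi|| = 1 holds by construction,
# so the constraint term of J vanishes and only the first term matters.
angles = np.linspace(0.0, np.pi, 181)  # 1-degree steps
scores = [
    projected_variance_sum(X, np.array([np.cos(a), np.sin(a)]))
    for a in angles
]
best_angle = angles[int(np.argmax(scores))]
print(f"best direction: {np.degrees(best_angle):.0f} degrees")
```

Because the data are spread mainly along the diagonal, the best angle should land close to 45 degrees.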

We can solve for \(\phi\) by setting the partial derivatives of \(J\) with respect to \(\phi\) to zero. This leaves us with:

\[2\sum_{i}(x_i - \mu, \phi)(x_i - \mu) - 2\lambda\phi = 0\]

Dividing by 2 and writing the inner product as \((x_i - \mu)^T\phi\), this can be rewritten as shown below,

\[\sum_{i}(x_i - \mu)(x_i - \mu)^T\phi = \lambda\phi\]

If we focus on \(\sum_{i}(x_i - \mu)(x_i - \mu)^T\), we can see that it is an \(m \times m\) matrix. Denoting this matrix by \(K\), so that \(K=\sum_{i}(x_i - \mu)(x_i - \mu)^T\), we arrive at a formula that many will find familiar.

\[K \phi = \lambda\phi\]
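The eigenvalue problem above can be checked numerically. The following is a minimal sketch, assuming NumPy and a small synthetic dataset (both are choices made here, not part of the original text): it builds \(K\) from centered data, solves the symmetric eigenproblem with `numpy.linalg.eigh`, and takes the eigenvector with the largest eigenvalue as \(\phi\).

```python
import numpy as np

# Hypothetical dataset: N = 100 samples in m = 3 dimensions with unequal spread.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3)) * np.array([3.0, 1.0, 0.3])

mu = X.mean(axis=0)
D = X - mu          # centered samples, one per row
K = D.T @ D         # sum_i (x_i - mu)(x_i - mu)^T, an m x m symmetric matrix

# eigh is intended for symmetric matrices and returns eigenvalues in ascending order,
# so reorder them to put the largest first.
eigvals, eigvecs = np.linalg.eigh(K)
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

phi = eigvecs[:, 0]  # direction of largest variance
lam = eigvals[0]     # corresponding eigenvalue

print(np.allclose(K @ phi, lam * phi))  # checks K phi = lambda phi
```

For a unit eigenvector, \(\sum_{i}(x_i - \mu, \phi)^2 = \phi^T K \phi = \lambda\), which is why the largest eigenvalue identifies the direction of largest variance.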

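In conclusion, \(\phi\) is an eigenvector of the covariance matrix \(K\) (strictly, \(K\) is the covariance matrix scaled by \(N\), which does not change its eigenvectors). Because \(\phi^T K \phi = \lambda\) for a unit eigenvector, the maximized quantity equals the eigenvalue, so the first PC is the eigenvector with the largest eigenvalue. As for the remaining components \(\phi_i\) (\(i>0\)), the story is the same: they are obtained as the other eigenvectors of \(K\), in order of decreasing eigenvalue.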
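Putting the pieces together, the sketch below projects data onto the leading eigenvectors and checks the two properties stated at the top of this page: the components are orthogonal, and the projected coordinates are uncorrelated with decreasing variance. The function name `pca`, its return values, and the synthetic data are assumptions for illustration; here \(K\) is divided by \(N\) so that the eigenvalues are variances, which leaves the eigenvectors unchanged.

```python
import numpy as np

def pca(X, n_components):
    """Return (components, variances, scores) from an eigendecomposition of the covariance."""
    mu = X.mean(axis=0)
    D = X - mu
    K = D.T @ D / len(X)                      # covariance matrix (scatter matrix / N)
    eigvals, eigvecs = np.linalg.eigh(K)      # ascending eigenvalues
    order = np.argsort(eigvals)[::-1][:n_components]
    components = eigvecs[:, order].T          # one principal component per row
    scores = D @ components.T                 # coordinates of each sample along the PCs
    return components, eigvals[order], scores

# Hypothetical correlated 3-D data (assumption for illustration).
rng = np.random.default_rng(1)
A = np.array([[2.0, 0.5, 0.0],
              [0.0, 1.0, 0.3],
              [0.0, 0.0, 0.2]])
X = rng.normal(size=(500, 3)) @ A

components, variances, scores = pca(X, n_components=2)

# The covariance of the projected data should be diagonal, with the variances on the diagonal.
score_cov = np.cov(scores, rowvar=False, bias=True)
print(np.allclose(score_cov, np.diag(variances)))
```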

../_images/pca.gif

Code can be found in the gallery.