Here’s a fun take on the OLS that I picked up from The Elements of Statistical Learning. It applies the Singular Value Decomposition, also known as the method used in principal component analysis, to the regression framework.

Singular Vector Decomposition (SVD)

First, a little background on the SVD. The SVD could be thought of as a generalisation of the eigendecomposition. An eigenvector v of matrix \(\mathbf{A}\) is a vector that is mapped to a scaled version of itself: \[ \mathbf{A}v = \lambda v \] where \(\lambda\) is known as the eigenvalue. For a full rank matrix (this guarantees orthorgonal eigenvectors), we can stack up the eigenvalues and eigenvectors (normalised) to obtain the following equation: \[ \begin{aligned} \mathbf{A}\mathbf{Q} &= \mathbf{Q}\Lambda \\ \mathbf{A} &= \mathbf{Q}\Lambda\mathbf{Q}^{-1} \end{aligned} \] where \(\mathbf{Q}\) is an orthonormal matrix.

For the SVD decomposition, \(\mathbf{A}\) can be any matrix (not square). The trick is to consider the square matrices \(\mathbf{A}'\mathbf{A}\) and \(\mathbf{A}\mathbf{A}'\). The SVD of the \(n \times k\) matrix \(\mathbf{A}\) is \(\mathbf{U}\mathbf{D}\mathbf{V}'\), where \(\mathbf{U}\) is a square matrix of dimension \(n\) and \(\mathbf{V}\) is a square matrix of dimension \(k\). This implies that \(\mathbf{A}'\mathbf{A} = \mathbf{V}\mathbf{D}^{2}\mathbf{V}\) and \(\mathbf{V}\) can be seen to be the eigenvalue matrix of that square matrix. Similarly, the eigenvectors of \(\mathbf{A}\mathbf{A}'\) forms the columns of \(\mathbf{U}\) while \(\mathbf{D}\) is the square root of the eigenvalues of either matrix.

In practice, there is no need to calculate the full set of eigenvectors for both matrices. Assuming that the rank of \(\mathbf{A}\) is k, i.e. it is a long matrix, there is no need to find all n eigenvectors of \(\mathbf{A}\mathbf{A}'\) since only the elements of the first k eigenvalues will be multiplied by non-zero elements. Hence, we can restrict \(\mathbf{U}\) to be a \(n \times k\) matrix and let \(\mathbf{D}\) be a \(k \times k\) matrix.

Here’s a paint illustration of the dimensions of the matrices produced by the SVD decomposition to illustrate the idea more clearly. The \(n \times k\) matrix produced by multiplying \(\mathbf{U}\) and \(\mathbf{D}\) is equivalent for both the blue (keeping all n eigenvectors) and red (keeping only the relevant k eigenvectors) boxes. svd

Applying the SVD to OLS

To apply the SVD to the OLS formula, we re-write the fitted values, substituting the data input matrix \(X\) with its equivalent decomposed matrices:

\[ \begin{aligned} \mathbf{X}\hat{\beta} &= \mathbf{X}(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{y} \\ &= \mathbf{U}\mathbf{D}\mathbf{V}'(\mathbf{V}\mathbf{D}'\mathbf{U}'\mathbf{U}\mathbf{D}\mathbf{V}')^{-1}\mathbf{V}\mathbf{D}'\mathbf{U}'\mathbf{y} \\ &= \mathbf{U}\mathbf{D}(\mathbf{D}'\mathbf{D})^{-1}\mathbf{D}\mathbf{U}'\mathbf{y} \\ &= \mathbf{U}\mathbf{U}'\mathbf{y} \end{aligned} \] where the third to fourth line comes from the fact that \((\mathbf{D}'\mathbf{D})^{-1}\) is a \(k \times k\) matrix with the square root of the eigenvalues on the diagonal, and \(\mathbf{D}\) is a square diagonal matrix. Here we see that the fitted values are computed with respect to the orthonormal basis \(\mathbf{U}\).1

  1. Doing a QR decomposition will also give a similar set of results, though the orthogonal bases will be different.