Tips on Principal Component Analysis

How to select the number of principal components and application of PCA to new observations

Nicolo Cosimo Albanese
8 min read · Sep 6, 2020
Photo by Volodymyr Hryshchenko on Unsplash

Introduction

Principal Component Analysis (PCA) is an unsupervised technique for dimensionality reduction.

What is dimensionality reduction?

Let us start with an example. In a tabular data set, each column represents a feature, or dimension. Tabular data sets with many columns are known to be difficult to work with, especially when there are more columns than observations.

Consider a problem that can be modeled linearly and has p=40 features: the best subset approach would have to fit about a trillion (2^p-1) possible models and submodels, making the computation extremely onerous.
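
As a quick check of that figure, the count can be reproduced in a few lines of Python. This is only an illustrative sketch: the variable names are made up for the example, and the second computation (summing the number of subsets of each size) is added just to confirm the closed-form expression.

```python
from math import comb

p = 40  # number of candidate features, as in the example above

# Each feature is either included or excluded; the empty model is left out.
n_models = 2**p - 1

# Same count, obtained by summing the number of subsets of each size k = 1..p.
n_models_by_size = sum(comb(p, k) for k in range(1, p + 1))

print(f"{n_models:,}")               # 1,099,511,627,775
print(n_models == n_models_by_size)  # True
```

The result is roughly 1.1 trillion candidate models, which is why an exhaustive search quickly becomes impractical as p grows.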

How does PCA help?

PCA can extract information from a high-dimensional space (i.e., a tabular data set with many columns) by projecting it onto a lower-dimensional subspace. The idea is that the dimensions of the projection space, named principal components, explain the majority of the variation in the original data set.
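
The projection idea can be sketched with NumPy alone. Everything below is an assumption made for illustration rather than the article's own code: the data is synthetic (five columns generated from two hidden factors plus a little noise), the variable names are hypothetical, and the principal components are obtained from a singular value decomposition of the centered data, which is one common way to compute them.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data (assumed for illustration): 200 observations, 5 features,
# built from 2 latent factors plus a small amount of noise.
n_obs, n_features, n_latent = 200, 5, 2
latent = rng.normal(size=(n_obs, n_latent))
mixing = rng.normal(size=(n_latent, n_features))
X = latent @ mixing + 0.05 * rng.normal(size=(n_obs, n_features))

# Center each column so that the components describe variation around the mean.
X_centered = X - X.mean(axis=0)

# SVD of the centered data: the rows of Vt are the principal directions,
# ordered from the largest to the smallest singular value.
U, S, Vt = np.linalg.svd(X_centered, full_matrices=False)

# Variance captured along each direction, and its share of the total variance.
explained_variance = S**2 / (n_obs - 1)
explained_variance_ratio = explained_variance / explained_variance.sum()

# Project the 5-dimensional data onto the first k principal components.
k = 2
X_projected = X_centered @ Vt[:k].T

print(X_projected.shape)  # (200, 2)
print(explained_variance_ratio.round(3))
```

Because the synthetic data has only two underlying factors, the first two components should account for nearly all of the variance, so the two-column projection loses very little information.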

How does PCA work exactly?
