Principal Component Analysis
The larger the variance carried by a line, the larger the dispersion of the observations along it, and the more information the line carries.
In the figure below, the line that matches the purple marks would be the first principal component, because it is the line along which the observations are most spread out; in other words, it is the direction that maximizes the variance of the projected observations.
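To make "the line that maximizes the variance" concrete, here is a minimal sketch in Python with NumPy. The synthetic data, the 30-degree rotation, and the helper name `variance_along` are illustrative assumptions, not part of the original figure. It projects the observations onto candidate lines through the origin, measures the variance along each, and compares the best candidate with the top eigenvector of the covariance matrix.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic 2-D data (an assumption for illustration): stretched along one
# axis, then rotated by 30 degrees so the spread lies along a slanted line.
theta = np.deg2rad(30)
rotation = np.array([[np.cos(theta), -np.sin(theta)],
                     [np.sin(theta),  np.cos(theta)]])
X = (rng.normal(size=(500, 2)) * np.array([3.0, 0.5])) @ rotation.T
X = X - X.mean(axis=0)  # center the observations


def variance_along(X, angle):
    """Variance of the observations projected onto the line at `angle`."""
    direction = np.array([np.cos(angle), np.sin(angle)])
    return np.var(X @ direction)


# Try candidate lines in 1-degree steps and keep the one with the most variance.
angles = np.linspace(0, np.pi, 181)
variances = np.array([variance_along(X, a) for a in angles])
best_angle = angles[np.argmax(variances)]
best_direction = np.array([np.cos(best_angle), np.sin(best_angle)])

# The same direction falls out of the covariance matrix: it is the
# eigenvector with the largest eigenvalue (eigh sorts eigenvalues ascending).
eigenvalues, eigenvectors = np.linalg.eigh(np.cov(X, rowvar=False))
first_pc = eigenvectors[:, -1]

# Close to 1.0 in absolute value when the two directions agree (sign is arbitrary).
alignment = abs(best_direction @ first_pc)
```

The brute-force search over angles is only there to illustrate the idea; in practice the direction is obtained directly from the eigendecomposition (or an SVD).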
The second principal component is calculated in the same way, with the condition that it is uncorrelated with (i.e., perpendicular to) the first principal component and that it accounts for the next-highest variance.
This process continues until a total of "p" principal components have been calculated. The number "p" is a hyperparameter chosen by the user and can be at most the number of original variables.
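The procedure described above can be sketched in a few lines of Python with NumPy. This is one common way to do it (eigendecomposition of the covariance matrix), offered as an illustration under stated assumptions rather than a definitive implementation; the function name `principal_components`, its `n_components` parameter, and the random test data are hypothetical.

```python
import numpy as np


def principal_components(X, n_components):
    """Return the top `n_components` directions and the variance along each."""
    X_centered = X - X.mean(axis=0)
    cov = np.cov(X_centered, rowvar=False)
    # eigh is meant for symmetric matrices and returns orthonormal eigenvectors.
    eigenvalues, eigenvectors = np.linalg.eigh(cov)
    # Sort by decreasing variance and keep the first n_components.
    order = np.argsort(eigenvalues)[::-1][:n_components]
    return eigenvectors[:, order], eigenvalues[order]


# Hypothetical data: 200 observations of 4 correlated variables.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4)) @ rng.normal(size=(4, 4))

components, variances = principal_components(X, n_components=3)

# Project the centered observations onto the components.
scores = (X - X.mean(axis=0)) @ components

# The components are mutually perpendicular (identity Gram matrix) ...
gram = components.T @ components
# ... and the projected observations are uncorrelated (diagonal covariance).
score_cov = np.cov(scores, rowvar=False)
```

Here `variances` comes out in decreasing order, so the first column of `components` is the first principal component, the second column is the second, and so on.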