Clustering and dimensionality reduction
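K-means partitions data into k clusters by alternating two steps until the assignments stop changing: assign each point to its nearest centroid, then move each centroid to the mean of its assigned points. Neither step can increase the within-cluster sum of squared distances, so the algorithm converges, though only to a local minimum that depends on the initial centroids. The elbow method helps select k: plot the within-cluster sum of squares against k and choose the value beyond which adding clusters yields only small improvements.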
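The assign-and-update loop and the elbow method can be illustrated with a minimal NumPy sketch. The function names `kmeans` and `nearest`, the random initialisation from data points, and the toy two-blob dataset are assumptions made for illustration, not a reference implementation.

```python
import numpy as np

def nearest(points, centroids):
    """Return, for each point, the index of its nearest centroid."""
    dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
    return dists.argmin(axis=1)

def kmeans(points, k, n_iter=100, seed=0):
    """Plain k-means: alternate assignment and centroid update."""
    rng = np.random.default_rng(seed)
    # Assumed initialisation: k distinct data points chosen at random.
    centroids = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(n_iter):
        labels = nearest(points, centroids)
        # Move each centroid to the mean of its points; keep it if it has none.
        new_centroids = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    labels = nearest(points, centroids)
    # Within-cluster sum of squares, the quantity k-means tries to minimise.
    wcss = float(((points - centroids[labels]) ** 2).sum())
    return labels, centroids, wcss

# Toy data (hypothetical): two well-separated 2-D blobs of 50 points each.
rng = np.random.default_rng(42)
blob_a = rng.normal(loc=[0.0, 0.0], scale=0.3, size=(50, 2))
blob_b = rng.normal(loc=[5.0, 5.0], scale=0.3, size=(50, 2))
data = np.vstack([blob_a, blob_b])

labels, centroids, _ = kmeans(data, k=2)

# Elbow method: tabulate the within-cluster sum of squares for several k.
wcss_by_k = {k: kmeans(data, k)[2] for k in range(1, 6)}
for k, wcss in wcss_by_k.items():
    print(k, round(wcss, 2))
```

On data like this the printed values should fall sharply from k = 1 to k = 2 and change little afterwards, which is the "elbow" that suggests k = 2. Real data rarely gives such a clean bend, so the plot is a guide rather than a rule.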
Agglomerative hierarchical clustering starts with every point as its own cluster and repeatedly merges the two closest clusters, recording the merges in a tree called a dendrogram. The number of clusters can be decided afterwards by cutting the tree at a given height.
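The merging process can be sketched in a few lines of NumPy. This is a deliberately naive version that assumes single linkage (cluster distance is the distance between the closest pair of members), which is only one of several possible definitions of "closest". The function name `agglomerative` and the six toy points are hypothetical.

```python
import numpy as np

def agglomerative(points, n_clusters):
    """Naive single-linkage agglomerative clustering.

    Returns flat labels after stopping at n_clusters, plus the merge
    history as (cluster_a, cluster_b, distance) tuples.
    """
    # Every point starts as its own cluster, keyed by an integer id.
    clusters = {i: [i] for i in range(len(points))}
    # Pairwise Euclidean distances between all points.
    dist = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=2)
    merges = []
    next_id = len(points)
    while len(clusters) > n_clusters:
        best = None
        ids = list(clusters)
        for pos, a in enumerate(ids):
            for b in ids[pos + 1:]:
                # Single linkage: smallest distance between any two members.
                d = dist[np.ix_(clusters[a], clusters[b])].min()
                if best is None or d < best[2]:
                    best = (a, b, d)
        a, b, d = best
        merges.append((a, b, float(d)))
        # The merged cluster gets a fresh id, as a dendrogram node would.
        clusters[next_id] = clusters.pop(a) + clusters.pop(b)
        next_id += 1
    labels = np.empty(len(points), dtype=int)
    for label, members in enumerate(clusters.values()):
        labels[members] = label
    return labels, merges

# Toy data (hypothetical): two tight groups of three points.
points = np.array([
    [0.0, 0.0], [0.1, 0.0], [0.0, 0.2],
    [5.0, 5.0], [5.1, 5.0], [5.0, 5.3],
])
labels, merges = agglomerative(points, n_clusters=2)
print(labels)
for a, b, d in merges:
    print(f"merged {a} and {b} at distance {d:.3f}")
```

Stopping the loop at `n_clusters` is equivalent to cutting the dendrogram at the height where that many branches remain. The merge history is what a dendrogram plot would draw, with each merge distance giving the height of a join.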
PCA projects data onto the orthogonal directions of greatest variance, known as the principal components. Keeping only the first few components reduces the number of dimensions while retaining as much of the variance as possible, which aids visualisation and can speed up downstream learning.
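One common way to compute this projection is through the eigenvectors of the covariance matrix, sketched below with NumPy. The function name `pca` and the synthetic three-dimensional dataset are assumptions for illustration; library implementations typically use a singular value decomposition instead.

```python
import numpy as np

def pca(data, n_components):
    """PCA via eigendecomposition of the sample covariance matrix."""
    # Centre the data so that variance is measured around the mean.
    centred = data - data.mean(axis=0)
    cov = np.cov(centred, rowvar=False)
    # eigh is meant for symmetric matrices; eigenvalues come back ascending.
    eigvals, eigvecs = np.linalg.eigh(cov)
    # Keep the n_components directions with the largest variance.
    order = np.argsort(eigvals)[::-1][:n_components]
    components = eigvecs[:, order]  # columns are the principal directions
    explained_ratio = eigvals[order] / eigvals.sum()
    return centred @ components, components, explained_ratio

# Synthetic data (hypothetical): 3-D points lying close to a single line.
rng = np.random.default_rng(0)
t = rng.normal(size=200)
data = np.column_stack([
    t,
    2.0 * t + 0.1 * rng.normal(size=200),
    0.1 * rng.normal(size=200),
])

projected, components, explained_ratio = pca(data, n_components=2)
print(projected.shape)
print(explained_ratio)
```

The explained-variance ratio makes "retaining most of the information" measurable: here the first component should account for nearly all of the variance, so the data could be reduced to one dimension with little loss.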
This unit covered clustering with k-means and hierarchical methods and dimensionality reduction with PCA, the main tools for finding structure in unlabelled data.