Unit 3: Supervised Learning: Classification

Learning Outcomes

Apply KNN, decision trees and Naive Bayes
Reduce variance with random forests
Use support vector machines and kernels
Detect and limit overfitting

KNN and Naive Bayes

The k-nearest neighbours (KNN) algorithm classifies a point by a majority vote among the k training points closest to it, while Naive Bayes applies Bayes' theorem under the assumption that the features are conditionally independent given the class.

from sklearn.neighbors import KNeighborsClassifier
# X_train, y_train: training features and labels; the 5 nearest neighbours vote on each prediction
knn = KNeighborsClassifier(n_neighbors=5).fit(X_train, y_train)
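To make the majority vote concrete, the sketch below implements KNN from scratch with NumPy and compares it with KNeighborsClassifier, then fits a Gaussian Naive Bayes model on the same data. The helper knn_predict, the iris dataset and the 70/30 split are illustrative choices, not part of this unit; GaussianNB is one Naive Bayes variant and additionally assumes each feature is normally distributed within each class.

```python
from collections import Counter

import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier


def knn_predict(X_train, y_train, X_query, k=5):
    """Hypothetical helper: label each query point by majority vote of its k nearest training points."""
    predictions = []
    for x in X_query:
        # Euclidean distance from x to every training point
        distances = np.linalg.norm(X_train - x, axis=1)
        # Indices of the k closest training points, nearest first
        nearest = np.argsort(distances)[:k]
        # Majority vote; on a tie, Counter keeps the label encountered first (the nearer one)
        label, _ = Counter(y_train[nearest]).most_common(1)[0]
        predictions.append(label)
    return np.array(predictions)


X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

manual = knn_predict(X_train, y_train, X_test, k=5)
library = KNeighborsClassifier(n_neighbors=5).fit(X_train, y_train).predict(X_test)
print("agreement with KNeighborsClassifier:", np.mean(manual == library))

# Gaussian Naive Bayes: per-class normal likelihood for each feature,
# multiplied together because the features are assumed independent given the class
nb = GaussianNB().fit(X_train, y_train)
print("Naive Bayes test accuracy:", nb.score(X_test, y_test))
```

The two KNN versions can occasionally disagree when votes or distances are tied, because the from-scratch helper and scikit-learn break ties differently.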

Decision Trees and Random Forests

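A decision tree chooses each split to give the largest reduction in impurity, measured either as information gain (the drop in entropy) or as the decrease in Gini impurity. A random forest trains many such trees, each on a bootstrap sample of the training data and considering only a random subset of features at each split, and combines their predictions; averaging over these partly decorrelated trees reduces variance and improves generalisation.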

from sklearn.ensemble import RandomForestClassifier
# 100 trees, each fit on a bootstrap sample of (X_train, y_train)
rf = RandomForestClassifier(n_estimators=100).fit(X_train, y_train)
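The split criteria can be computed by hand. This sketch scores one candidate split of a hypothetical 10-sample node (5 "yes", 5 "no") into children of 4/1 and 1/4; a tree would evaluate many candidate splits like this and keep the one with the largest reduction. The function names are illustrative, and children are weighted by their share of the parent's samples.

```python
import math
from collections import Counter


def gini(labels):
    """Gini impurity: chance that two labels drawn at random (with replacement) differ."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def entropy(labels):
    """Shannon entropy of the label distribution, in bits."""
    n = len(labels)
    return -sum((count / n) * math.log2(count / n) for count in Counter(labels).values())


def split_score(parent, left, right, impurity):
    """Impurity of the parent minus the sample-weighted impurity of the two children.

    With impurity=entropy this is the information gain; with impurity=gini it is
    the decrease in Gini impurity.
    """
    n = len(parent)
    weighted = len(left) / n * impurity(left) + len(right) / n * impurity(right)
    return impurity(parent) - weighted


# Hypothetical node split by some threshold on one feature
parent = ["yes"] * 5 + ["no"] * 5
left = ["yes"] * 4 + ["no"] * 1
right = ["yes"] * 1 + ["no"] * 4

print(round(split_score(parent, left, right, entropy), 3))  # → 0.278
print(round(split_score(parent, left, right, gini), 3))     # → 0.18
```

In scikit-learn the same choice is made through the criterion parameter of DecisionTreeClassifier and RandomForestClassifier, with "gini" as the default.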

Support Vector Machines

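A support vector machine finds the hyperplane that maximises the margin between classes. Kernel functions let it separate data that is not linearly separable: a kernel computes inner products in a higher-dimensional feature space without constructing that space explicitly, so a linear boundary there can correspond to a curved boundary in the original features.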

from sklearn.svm import SVC
# RBF (Gaussian) kernel; C and gamma are left at their defaults
clf = SVC(kernel="rbf").fit(X_train, y_train)
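A quick way to see what the kernel buys is a dataset no straight line can split. The sketch below, assuming scikit-learn's make_circles toy data (two concentric rings) and an illustrative gamma of 1.0 for the manual check, first confirms that a hand-written RBF kernel, exp(-gamma * ||x - z||^2), matches sklearn.metrics.pairwise.rbf_kernel, then compares a linear and an RBF SVC on held-out data.

```python
import numpy as np
from sklearn.datasets import make_circles
from sklearn.metrics.pairwise import rbf_kernel
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC


def rbf(x, z, gamma):
    """RBF (Gaussian) kernel: a similarity computed from the original coordinates."""
    return np.exp(-gamma * np.sum((x - z) ** 2))


# Two concentric rings: the classes are not linearly separable in 2-D
X, y = make_circles(n_samples=400, factor=0.3, noise=0.05, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

# The hand-written kernel agrees with scikit-learn's implementation
gamma = 1.0
print(np.isclose(rbf(X[0], X[1], gamma), rbf_kernel(X[:1], X[1:2], gamma=gamma)[0, 0]))

# Same classifier, two kernels
scores = {}
for kernel in ("linear", "rbf"):
    clf = SVC(kernel=kernel).fit(X_train, y_train)
    scores[kernel] = clf.score(X_test, y_test)
print(scores)
```

The linear kernel can only draw a straight boundary and so does little better than chance here, whereas the RBF kernel can draw a closed curve between the rings.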

Summary

This unit surveyed the main classification algorithms, from simple neighbour-based and probabilistic methods to tree ensembles and margin-based support vector machines, with attention to overfitting.

Exercises

Train a KNN classifier and study the effect of changing k.
Explain how a decision tree chooses a split.
Describe how a random forest reduces overfitting.
State the role of the kernel in a support vector machine.
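As a possible starting point for the first exercise, comparing training and test accuracy across values of k is one practical way to detect overfitting: with k = 1 each training point is its own nearest neighbour, so training accuracy is perfect while test accuracy lags behind. The synthetic dataset from make_classification, its parameters (including 10% label noise via flip_y) and the list of k values are assumptions chosen for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Synthetic data with some flipped labels, so memorising the training set hurts
X, y = make_classification(n_samples=500, n_features=10, n_informative=5,
                           flip_y=0.1, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

results = {}
for k in (1, 3, 5, 11, 21, 51):
    knn = KNeighborsClassifier(n_neighbors=k).fit(X_train, y_train)
    # (training accuracy, test accuracy); a large gap signals overfitting
    results[k] = (knn.score(X_train, y_train), knn.score(X_test, y_test))
    print(f"k={k:>2}  train={results[k][0]:.2f}  test={results[k][1]:.2f}")
```

Small k gives a flexible, high-variance boundary; larger k smooths it, narrowing the train/test gap until the model eventually becomes too coarse and both scores fall.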
