Unit 1: Introduction to Machine Learning

Paradigms, workflow and data preprocessing

Learning Outcomes
  • Define machine learning and explain how it differs from classical programming
  • Distinguish supervised, unsupervised and reinforcement learning
  • Describe the end-to-end machine learning workflow
  • Split and scale data for modelling

What is Machine Learning?

Machine learning studies algorithms whose performance on a task improves with data, rather than following rules written explicitly by a programmer. A task is typically framed as learning a mapping from input features to an output target, with the aim that the mapping generalises to unseen data.
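To make the contrast with classical programming concrete, the sketch below places a hand-written rule next to a rule estimated from examples. The temperature data are made up for illustration, and fit_line is a hypothetical helper (ordinary least squares on one feature), not part of any library.

```python
from statistics import mean

# Classical programming: the rule is written by hand.
def fahrenheit_rule(celsius):
    return celsius * 9 / 5 + 32

# Machine learning: the rule is estimated from (input, output) examples.
def fit_line(xs, ys):
    """Ordinary least squares for a single feature; returns (slope, intercept)."""
    x_bar, y_bar = mean(xs), mean(ys)
    covariance = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    variance = sum((x - x_bar) ** 2 for x in xs)
    slope = covariance / variance
    return slope, y_bar - slope * x_bar

# Toy training data (illustrative): Celsius inputs and Fahrenheit targets.
train_c = [-10.0, 0.0, 10.0, 20.0, 30.0]
train_f = [14.0, 32.0, 50.0, 68.0, 86.0]

slope, intercept = fit_line(train_c, train_f)

def learned_rule(celsius):
    return slope * celsius + intercept

# Compare both rules on an input that was not in the training data.
unseen = 37.0
print(fahrenheit_rule(unseen), learned_rule(unseen))
```

Here the learned mapping recovers the hand-written one because the toy data are noise-free and exactly linear; real data rarely are, which is why generalisation has to be measured on held-out examples.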

Types of Learning

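Supervised learning uses labelled examples to predict a target: a continuous value in regression, a discrete class in classification. Unsupervised learning discovers structure in unlabelled data, for example through clustering and dimensionality reduction. Reinforcement learning learns a policy, that is, a rule for choosing actions, from the rewards it receives while interacting with an environment.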
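The difference between the first two paradigms comes down to whether the algorithm is given labels. The following is a minimal sketch using scikit-learn on synthetic data; the two point clouds, the choice of LogisticRegression and KMeans, and all parameter values are assumptions made for illustration. Reinforcement learning needs an environment to interact with and is not shown.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import LogisticRegression

# Synthetic data (illustrative): two well-separated groups of 2-D points.
rng = np.random.default_rng(0)
group_a = rng.normal(loc=[0.0, 0.0], scale=0.5, size=(50, 2))
group_b = rng.normal(loc=[5.0, 5.0], scale=0.5, size=(50, 2))
X = np.vstack([group_a, group_b])
y = np.array([0] * 50 + [1] * 50)  # labels, used only by the supervised model

# Supervised: the classifier is fitted on features AND labels.
clf = LogisticRegression().fit(X, y)
print("classifier accuracy on the training data:", clf.score(X, y))

# Unsupervised: the clustering algorithm is fitted on features only.
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print("cluster sizes:", np.bincount(km.labels_))
```

The cluster indices that KMeans assigns are arbitrary, so cluster 0 need not correspond to label 0; only the grouping itself is meaningful.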

Workflow and Preprocessing

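A project typically flows through data collection, cleaning, feature engineering, a train/test split, model training, evaluation and deployment. Numerical features are scaled and categorical features are encoded before training. The scaling and encoding parameters should be estimated from the training set only, so that no information from the test set leaks into the model.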

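from sklearn.model_selection import train_test_split

# X (features) and y (target) are assumed to be loaded already.
# Hold out 20% of the rows for testing; random_state makes the shuffle reproducible.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)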
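The scaling and encoding steps can be sketched in the same style. This is one possible arrangement, not the only one: the toy age and city columns are made up, preprocess is a hypothetical helper, and StandardScaler and OneHotEncoder are just one common choice of transformers. The point illustrated is that both are fitted on the training rows and then applied unchanged to the test rows.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Toy data (illustrative): one numerical feature, one categorical feature, a binary target.
age = np.array([[22.0], [35.0], [47.0], [51.0], [28.0],
                [39.0], [60.0], [33.0], [44.0], [25.0]])
city = np.array([["paris"], ["rome"], ["paris"], ["oslo"], ["rome"],
                 ["oslo"], ["paris"], ["rome"], ["oslo"], ["paris"]])
y = np.array([0, 1, 1, 1, 0, 1, 1, 0, 1, 0])

# Split first, so the test rows play no part in fitting the transformers.
age_train, age_test, city_train, city_test, y_train, y_test = train_test_split(
    age, city, y, test_size=0.2, random_state=42
)

# Scaling: mean and standard deviation are estimated from the training rows only.
scaler = StandardScaler().fit(age_train)
# Encoding: categories are learned from the training rows; a category first seen
# at test time is encoded as all zeros instead of raising an error.
encoder = OneHotEncoder(handle_unknown="ignore").fit(city_train)

def preprocess(age_part, city_part):
    """Apply the fitted scaler and encoder, then join the resulting columns."""
    scaled = scaler.transform(age_part)
    encoded = encoder.transform(city_part).toarray()
    return np.hstack([scaled, encoded])

X_train = preprocess(age_train, city_train)
X_test = preprocess(age_test, city_test)
print(X_train.shape, X_test.shape)
```

After this step the scaled training column has zero mean and unit variance, while the test column generally does not, which is expected because it was transformed with the training statistics.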

Summary

This unit defined machine learning, separated the three learning paradigms, and walked through the standard workflow with the preprocessing and data-splitting steps that precede modelling.

Exercises

  • Give two real-world examples each of classification and regression.
  • Explain why data is split into training and test sets.
  • List the steps of a typical machine learning project.
  • Describe one situation suited to reinforcement learning.