Unit 2: Supervised Learning: Regression

Linear and logistic regression with evaluation

Learning Outcomes
  • Explain linear and logistic regression
  • Fit regression models with scikit-learn
  • Interpret coefficients and predictions
  • Evaluate models with suitable metrics

Linear Regression

Linear regression models a continuous target as a weighted sum of the input features plus an intercept. In ordinary least squares, the weights and intercept are chosen to minimise the mean squared error between predictions and actual values.
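The idea of choosing weights that minimise the squared error can be sketched with NumPy alone. This is a minimal illustration under stated assumptions, not how scikit-learn is necessarily implemented internally: the toy data (a noisy line with slope 2 and intercept 1) and the names `A`, `weights` and `nudged` are all hypothetical.

```python
import numpy as np

# Hypothetical toy data: a noisy line y = 2*x + 1
rng = np.random.default_rng(0)
x = np.linspace(0.0, 5.0, 50)
y = 2.0 * x + 1.0 + rng.normal(scale=0.1, size=x.size)

# Design matrix: one column for the feature, one column of ones for the intercept
A = np.column_stack([x, np.ones_like(x)])

# Least-squares solution: the weights that minimise the sum of squared residuals
weights, *_ = np.linalg.lstsq(A, y, rcond=None)
slope, intercept = weights

# Mean squared error at the fitted weights
mse = np.mean((A @ weights - y) ** 2)

# Moving the slope away from the fitted value can only increase the error
nudged = weights + np.array([0.1, 0.0])
mse_nudged = np.mean((A @ nudged - y) ** 2)

print("slope:", slope, "intercept:", intercept)
print("MSE at fit:", mse, "MSE after nudging the slope:", mse_nudged)
```

The comparison at the end is the point of the sketch: any change to the fitted weights gives a larger mean squared error on the same data.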

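# Assumes X_train, y_train and X_test already exist
from sklearn.linear_model import LinearRegression
model = LinearRegression().fit(X_train, y_train)  # learn weights and intercept
pred = model.predict(X_test)  # predicted continuous values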
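To make the snippet above runnable end to end, the following sketch supplies its own data. The synthetic, noise-free dataset (y = 3*x1 - 2*x2 + 5) and the 75/25 split are assumptions chosen for illustration, so that the fitted coefficients can be checked against known values.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic, noise-free data (an assumption for illustration): y = 3*x1 - 2*x2 + 5
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + 5.0

# Hold out a quarter of the rows for testing
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

model = LinearRegression().fit(X_train, y_train)
pred = model.predict(X_test)

# coef_ holds one weight per feature; intercept_ is the constant term
print("coefficients:", model.coef_)
print("intercept:", model.intercept_)
```

Each coefficient can be read as the change in the prediction for a one-unit increase in that feature with the other features held fixed. Here the fit should recover weights close to 3 and -2 and an intercept close to 5, because the data contain no noise.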

Logistic Regression

Despite its name, logistic regression is a classification method: it passes a linear combination of the features through the sigmoid function to output a probability between zero and one for the positive class, which is then thresholded (commonly at 0.5) to give a class label.

from sklearn.linear_model import LogisticRegression
clf = LogisticRegression().fit(X_train, y_train)
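A self-contained sketch of the same fit, assuming a synthetic two-class dataset and scikit-learn's default settings, shows how the class probability arises from the sigmoid of a linear score. The data and the names `z` and `manual` are hypothetical, introduced only for this example.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic binary data (an assumption): label is 1 when a noisy linear score is positive
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 2))
y = (X[:, 0] + X[:, 1] + rng.normal(scale=0.5, size=300) > 0).astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

clf = LogisticRegression().fit(X_train, y_train)

proba = clf.predict_proba(X_test)  # columns: P(class 0), P(class 1)
labels = clf.predict(X_test)       # hard class labels

# Reproduce P(class 1) by hand: sigmoid of the linear combination of features
z = X_test @ clf.coef_.ravel() + clf.intercept_[0]
manual = 1.0 / (1.0 + np.exp(-z))

print("manual sigmoid matches predict_proba:", np.allclose(manual, proba[:, 1]))
```

For a binary problem fitted with the defaults, the second column of `predict_proba` should agree with the hand-computed sigmoid, which is the relationship the paragraph above describes.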

Evaluation Metrics

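Regression quality is commonly measured with the mean squared error (MSE), its square root (RMSE) and the R squared score. Classification uses accuracy, precision, recall and the F1 score; which of these to prioritise depends on the problem and the class balance, since accuracy alone can be misleading when one class dominates.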

from sklearn.metrics import r2_score
print(r2_score(y_test, pred))
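The metrics named in this section can be computed on small hand-made arrays to see how they behave. The values below are hypothetical and chosen only so the arithmetic is easy to follow; the classification labels are deliberately imbalanced (eight negatives, two positives).

```python
import numpy as np
from sklearn.metrics import (
    accuracy_score,
    f1_score,
    mean_squared_error,
    precision_score,
    r2_score,
    recall_score,
)

# Regression: hypothetical true values and predictions
y_true = np.array([3.0, -0.5, 2.0, 7.0])
y_pred = np.array([2.5, 0.0, 2.0, 8.0])

mse = mean_squared_error(y_true, y_pred)  # mean of the squared errors
rmse = np.sqrt(mse)                       # same units as the target
r2 = r2_score(y_true, y_pred)             # 1 - SS_res / SS_tot

# Classification: hypothetical imbalanced labels (8 negatives, 2 positives)
c_true = np.array([0, 0, 0, 0, 0, 0, 0, 0, 1, 1])
c_pred = np.array([0, 0, 0, 0, 0, 0, 0, 1, 1, 0])

accuracy = accuracy_score(c_true, c_pred)
precision = precision_score(c_true, c_pred)  # TP / (TP + FP)
recall = recall_score(c_true, c_pred)        # TP / (TP + FN)
f1 = f1_score(c_true, c_pred)                # harmonic mean of precision and recall

print(f"MSE={mse:.3f} RMSE={rmse:.3f} R2={r2:.3f}")
print(f"accuracy={accuracy:.2f} precision={precision:.2f} recall={recall:.2f} F1={f1:.2f}")
```

In this example the accuracy is 0.80 while precision, recall and F1 are all 0.50: most of the correct predictions come from the majority class, which is why accuracy on its own can overstate performance on imbalanced data.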

Summary

This unit introduced linear regression for continuous targets and logistic regression for classification, together with the metrics used to judge regression and classification performance.

Exercises

  • Fit a linear regression model and report its R squared score.
  • Explain why mean squared error penalises large errors heavily.
  • State when precision matters more than recall.
  • Interpret the coefficients of a fitted linear model.