Supervised Learning
What supervised learning means, how to build a model, and the most common types you’ll come across.
What is Supervised Learning?
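Supervised learning is a type of machine learning in which a model is trained on labeled data: examples where each input is paired with the correct output (the label). The goal is for the model to learn a relationship or mapping from inputs to outputs, so it can accurately predict the label when given new, unseen inputs.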
For example, if you’re building a model to predict whether someone will buy a product, the training data might consist of past user interactions, with features like:
Age
Gender
Income
Browsing history
Each record also carries a label showing whether that person actually made a purchase.
Once trained, the model can be given a new user (with their age, income, and browsing behavior), and it will output a prediction of how likely that user is to buy the product.
Basic Steps for Creating a Supervised ML Model:
Data Cleansing and Preparation
Start by handling missing values, removing errors or outliers, and transforming raw inputs into useful features. (See the Data Preprocessing page for details.)
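As one small illustration of handling missing values, the sketch below fills gaps in a numeric column with that column's median. The `impute_median` helper and the user records are hypothetical, and median imputation is only one of several reasonable strategies.

```python
from statistics import median

def impute_median(rows, column):
    """Return copies of rows with None in `column` replaced by the column median."""
    observed = [row[column] for row in rows if row[column] is not None]
    fill = median(observed)
    return [
        {**row, column: fill if row[column] is None else row[column]}
        for row in rows
    ]

# Hypothetical user records; None marks a missing income.
users = [
    {"age": 25, "income": 40000},
    {"age": 41, "income": None},
    {"age": 33, "income": 60000},
    {"age": 52, "income": 80000},
]
cleaned = impute_median(users, "income")
```

The original `users` list is left unchanged, which makes it easy to compare the raw and cleaned data.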
Split Your Data into Training and Testing Sets
Create a data partition that uses 70-80% of the data for training and the remaining 20-30% for testing.
Even better, use cross-validation to get a more reliable estimate of how well your model generalizes, since every part of the dataset takes a turn as test data.
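To make the two splitting strategies concrete, here is a minimal sketch in plain Python. The helper names `split_train_test` and `k_fold_indices` are made up for this example; in practice a library such as scikit-learn provides equivalents.

```python
import random

def split_train_test(items, test_fraction=0.2, seed=0):
    """Shuffle a copy of items and hold out test_fraction of them for testing."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

def k_fold_indices(n_items, k=5):
    """Yield (train_indices, test_indices) pairs; each index is tested exactly once."""
    indices = list(range(n_items))
    start = 0
    for fold in range(k):
        # Spread any remainder over the first folds so sizes differ by at most one.
        size = n_items // k + (1 if fold < n_items % k else 0)
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size

data = list(range(10))  # stand-in for 10 labeled examples
train, test = split_train_test(data, test_fraction=0.2)
folds = list(k_fold_indices(len(data), k=5))
```

Note that `k_fold_indices` does not shuffle, so shuffle the data first if it is stored in a meaningful order.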
Train Your Chosen Model on the Training Data
Choose a suitable supervised learning algorithm and fit it to your training data so it can learn the relationships between the different features and labels.
Evaluate the Model on the Testing Data
Use your trained model to make predictions on the test set and evaluate its performance using metrics like accuracy, AUC, or F1 score. (For more on this, see the Model Evaluation page.)
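The sketch below shows how accuracy, precision, recall, and F1 score can be computed by hand for a binary classifier. The label lists are invented for illustration, and AUC is left out because it needs predicted probabilities rather than hard labels.

```python
def classification_metrics(y_true, y_pred):
    """Compute accuracy, precision, recall, and F1 for 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    accuracy = sum(1 for t, p in pairs if t == p) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}

# Hypothetical test-set labels and model predictions.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
metrics = classification_metrics(y_true, y_pred)
```

With one false positive and one false negative out of eight predictions, all four metrics come out to 0.75 here.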
Tune and Improve the Model
To improve your model, you can tweak its hyperparameters, try out different types of models, or remove features that aren’t helpful.
The goal is a model that performs well on new, unseen data, not just the training data. A model that overfits has become too tailored to the specific patterns in the training set.
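Hyperparameter tuning can be as simple as trying a few candidate values and keeping the one that scores best on held-out data. The sketch below does this for a deliberately tiny model: a line through the origin whose slope is shrunk by a penalty term. The model, the helper names, and the data are all assumptions made for illustration.

```python
def fit_slope(xs, ys, penalty):
    """Slope of a line through the origin minimizing squared error + penalty * slope**2."""
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + penalty)

def mean_squared_error(slope, xs, ys):
    return sum((slope * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def pick_penalty(candidates, train, valid):
    """Fit once per candidate on train; keep the one with the lowest validation error."""
    scores = {}
    for penalty in candidates:
        slope = fit_slope(train[0], train[1], penalty)
        scores[penalty] = mean_squared_error(slope, valid[0], valid[1])
    return min(scores, key=scores.get), scores

# Hypothetical (inputs, targets) pairs for training and validation.
train = ([1.0, 2.0, 3.0], [2.2, 3.9, 6.3])
valid = ([4.0, 5.0], [8.0, 10.0])
best_penalty, scores = pick_penalty([0.0, 0.5, 1.0, 10.0], train, valid)
```

On this data a small penalty (0.5) beats both no penalty and heavier penalties. The data used to pick a hyperparameter should be separate from the final test set, otherwise the test score stops being an honest estimate.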
Types of Supervised Learning Models
Learn how models use labeled data to make accurate predictions in various applications.
Logistic Regression
A classification algorithm that estimates the probability a data point belongs to a class.
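Logistic regression is a common algorithm that fits this description. Below is a minimal single-feature sketch trained with batch gradient descent. The data (hours spent browsing against purchase) and the hyperparameter values are assumptions chosen for illustration.

```python
import math

def sigmoid(z):
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(xs, ys, learning_rate=0.1, epochs=2000):
    """Fit a weight and bias for one feature by gradient descent on log loss."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        errors = [sigmoid(w * x + b) - y for x, y in zip(xs, ys)]
        grad_w = sum(e * x for e, x in zip(errors, xs)) / n
        grad_b = sum(errors) / n
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b

def predict_proba(x, w, b):
    """Estimated probability that x belongs to class 1."""
    return sigmoid(w * x + b)

# Hypothetical data: hours spent browsing -> purchased (1) or not (0).
hours = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0]
bought = [0, 0, 0, 0, 1, 1, 1, 1]
w, b = fit_logistic(hours, bought)
```

Calling `predict_proba(4.0, w, b)` gives a probability above 0.5 and `predict_proba(0.5, w, b)` gives one below it; thresholding at 0.5 turns the probability into a class.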


Linear Regression
A regression algorithm that predicts a continuous value by fitting the best-fit line through the data.
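For a single feature, the best-fit line has a closed-form least-squares solution. The sketch below computes it directly; the `fit_line` name and the data points are made up for this example.

```python
def fit_line(xs, ys):
    """Ordinary least squares for one feature: returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical points that lie exactly on y = 2x + 1.
xs = [1, 2, 3, 4, 5]
ys = [3, 5, 7, 9, 11]
slope, intercept = fit_line(xs, ys)
prediction = slope * 6 + intercept  # predict for a new input x = 6
```

Because these points are perfectly linear, the fit recovers a slope of 2 and an intercept of 1; real data would leave some residual error.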
Decision Trees
A predictive model that repeatedly splits data by feature rules, forming branches that lead to a final decision.
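The core operation in a decision tree is choosing one split. The sketch below finds the threshold on a single feature that gives the purest groups, using Gini impurity as the measure. A full tree would repeat this on each resulting group; the data and helper names are hypothetical.

```python
def gini(labels):
    """Gini impurity of a list of 0/1 labels (0.0 means perfectly pure)."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)

def best_split(xs, ys):
    """Return the (threshold, weighted_impurity) of the best single split."""
    best_threshold, best_score = None, float("inf")
    values = sorted(set(xs))
    # Candidate thresholds sit halfway between neighboring feature values.
    for low, high in zip(values, values[1:]):
        threshold = (low + high) / 2
        left = [y for x, y in zip(xs, ys) if x <= threshold]
        right = [y for x, y in zip(xs, ys) if x > threshold]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(ys)
        if score < best_score:
            best_threshold, best_score = threshold, score
    return best_threshold, best_score

# Hypothetical data: customer age -> purchased (1) or not (0).
ages = [22, 25, 31, 38, 45, 52]
bought = [0, 0, 0, 1, 1, 1]
threshold, impurity = best_split(ages, bought)
```

Here the rule "age <= 34.5" separates the two classes perfectly, so the weighted impurity is 0.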






Random Forest
A method that blends many decision trees and averages their results for stronger accuracy.
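A random forest matches this description. The sketch below keeps only the central idea: train each tree on a bootstrap sample (drawn with replacement) and average the votes. As a simplifying assumption, each "tree" is a single-split stump on one feature, and the random feature selection used by real random forests is omitted.

```python
import random

def fit_stump(points):
    """Pick the threshold that misclassifies the fewest points (predict 1 when x > threshold)."""
    xs = sorted({x for x, _ in points})
    midpoints = [(a + b) / 2 for a, b in zip(xs, xs[1:])]
    candidates = [xs[0] - 1] + midpoints + [xs[-1] + 1]

    def errors(threshold):
        return sum((x > threshold) != bool(y) for x, y in points)

    return min(candidates, key=errors)

def fit_forest(points, n_trees=25, seed=0):
    """Train each stump on its own bootstrap sample of the data."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        sample = [rng.choice(points) for _ in points]  # sample with replacement
        forest.append(fit_stump(sample))
    return forest

def predict_forest(forest, x):
    """Average the 0/1 votes; the result is the fraction of trees voting for class 1."""
    return sum(x > threshold for threshold in forest) / len(forest)

# Hypothetical data: (age, purchased).
points = [(22, 0), (25, 0), (31, 0), (38, 1), (45, 1), (52, 1)]
forest = fit_forest(points)
```

Inputs far from the boundary get unanimous votes, while inputs near it can split the trees, which gives a rough sense of the model's confidence.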


K-Nearest Neighbors
A method that classifies new data by matching it with the most similar examples in the dataset.
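k-nearest neighbors works this way. A minimal sketch, assuming Euclidean distance as the similarity measure and a majority vote among the `k` closest training examples; the data points are invented.

```python
import math
from collections import Counter

def knn_predict(train, query, k=3):
    """Label `query` by majority vote among its k nearest training examples."""
    neighbors = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

# Hypothetical examples: ((age, income in thousands), purchased?).
train = [
    ((25, 30), "no"), ((30, 35), "no"), ((28, 40), "no"),
    ((45, 90), "yes"), ((50, 85), "yes"), ((48, 95), "yes"),
]
likely_buyer = knn_predict(train, (47, 88))
unlikely_buyer = knn_predict(train, (27, 33))
```

Because the method relies on distances, features measured on very different scales are usually rescaled first so that no single feature dominates.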


Support Vector Machines
A method that separates classes by finding the widest possible boundary between them.
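The "widest possible boundary" idea is easiest to see in one dimension. The sketch below covers only that special case: two classes that do not overlap on a single feature. There, the maximum-margin boundary is the midpoint between the closest points of opposite classes. Support vector machines solve an optimization problem to do the same in higher dimensions, which this sketch does not attempt.

```python
def max_margin_threshold(negatives, positives):
    """Boundary and margin for separable 1-D data with negatives below positives."""
    lower, upper = max(negatives), min(positives)
    if lower >= upper:
        raise ValueError("classes overlap; no separating boundary exists")
    boundary = (lower + upper) / 2  # equally far from both classes
    margin = (upper - lower) / 2    # distance from boundary to the nearest point
    return boundary, margin

# Hypothetical feature values for each class.
boundary, margin = max_margin_threshold([1.0, 2.0, 2.5], [4.5, 5.0, 6.0])
```

Only the two closest points (2.5 and 4.5) determine the result. Points like these are called support vectors; moving any of the other points would not change the boundary.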


Gradient Boosting
A technique that builds many small decision trees in sequence, each correcting the last one’s errors.
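Gradient boosting follows this pattern. The sketch below applies it to regression with squared error: each new single-split tree (a stump) is fit to the residuals left by the trees before it. The data, the number of trees, and the learning rate are assumptions chosen for illustration.

```python
def fit_stump(xs, targets):
    """Best single split by squared error: returns (threshold, left_mean, right_mean)."""
    best = None
    values = sorted(set(xs))
    for low, high in zip(values, values[1:]):
        threshold = (low + high) / 2
        left = [t for x, t in zip(xs, targets) if x <= threshold]
        right = [t for x, t in zip(xs, targets) if x > threshold]
        left_mean, right_mean = sum(left) / len(left), sum(right) / len(right)
        error = (sum((t - left_mean) ** 2 for t in left)
                 + sum((t - right_mean) ** 2 for t in right))
        if best is None or error < best[0]:
            best = (error, threshold, left_mean, right_mean)
    return best[1:]

def fit_boosted(xs, ys, n_trees=50, learning_rate=0.1):
    """Start from the mean, then repeatedly fit a stump to the current residuals."""
    base = sum(ys) / len(ys)
    predictions = [base] * len(ys)
    stumps = []
    for _ in range(n_trees):
        residuals = [y - p for y, p in zip(ys, predictions)]
        threshold, left_mean, right_mean = fit_stump(xs, residuals)
        stumps.append((threshold, left_mean, right_mean))
        predictions = [
            p + learning_rate * (left_mean if x <= threshold else right_mean)
            for x, p in zip(xs, predictions)
        ]
    return base, learning_rate, stumps

def predict_boosted(model, x):
    base, learning_rate, stumps = model
    return base + learning_rate * sum(l if x <= t else r for t, l, r in stumps)

# Hypothetical regression data.
xs = [1, 2, 3, 4, 5, 6]
ys = [1.0, 2.0, 3.0, 7.0, 8.0, 9.0]
model = fit_boosted(xs, ys)

def mse(predict):
    return sum((predict(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

baseline_error = mse(lambda x: sum(ys) / len(ys))       # always predict the mean
boosted_error = mse(lambda x: predict_boosted(model, x))
```

The small learning rate means each tree corrects only part of the remaining error, which is why many trees are needed and why the training error falls gradually.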


Naive Bayes
A method that classifies data by combining each feature’s probability to choose the most likely class.
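Naive Bayes works this way. The sketch below handles categorical features: it multiplies the class prior by each feature's conditional probability (summing logs to avoid tiny numbers) and picks the class with the highest score. Add-one (Laplace) smoothing is an assumption added here so that unseen values do not produce zero probabilities; the data is invented.

```python
import math
from collections import Counter, defaultdict

def fit_naive_bayes(rows, labels):
    """Count classes, feature values per class, and the distinct values per feature."""
    class_counts = Counter(labels)
    feature_counts = defaultdict(Counter)  # (label, feature index) -> value counts
    vocab = defaultdict(set)               # feature index -> values seen
    for row, label in zip(rows, labels):
        for i, value in enumerate(row):
            feature_counts[(label, i)][value] += 1
            vocab[i].add(value)
    return class_counts, feature_counts, vocab

def predict_naive_bayes(model, row):
    """Return the class with the highest log prior plus summed log likelihoods."""
    class_counts, feature_counts, vocab = model
    total = sum(class_counts.values())
    scores = {}
    for label, count in class_counts.items():
        log_prob = math.log(count / total)
        for i, value in enumerate(row):
            # Add-one smoothing keeps unseen values from zeroing out the product.
            numerator = feature_counts[(label, i)][value] + 1
            denominator = count + len(vocab[i])
            log_prob += math.log(numerator / denominator)
        scores[label] = log_prob
    return max(scores, key=scores.get)

# Hypothetical examples: (age group, income level) -> purchased?
rows = [("young", "low"), ("young", "high"), ("middle", "high"),
        ("senior", "high"), ("senior", "low"), ("middle", "high")]
labels = ["no", "no", "yes", "yes", "no", "yes"]
model = fit_naive_bayes(rows, labels)
buyer = predict_naive_bayes(model, ("middle", "high"))
non_buyer = predict_naive_bayes(model, ("young", "low"))
```

The "naive" part is the assumption that features are independent given the class, which is what allows the per-feature probabilities to be multiplied together.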
Connect with Me!
LinkedIn: caroline-rennier
Email: caroline@rennier.com