Random Forest Explained
Random Forest is a machine learning algorithm that combines the results of multiple decision trees to make accurate and reliable predictions.
Understanding Random Forest
Random Forest is an ensemble method that makes predictions by combining the results of many individual decision trees.
Instead of relying on just one decision tree (which can overfit the training data and make overconfident predictions), Random Forest builds many trees, each one trained on a different random slice of the data.
Each tree is trained on a random sample of the rows, drawn with replacement rather than from the whole dataset. This resampling technique is known as bootstrap aggregating, or bagging.
When a tree makes a decision (e.g., which feature to split on), it considers only a random subset of the features. This decorrelates the trees, so they don't all learn the same rules.
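To make those two sources of randomness concrete, here is a minimal sketch in NumPy of how one tree's view of the training data could be drawn. The feature matrix X and labels y here are hypothetical stand-ins, not real data:

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical stand-in data: 10 rows, 4 features.
X = rng.normal(size=(10, 4))
y = rng.integers(0, 2, size=10)

# 1) Bootstrap sample: draw row indices with replacement,
#    so some rows repeat and others are left out.
row_idx = rng.integers(0, len(X), size=len(X))

# 2) Feature subset: pick a random subset of columns.
#    (In a real forest this is redrawn at every split; a common
#    default size for classification is sqrt(n_features).)
n_sub = int(np.sqrt(X.shape[1]))
col_idx = rng.choice(X.shape[1], size=n_sub, replace=False)

X_tree = X[np.ix_(row_idx, col_idx)]
y_tree = y[row_idx]
print(X_tree.shape, y_tree.shape)  # (10, 2) (10,)
```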
Making a prediction:
For classification tasks: The trees vote on a class, and the most popular answer wins.
For regression tasks (e.g., predicting a number): The trees’ answers are averaged.
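Both aggregation rules are easy to see with scikit-learn on synthetic data. This is just a sketch, not tied to any particular dataset:

```python
from sklearn.datasets import make_classification, make_regression
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor

# Classification: each tree votes for a class; the majority wins.
Xc, yc = make_classification(n_samples=200, n_features=5, random_state=0)
clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(Xc, yc)
print(clf.predict(Xc[:1]))        # the majority class across the 100 trees
print(clf.predict_proba(Xc[:1]))  # per-class support, roughly the vote shares

# Regression: each tree predicts a number; the predictions are averaged.
Xr, yr = make_regression(n_samples=200, n_features=5, random_state=0)
reg = RandomForestRegressor(n_estimators=100, random_state=0).fit(Xr, yr)
print(reg.predict(Xr[:1]))        # the mean of the 100 trees' predictions
```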
An Example of Random Forest
We are aiming to predict whether a passenger survived the Titanic disaster with a Random Forest classifier, using information like their age, gender, ticket class, and fare paid. This is a classification task, where the goal is to assign each passenger to one of two categories: survived or did not survive.
The random forest builds many trees (say, 100), and each one:
Looks at a random sample of passengers.
Considers only a random subset of features like age, sex, fare, etc.
Learns its own rules for predicting survival.
Each tree in the forest might pick up on different patterns like:
“Females in 1st or 2nd class had high survival rates.”
“Males in 3rd class under age 10 sometimes survived.”
“Higher fare might signal 1st class → better chances.”
When asked to predict survival for a new passenger, all trees vote:
Each tree gives a 1 (survived) or 0 (didn’t survive).
The final prediction is the majority vote.
So if 80 out of 100 trees say “survived,” the forest predicts survived.
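Putting the whole example together, here is a sketch using scikit-learn and seaborn's bundled copy of the Titanic data. The preprocessing is deliberately simplified (missing values are dropped and only four features are used), so treat it as an illustration rather than a polished pipeline:

```python
import pandas as pd
import seaborn as sns
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Load a widely available copy of the Titanic data (seaborn's sample
# dataset; any CSV with these columns would work the same way).
df = sns.load_dataset("titanic")[["survived", "pclass", "sex", "age", "fare"]].dropna()

X = pd.get_dummies(df[["pclass", "sex", "age", "fare"]], drop_first=True)  # encode 'sex'
y = df["survived"]
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

forest = RandomForestClassifier(n_estimators=100, random_state=0)
forest.fit(X_train, y_train)

# For a fully grown forest, predict_proba roughly reflects the vote split:
# a probability near 0.80 for class 1 means about 80 of the 100 trees
# said "survived", so the majority-vote prediction is 1.
print(forest.predict(X_test[:1]))
print(forest.predict_proba(X_test[:1]))
print("test accuracy:", forest.score(X_test, y_test))
```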
Connect with Me!
LinkedIn: caroline-rennier
Email: caroline@rennier.com