====== Random Forests ======
Random forests ((https://www.stat.berkeley.edu/~breiman/RandomForests/cc_home.htm#intro|Random forests)) are among the best out-of-the-box methods and are highly valued by developers and data scientists. To understand the process, consider an imaginary weather forecast problem represented by the following true decision tree (figure {{ref>Weatherforecastexample}}):
{{ :en:iot-reloaded:classification_6.png?800 | Weather Forecast Example}}
Weather Forecast Example
Now consider several forecast agents, such as friends or neighbours, each of whom provides a forecast depending on the factor values. Some forecasts will be higher than the actual value, and some will be lower. However, since all agents use some **experience-based knowledge**, the collected forecasts will be distributed around the actual value.
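The idea that many imperfect forecasts scatter around the actual value can be illustrated with a minimal sketch. It assumes unbiased agents whose errors follow a Gaussian distribution; the function name, the temperature value and the noise level are hypothetical and chosen only for illustration.

```python
import random
import statistics


def simulate_agent_forecasts(true_value, n_agents, noise_sd, seed=0):
    """Return one noisy forecast per agent, scattered around true_value.

    Assumption: every agent is unbiased and errs with Gaussian noise.
    """
    rng = random.Random(seed)
    return [rng.gauss(true_value, noise_sd) for _ in range(n_agents)]


true_temperature = 20.0  # hypothetical actual value
forecasts = simulate_agent_forecasts(true_temperature, n_agents=500, noise_sd=3.0)

# Error of one arbitrary agent versus error of the averaged forecast.
single_error = abs(forecasts[0] - true_temperature)
mean_error = abs(statistics.mean(forecasts) - true_temperature)
print(f"single agent error: {single_error:.2f}")
print(f"averaged forecast error: {mean_error:.2f}")
```

Under these assumptions, the average of many forecasts tends to lie much closer to the actual value than a typical individual forecast does.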
The random forest (RF) method uses hundreds of such forecast agents, each of which is a decision tree, and then applies majority voting to their forecasts (figure {{ref>Weatherforecastvotingexample}}).
{{ :en:iot-reloaded:classification_7.png?800 | Weather Forecast Voting Example}}
Weather Forecast Voting Example
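Majority voting itself can be sketched in a few lines. The labels and the helper name ''majority_vote'' are hypothetical; ties are resolved here in favour of the label seen first, which is a simplification rather than a prescribed rule.

```python
from collections import Counter


def majority_vote(predictions):
    """Return the label predicted by the largest number of trees."""
    if not predictions:
        raise ValueError("need at least one prediction")
    # most_common(1) returns a list with the single (label, count) pair
    # that has the highest count.
    return Counter(predictions).most_common(1)[0][0]


# Hypothetical votes from five trees in the forest.
tree_votes = ["rain", "sun", "rain", "rain", "sun"]
forecast = majority_vote(tree_votes)
print(forecast)  # prints "rain"
```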
Some advantages:
* RF uses more knowledge than a single decision tree.
* The more diverse the initial information sources, the more diverse the models will be and the more robust the final estimate.
* This matters because a single data source might suffer from data anomalies, which are then reflected in model anomalies.
RF features:
* Each tree in the forest uses a randomly selected subset of factors.
* Each tree has a randomly sampled subset of training data.
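* Apart from this randomisation, each tree is trained as usual.
* The random sampling makes the trees more independent of data anomalies.
* When a decision is made, it is a simple aggregate over the whole forest: a majority vote for classification or an average for regression.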
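The features listed above can be put together in a small sketch. To keep it short, each "tree" is assumed to be a one-split decision stump, and the toy weather data, the function names and the parameter values are all hypothetical. It is an illustration of the idea, not a reference implementation.

```python
import random
from collections import Counter


def majority(labels):
    """Most frequent label in a non-empty list."""
    return Counter(labels).most_common(1)[0][0]


def train_stump(rows, labels, feature_ids):
    """Fit a one-split tree using only the given factors (features)."""
    best = None
    for f in feature_ids:
        for threshold in sorted({r[f] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[f] <= threshold]
            right = [y for r, y in zip(rows, labels) if r[f] > threshold]
            if not left or not right:
                continue  # not a real split
            left_label, right_label = majority(left), majority(right)
            errors = (sum(y != left_label for y in left)
                      + sum(y != right_label for y in right))
            if best is None or errors < best[0]:
                best = (errors, f, threshold, left_label, right_label)
    if best is None:  # no usable split: always predict the majority label
        label = majority(labels)
        return (feature_ids[0], float("inf"), label, label)
    return best[1:]


def predict_stump(stump, row):
    f, threshold, left_label, right_label = stump
    return left_label if row[f] <= threshold else right_label


def train_forest(rows, labels, n_trees, n_features_per_tree, seed=0):
    rng = random.Random(seed)
    n, n_features = len(rows), len(rows[0])
    forest = []
    for _ in range(n_trees):
        # Randomly sampled training data (drawn with replacement).
        idx = [rng.randrange(n) for _ in range(n)]
        # Randomly selected subset of factors for this tree.
        feature_ids = rng.sample(range(n_features), n_features_per_tree)
        forest.append(train_stump([rows[i] for i in idx],
                                  [labels[i] for i in idx],
                                  feature_ids))
    return forest


def predict_forest(forest, row):
    """Aggregate the individual tree decisions by majority vote."""
    return majority([predict_stump(stump, row) for stump in forest])


# Hypothetical toy data: (humidity %, cloud cover %, wind km/h) -> label.
ROWS = [(90, 80, 30), (85, 90, 35), (80, 85, 40), (95, 75, 25),
        (30, 20, 10), (40, 10, 5), (35, 30, 15), (25, 15, 12)]
LABELS = ["rain"] * 4 + ["dry"] * 4

forest = train_forest(ROWS, LABELS, n_trees=25, n_features_per_tree=2, seed=1)
print(predict_forest(forest, (99, 95, 45)))
print(predict_forest(forest, (20, 5, 3)))
```

Because every tree sees a different sample and a different subset of factors, an anomaly in one part of the data affects only some of the trees, and the vote of the whole forest remains stable.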
Each tree in the forest is grown as follows:
* If the number of cases in the training set is N, then N cases are sampled at random, with replacement, from the original data. Some cases will therefore be represented more than once.
* This sample will be the training set for growing the tree.
* If there are M input factors, a number m<