====== Random Forests ======

Random forests (([[https://www.stat.berkeley.edu/~breiman/RandomForests/cc_home.htm#intro|Random forests]])) are among the best out-of-the-box methods and are highly valued by developers and data scientists. To better understand the process, consider an imaginary weather forecast problem represented by the following true decision tree (figure {{ref>Weatherforecastexample}}):

{{ :en:iot-reloaded:classification_6.png?800 | Weather Forecast Example}}

Weather Forecast Example

Now, one might consider several forecast agents (friends or neighbours), each of whom provides a forecast depending on the factor values. Some forecasts will be higher than the actual value, and some will be lower. However, since all agents rely on some **experience-based knowledge**, the collected forecasts will be distributed around the true value. The Random forest (RF) method uses hundreds of such forecast agents in the form of decision trees and then applies majority voting (figure {{ref>Weatherforecastvotingexample}}).
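
The voting idea can be illustrated with a minimal simulation. It is only a sketch under stated assumptions, not a random forest implementation: the function names ''majority_vote'' and ''simulate'', the two labels, and the parameter values are hypothetical, and every agent is assumed to be right with the same probability and independently of the others. Real decision trees are correlated, so the gain from voting is usually smaller than in this idealised setting.

```python
import random
from collections import Counter


def majority_vote(votes):
    """Return the label that occurs most often among the votes."""
    return Counter(votes).most_common(1)[0][0]


def simulate(n_agents=101, p_correct=0.6, n_days=2000, seed=0):
    """Compare one agent with the majority vote of all agents.

    Assumption: each agent is correct with probability p_correct,
    independently of the other agents (hypothetical values).
    """
    rng = random.Random(seed)
    single_hits = 0
    forest_hits = 0
    for _ in range(n_days):
        truth = rng.choice(["rain", "dry"])
        wrong = "dry" if truth == "rain" else "rain"
        # Every agent gives its own, possibly wrong, forecast.
        votes = [truth if rng.random() < p_correct else wrong
                 for _ in range(n_agents)]
        single_hits += votes[0] == truth            # first agent alone
        forest_hits += majority_vote(votes) == truth  # all agents together
    return single_hits / n_days, forest_hits / n_days


single_acc, forest_acc = simulate()
```

Under these assumptions, ''simulate()'' returns the accuracy of a single agent and the accuracy of the majority vote. An odd number of agents is used so that a two-label vote cannot end in a tie.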

{{ :en:iot-reloaded:classification_7.png?800 | Weather Forecast Voting Example}}

Weather Forecast Voting Example

Some advantages:
  * RF uses more knowledge than a single decision tree.
  * Furthermore, the more diverse the initial information sources used, the more diverse the models will be and the more robust the final estimate.
  * This holds because a single data source might suffer from data anomalies, which are then reflected as model anomalies.

RF features:
  * Each tree in the forest uses a randomly selected subset of factors.
  * Each tree is trained on a randomly sampled subset of the training data.
  * Otherwise, each tree is trained as usual.
  * This makes the trees less dependent on individual data anomalies.
  * When a decision is made, the predictions of the whole forest are combined: a majority vote for classification or a simple average for regression.

Each tree in the forest is grown as follows:
  * If the number of cases in the training set is N, then N cases are sampled at random, with replacement, from the original data. Some cases will therefore be represented more than once.
  * This sample will be the training set for growing the tree.
  * If there are M input factors, a number m<