====== Random forests ======
{{:en:iot-open:czapka_p.png?50|General audience classification icon}}{{:en:iot-open:czapka_b.png?50|General audience classification icon}}{{:en:iot-open:czapka_e.png?50|General audience classification icon}}\\
[[https://www.stat.berkeley.edu/~breiman/RandomForests/cc_home.htm#intro|Random forests]] are among the best out-of-the-box methods and are highly valued by developers and data scientists. To understand the process, consider an imaginary weather forecast problem, represented by the following true decision tree:
Now, one might consider several forecast agents (friends or neighbours), each providing their own forecast depending on the factor values. Some forecasts will be higher than the actual value, and some lower. However, since all agents use some **experience-based knowledge**, the collected forecasts will be distributed around the actual value.
The Random forest (RF) method uses hundreds of forecast agents, each of which is a decision tree, and then combines their outputs: by majority voting for classification, or by averaging for numerical forecasts.
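The forecast-agent intuition can be illustrated with a small simulation. This is only a sketch under stated assumptions: the agents' errors are modelled as independent, unbiased Gaussian noise, and the function name ''simulate_agents'', the true value of 20.0 and the noise level are hypothetical choices made for the illustration.

```python
import random
import statistics


def simulate_agents(true_value, n_agents, noise_sd, seed=0):
    """Return hypothetical forecasts scattered around true_value.

    Assumption: each agent's error is independent Gaussian noise.
    """
    rng = random.Random(seed)
    return [rng.gauss(true_value, noise_sd) for _ in range(n_agents)]


true_temperature = 20.0  # assumed "actual value" for the illustration
forecasts = simulate_agents(true_temperature, n_agents=500, noise_sd=3.0)

# Typical error of a single agent versus the error of the averaged forecast.
mean_individual_error = statistics.fmean(
    [abs(f - true_temperature) for f in forecasts]
)
ensemble_error = abs(statistics.fmean(forecasts) - true_temperature)

print(f"mean individual error: {mean_individual_error:.2f}")
print(f"ensemble error:        {ensemble_error:.2f}")
```

Under these assumptions the averaged forecast lands much closer to the actual value than a typical single agent does. Real trees in a forest are not fully independent, so the gain in practice is smaller than in this idealised case.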
Some advantages:
* RF uses more knowledge than a single decision tree;
* The more diverse the initial information sources, the more diverse the models and the more robust the final estimate;
* This holds because a single data source might suffer from data anomalies, which are then reflected as model anomalies.
RF features:
* Each tree in the forest uses a randomly selected subset of factors;
* Each tree is trained on a randomly sampled subset of the training data;
* Otherwise, each tree is trained as usual;
* This makes the trees less likely to share the same data anomalies;
* When a decision is made, it is a simple average over the whole forest.
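The features listed above can be sketched in a few lines of plain Python. This is a minimal illustration, not a production implementation: to keep it short, each "tree" is a depth-1 regression tree (a decision stump) rather than a fully grown tree, and all function names (''bootstrap_sample'', ''fit_stump'', ''fit_forest'', ''predict_forest'') are hypothetical.

```python
import random
import statistics


def bootstrap_sample(rows, rng):
    """Draw len(rows) cases at random, with replacement."""
    return [rng.choice(rows) for _ in rows]


def fit_stump(rows, feature_ids):
    """Fit a depth-1 regression tree using only the given features.

    rows is a list of (features, target) pairs.
    Returns (feature, threshold, left_mean, right_mean).
    """
    best = None
    for f in feature_ids:
        for threshold in sorted({x[f] for x, _ in rows}):
            left = [y for x, y in rows if x[f] <= threshold]
            right = [y for x, y in rows if x[f] > threshold]
            if not left or not right:
                continue
            lm, rm = statistics.fmean(left), statistics.fmean(right)
            # Sum of squared errors of the two-sided prediction.
            sse = sum((y - lm) ** 2 for y in left)
            sse += sum((y - rm) ** 2 for y in right)
            if best is None or sse < best[0]:
                best = (sse, f, threshold, lm, rm)
    if best is None:  # no valid split: always predict the overall mean
        mean = statistics.fmean([y for _, y in rows])
        return (feature_ids[0], float("inf"), mean, mean)
    return best[1:]


def predict_stump(stump, x):
    f, threshold, lm, rm = stump
    return lm if x[f] <= threshold else rm


def fit_forest(rows, n_trees, n_features, rng):
    """Train n_trees stumps, each on its own sample and feature subset."""
    n_total = len(rows[0][0])
    forest = []
    for _ in range(n_trees):
        sample = bootstrap_sample(rows, rng)                  # random data
        feature_ids = rng.sample(range(n_total), n_features)  # random factors
        forest.append(fit_stump(sample, feature_ids))
    return forest


def predict_forest(forest, x):
    """The forest's decision is the simple average over all trees."""
    return statistics.fmean([predict_stump(s, x) for s in forest])


# Toy data (an assumption for the demo): only factor 0 drives the target.
rng = random.Random(0)
rows = []
for _ in range(200):
    x = (rng.random(), rng.random(), rng.random())
    rows.append((x, 10.0 if x[0] > 0.5 else 0.0))

forest = fit_forest(rows, n_trees=50, n_features=2, rng=rng)
print(predict_forest(forest, (0.9, 0.5, 0.5)))
print(predict_forest(forest, (0.1, 0.5, 0.5)))
```

In this toy setup, stumps that happen to receive factor 0 split on it, while the others can only fit noise, so the averaged prediction is pulled towards the correct side without any single tree seeing all the factors.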
Each tree in the forest is grown as follows:
* If the number of cases in the training set is N, then N cases are sampled at random, with replacement, from the original data, so some cases will appear more than once;
* This sample will be the training set for growing the tree.
* If there are M input factors, a number m<