Best Split Doesn’t Necessarily Produce the Best Decision Tree
Decision trees are popular as predictive models because of their intuitiveness and their competitive performance relative to other model-building methodologies. A decision tree model is built from the data at hand, that is, the training data, by successively splitting the data into purer and purer subsets in a top-down manner. The quality of any potential split is measured by one of a handful of split-quality measures, such as the Gini index or entropy. These and similar measures essentially quantify the level of impurity in the subsets that a split produces.
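As a concrete reference point, here is a minimal sketch in Python of the two impurity measures just mentioned (the code is illustrative only and is not taken from the post's slides):

```python
from collections import Counter
from math import log2

def gini(labels):
    """Gini impurity: 1 - sum of p_k**2 over classes k, where p_k is the class fraction."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def entropy(labels):
    """Entropy impurity: -sum of p_k * log2(p_k) over classes k."""
    n = len(labels)
    return -sum((count / n) * log2(count / n) for count in Counter(labels).values())

print(gini(["yes"] * 4))                     # 0.0  (a pure subset)
print(gini(["yes", "yes", "no", "no"]))      # 0.5  (maximally impure for two classes)
print(entropy(["yes", "yes", "no", "no"]))   # 1.0
```

Both measures are zero for a pure subset and largest when the classes are evenly mixed, which is why a lower value after a split signals a better split.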
Just to get an idea of how the quality of a potential split is determined, take a look at the slide below, which shows a set of possible splits for an illustrative example and how their quality is calculated using the entropy measure.
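In rough terms, the calculation amounts to this: for each candidate split, compute the entropy of each resulting subset and take the size-weighted average; the split with the lowest weighted entropy (equivalently, the highest information gain) is judged best. Here is a small sketch, with made-up labels purely for illustration:

```python
from collections import Counter
from math import log2

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def weighted_entropy(left, right):
    """Size-weighted average entropy of the two subsets produced by a split.
    Lower is better; parent entropy minus this value is the information gain."""
    n = len(left) + len(right)
    return (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)

# Class labels of the records reaching a node, plus two candidate ways to split them.
parent = ["+", "+", "+", "+", "-", "-", "-", "-"]
split_a = (["+", "+", "+", "-"], ["+", "-", "-", "-"])   # subsets stay mixed
split_b = (["+", "+", "+", "+"], ["-", "-", "-", "-"])   # subsets become pure

for name, (left, right) in [("A", split_a), ("B", split_b)]:
    w = weighted_entropy(left, right)
    print(f"split {name}: weighted entropy {w:.3f}, gain {entropy(parent) - w:.3f}")
# split A: weighted entropy 0.811, gain 0.189
# split B: weighted entropy 0.000, gain 1.000  (so split B would be chosen)
```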
All decision tree modeling methods go for the best split at each stage of the model-building process, with the understanding that the resulting tree model will be better than the one that would result from not choosing the best split. Here I just want to show an example where this is not true. It is shown in the slide below, where the best split as per the entropy measure yields an inferior tree, in terms of tree size and model clarity, compared with the tree obtained from a worse split. The reason for this is the greedy nature of the split-selection criterion, which does not include any look-ahead component.
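The same phenomenon can be reproduced with a small constructed dataset (made up for illustration here; it is not the slide's example): an XOR-style target plus a weakly predictive "decoy" feature. The greedy criterion picks the decoy at the root, because it is the only split with positive entropy gain there, and then still has to untangle the XOR structure in both branches, so it ends up with a larger tree than one built from the two informative features alone:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Target is x1 XOR x2; x3 is a "decoy" that merely agrees with the target 75% of the time.
rows = []
for x1 in (0, 1):
    for x2 in (0, 1):
        y = x1 ^ x2
        rows += [(x1, x2, y, y)] * 3      # x3 agrees with the target
        rows += [(x1, x2, 1 - y, y)]      # x3 disagrees with the target
data = np.array(rows)
X, y = data[:, :3], data[:, 3]

# Greedy tree on all features: the root split is on x3, the only split with
# positive entropy gain, so the XOR structure must be resolved in both branches.
greedy = DecisionTreeClassifier(criterion="entropy", random_state=0).fit(X, y)

# Tree restricted to the two informative features: a zero-gain split on x1 (or x2)
# followed by one more level already gives pure leaves.
restricted = DecisionTreeClassifier(criterion="entropy", random_state=0).fit(X[:, :2], y)

print("greedy tree:    ", greedy.tree_.node_count, "nodes, depth", greedy.tree_.max_depth)
print("restricted tree:", restricted.tree_.node_count, "nodes, depth", restricted.tree_.max_depth)
# On this toy data the greedy tree should end up with 15 nodes at depth 3, while the
# restricted tree needs only 7 nodes at depth 2, even though the greedy tree's
# root split is the "best" one by the entropy measure.
```

A look-ahead of even one level would reveal that splitting on x1 or x2 first, despite its zero immediate gain, leads to the smaller and clearer tree.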
If you have come across any other example where a similar result is obtained, please share it with us.
Thanks.