Trees vs. Neural Networks

Posted on March 24, 2023

By Neville Dubash

Tree-based algorithms or neural networks for time series data

Neural networks and tree-based models are often compared when it comes to modeling data that have nonlinear relationships between variables. Both methods can handle complex interactions between variables in the data.

A common misconception is that a neural network is the best solution for every data science problem. This belief largely stems from the complexity of neural networks. Tree-based methods often receive less attention, mostly because of their simplicity.

However, studies have shown that tree-based algorithms often outperform neural networks on tabular data. The main reason is that neural networks are trained with gradients, which can make it difficult for them to learn the best-fit function when the decision boundary is non-smooth.
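A non-smooth boundary is easy for a tree to represent because each split is a single yes/no threshold. The minimal sketch below fits a depth-1 regression tree (a decision stump) to a step function by trying every threshold and keeping the one with the lowest squared error. The function names `fit_stump` and `predict_stump` and the toy data are hypothetical, and libraries such as scikit-learn implement this far more efficiently; the point is only that the fitted model jumps from 0 to 1 with no intermediate values.

```python
def fit_stump(xs, ys):
    """Fit a depth-1 regression tree (decision stump) by exhaustive threshold search.

    Returns (threshold, left_value, right_value) that minimise the sum of squared errors.
    """
    pairs = sorted(zip(xs, ys))
    best = None
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold can separate identical x values
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2  # midpoint between neighbours
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = sum((y - left_mean) ** 2 for y in left) + sum((y - right_mean) ** 2 for y in right)
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    if best is None:
        raise ValueError("need at least two distinct x values")
    return best[1:]


def predict_stump(stump, x):
    """Predict with a fitted stump: one yes/no question, one constant per side."""
    threshold, left_value, right_value = stump
    return left_value if x <= threshold else right_value


# A step function: 0 up to x = 4, then a sudden jump to 1.
xs = list(range(10))
ys = [0] * 5 + [1] * 5
stump = fit_stump(xs, ys)
print(stump)  # (4.5, 0.0, 1.0)
print(predict_stump(stump, 4.4), predict_stump(stump, 4.6))  # 0.0 1.0
```

A network trained by gradient descent would typically have to approximate the same jump with a smooth transition, which is where the difficulty described above comes from.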

Tree-based algorithms also require less tuning when the pattern to be learned is irregular. Studies have further shown that neural networks are not robust to uninformative features, which makes feature selection a critical preprocessing step for them. With careful feature selection and tuning, a neural network may match the performance of a tree, but typically at a higher computational cost.
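A tree chooses each split by how much it reduces impurity, so in a toy case like the one below an uninformative column is simply never picked. This sketch assumes Gini impurity, a common splitting criterion for classification trees; the function names, the shuffled "noise" column, and the constant column are hypothetical illustrations.

```python
def gini(labels):
    """Gini impurity: 0 for a pure node, higher when classes are mixed."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = {}
    for label in labels:
        counts[label] = counts.get(label, 0) + 1
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def best_split(rows, labels):
    """Greedy search over every feature and threshold.

    Returns (feature_index, threshold, weighted_gini) for the lowest-impurity split.
    """
    n = len(rows)
    best = None
    for f in range(len(rows[0])):
        values = sorted({row[f] for row in rows})
        # Candidate thresholds sit halfway between consecutive distinct values.
        for lo, hi in zip(values, values[1:]):
            threshold = (lo + hi) / 2
            left = [y for row, y in zip(rows, labels) if row[f] <= threshold]
            right = [y for row, y in zip(rows, labels) if row[f] > threshold]
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if best is None or score < best[2]:
                best = (f, threshold, score)
    return best


# Feature 0 determines the label; feature 1 is shuffled noise; feature 2 is constant.
informative = list(range(10))
noise = [3, 7, 1, 9, 5, 2, 8, 4, 6, 0]
rows = [(x, z, 1.0) for x, z in zip(informative, noise)]
labels = [0] * 5 + [1] * 5
print(best_split(rows, labels))  # (0, 4.5, 0.0)
```

This covers only a single split. In deeper trees with few samples per node, noise features can still be selected occasionally, so the robustness is relative rather than absolute.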

Furthermore, because trees make binary (yes/no) decisions instead of working with probabilities, they are suitable for cases where complex aspects of probability are not required. In many real-world cases, deterministic modeling can be more natural than probabilistic modeling.

As an example, a decision tree could be used to identify anomalous behaviour in the electricity consumption of a building. The decision-making path may appear as follows:

  • Is it a school?
  • Is it a weekend or holiday?
  • Is it winter?
  • Does the meter reading’s value differ by more than 5% from the previous day’s value?
  • Does the meter reading indicate an increase of more than 10% from the average of the last 24 hours?
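The path above can be written as a sequence of yes/no checks. This minimal sketch assumes, for illustration, that a "yes" to every question flags the reading as anomalous and that the first "no" leaves this branch without a flag. The `MeterContext` fields and the `check_anomaly` function are hypothetical names. Returning the answered questions alongside the result shows why such decisions are easy to explain.

```python
from dataclasses import dataclass


@dataclass
class MeterContext:
    # Hypothetical inputs for one meter reading; field names are illustrative.
    is_school: bool
    is_weekend_or_holiday: bool
    is_winter: bool
    reading: float                  # current meter reading, e.g. in kWh
    previous_day_reading: float     # reading at the same time on the previous day
    last_24h_readings: list[float]  # readings over the last 24 hours (non-empty)


def check_anomaly(ctx: MeterContext) -> tuple[bool, list[tuple[str, bool]]]:
    """Walk the decision path; return (is_anomalous, questions answered so far).

    Assumption for illustration: "yes" at every step means anomalous,
    and the first "no" leaves this branch without raising a flag.
    """
    # Each check is a lambda so it is only evaluated if the path reaches it.
    steps = [
        ("Is it a school?", lambda: ctx.is_school),
        ("Is it a weekend or holiday?", lambda: ctx.is_weekend_or_holiday),
        ("Is it winter?", lambda: ctx.is_winter),
        ("Differs by more than 5% from the previous day?",
         lambda: abs(ctx.reading - ctx.previous_day_reading) > 0.05 * ctx.previous_day_reading),
        ("More than 10% above the 24-hour average?",
         lambda: ctx.reading > 1.10 * (sum(ctx.last_24h_readings) / len(ctx.last_24h_readings))),
    ]
    path = []
    for question, check in steps:
        answer = check()
        path.append((question, answer))
        if not answer:
            return False, path
    return True, path


flagged, path = check_anomaly(MeterContext(True, True, True, 130.0, 100.0, [100.0] * 24))
print(flagged)  # True
for question, answer in path:
    print(question, "yes" if answer else "no")
```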

Tree-based algorithms offer many benefits, including less parameter tuning, less preprocessing effort, and higher interpretability, because every prediction follows a deterministic path of decisions.