Abstract
Decision trees built from data remain in widespread use for nonparametric prediction. Predicting probability distributions is preferred over point predictions when uncertainty plays a prominent role in analysis and decision-making. We study modifying a tree to produce nonparametric predictive distributions. We find that the standard method for building trees may not result in good predictive distributions and propose new splitting criteria based on proper scoring rules. Analysis of both simulated data and several real datasets demonstrates that these new splitting criteria yield trees with improved predictive properties when the entire predictive distribution is considered.
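As an illustration of the idea (not the paper's exact algorithm), a split can be scored by a proper scoring rule such as the continuous ranked probability score (CRPS) of each child node's empirical predictive distribution, rather than by variance reduction alone. The sketch below assumes each child predicts the empirical distribution of its own responses; all function names are hypothetical.

```python
import numpy as np

def empirical_crps(samples, y):
    """CRPS of the empirical distribution of `samples` evaluated at observation y.

    Uses the identity CRPS(F, y) = E|X - y| - 0.5 E|X - X'|,
    with X, X' i.i.d. draws from the empirical distribution."""
    s = np.asarray(samples, dtype=float)
    term1 = np.mean(np.abs(s - y))
    term2 = 0.5 * np.mean(np.abs(s[:, None] - s[None, :]))
    return term1 - term2

def split_score(y, mask):
    """Mean in-node CRPS after splitting responses `y` by boolean `mask`.

    Each child's predictive distribution is the empirical distribution of
    its own responses; lower scores indicate a better split."""
    y = np.asarray(y, dtype=float)
    total = 0.0
    for child in (y[mask], y[~mask]):
        if child.size == 0:
            return np.inf  # degenerate split: one side is empty
        total += sum(empirical_crps(child, obs) for obs in child)
    return total / y.size
```

A candidate split threshold would then be chosen to minimize `split_score`, so that a split separating the response into homogeneous groups (sharp, well-calibrated child distributions) scores lower than one that mixes them.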
Supplementary Materials
The supplementary materials include A) a list of existing pre-pruning algorithms; B) proofs of theorems; C) statistical tests based on different scores; D) additional plots for tree comparisons on synthetic data; E) finding the true splits; and F) real-data descriptions and additional experiments.
Acknowledgments
The authors gratefully acknowledge the computing resources provided on Bebop, a high-performance computing cluster operated by the Laboratory Computing Resource Center at Argonne National Laboratory. The authors also thank Dr. Timothy Williams for their help in acquiring and understanding one of the Yield datasets used in this study.
Disclosure Statement
The authors report there are no competing interests to declare.