Abstract
This study develops a hidden Markov model (HMM)-based clustering framework to predict auto insurance losses using driving characteristics extracted from telematics data. Through a simulation experiment based on a proprietary telematics dataset, we show that the HMM can effectively classify driving trips using model-implied hidden states and that HMM-based pricing methods provide better predictive power, as measured by deviance statistics. Importantly, the proposed framework not only enables us to price usage-based insurance at a granular level but is also viable for estimating long-term insurance losses by utilizing the limiting properties of the HMM.
ACKNOWLEDGMENTS
The authors thank two anonymous referees for their valuable suggestions that improved the article.
Disclosure Statement
No potential conflict of interest was reported by the author(s).
Notes
1 An observation is defined as a per-second record of the driving behavior variables.
2 In the United States, residential areas generally have speed limits ranging from 25 to 40 mph, whereas interstate roads have speed limits ranging from 50 to 80 mph. See https://www.uproad.com/blog/speed-limits-in-the-usa
5 There are multiple ways to select the optimal number of clusters. For example, the elbow method is a popular one, in which the explained variation is plotted as a function of the number of clusters, and the elbow of the curve is picked as the optimal number of clusters (e.g., Ketchen and Shook 1996; Bholowalia and Kumar 2014).
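As an illustration of the elbow method described in this note, the following is a minimal sketch rather than the procedure used in the article. It assumes one-dimensional toy data, a plain k-means (Lloyd's algorithm) with quantile-based starting centers, and a tolerance rule for locating the elbow; the names `kmeans_wcss`, `explained_variation`, `pick_elbow`, and the parameter `tol` are hypothetical.

```python
def kmeans_wcss(values, k, iters=100):
    """Run 1-D k-means and return the within-cluster sum of squares."""
    data = sorted(values)
    n = len(data)
    # Assumption: deterministic start with centers at evenly spaced quantiles.
    centers = [data[(2 * j + 1) * n // (2 * k)] for j in range(k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for v in data:
            nearest = min(range(k), key=lambda j: (v - centers[j]) ** 2)
            groups[nearest].append(v)
        # An empty cluster keeps its previous center.
        new_centers = [
            sum(g) / len(g) if g else centers[j] for j, g in enumerate(groups)
        ]
        if new_centers == centers:
            break
        centers = new_centers
    return sum(min((v - c) ** 2 for c in centers) for v in data)


def explained_variation(values, max_k):
    """Share of total variation explained by k clusters, for k = 1..max_k."""
    mean = sum(values) / len(values)
    total = sum((v - mean) ** 2 for v in values)
    return [1.0 - kmeans_wcss(values, k) / total for k in range(1, max_k + 1)]


def pick_elbow(curve, tol=0.01):
    """Smallest k after which one more cluster adds less than tol.

    Assumption: this tolerance rule stands in for reading the elbow
    off a plot; curve[i] is the explained variation for k = i + 1.
    """
    for i in range(len(curve) - 1):
        if curve[i + 1] - curve[i] < tol:
            return i + 1
    return len(curve)


# Toy data with three visibly separated groups (not from the article).
toy = [10, 11, 12, 30, 31, 32, 60, 61, 62]
curve = explained_variation(toy, max_k=5)
print([round(e, 4) for e in curve])
print("elbow at k =", pick_elbow(curve))
```

In practice the curve is usually inspected visually; the tolerance rule is only one convenient way to automate that reading, and the result can be sensitive to the choice of `tol`.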
6 Though the variables and cluster represent the information of the ith unit and depend on i, we omit the subscript for the unit on the right side of the equation for convenience. The same applies to Equations (6)–(8).
7 Traditional rating factors may be flexibly added into the Poisson GLM models when both telematic and nontelematic effects are considered. In this case, Equation (5) can be extended to
where and represent the vectors of driving behavior and traditional rating factors for the ith unit, respectively.
8 Although both clusters and hidden states are information extracted from the same dataset, we find very weak multicollinearity between these two variables. The variance inflation factor between these two variables is 1.79, which indicates a weak correlation (a variance inflation factor between 1 and 5 implies a weak association; James et al. 2021). We also employed the Cramér’s V test (Cramér 1946), and the Cramér’s V value between the variables is 0.142, which again implies a weak association.
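To make the Cramér’s V check in this note concrete, the sketch below computes the uncorrected statistic, V = sqrt(chi-squared / (n * min(r - 1, c - 1))), for two categorical label sequences. It is an illustration under stated assumptions, not the authors' code: the function name `cramers_v` and the toy labels are hypothetical, and the article does not state whether a bias-corrected variant was used.

```python
import math
from collections import Counter


def cramers_v(x, y):
    """Uncorrected Cramér's V for two equally long label sequences."""
    if len(x) != len(y) or len(x) == 0:
        raise ValueError("x and y must be non-empty and of equal length")
    n = len(x)
    joint = Counter(zip(x, y))   # observed counts for each (x, y) pair
    row = Counter(x)             # marginal counts of x
    col = Counter(y)             # marginal counts of y
    chi2 = 0.0
    for a in row:
        for b in col:
            expected = row[a] * col[b] / n
            observed = joint.get((a, b), 0)
            chi2 += (observed - expected) ** 2 / expected
    k = min(len(row), len(col)) - 1
    if k == 0:
        # A variable with a single category carries no association.
        return 0.0
    return math.sqrt(chi2 / (n * k))


# Toy labels (not from the article): cluster index vs. hidden-state label.
clusters = [0, 0, 1, 1]
states_same = ["a", "a", "b", "b"]    # labels move together
states_indep = ["a", "b", "a", "b"]   # labels are unrelated

print(cramers_v(clusters, states_same))
print(cramers_v(clusters, states_indep))
```

The statistic lies between 0 (no association) and 1 (each category of one variable determines the category of the other), so a value such as the 0.142 reported above sits near the weak end of that range.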