Demographers often analyze the determinants of life-course events with parametric
regression-type approaches. Here, we present a class of nonparametric approaches, broadly
defined as machine learning (ML) techniques, and discuss advantages and disadvantages of a
popular type known as random forest. We argue that random forests can be useful either as a
substitute for, or as a complement to, more standard parametric regression modeling. Our discussion
of random forests is intuitive, and we illustrate their implementation by analyzing the
determinants of divorce with SOEP data for German women who entered a marriage or a
cohabitation between 1984 and 2015. The algorithm is able to rank divorce determinants
according to their importance, highlighting the most powerful ones, which in our data are
partners' overall life satisfaction, their age, and certain personality traits (namely,
the partner's extroversion and, with less power, women's conscientiousness,
agreeableness, and openness). We also draw partial dependence plots for the main
predictors of relationship survival.