Machine-learning techniques for family demography: an application of random forests to the analysis of divorce determinants in Germany

Description

  • Abstract

    Demographers often analyze the determinants of life-course events with parametric regression-type approaches. Here, we present a class of nonparametric approaches, broadly defined as machine learning (ML) techniques, and discuss the advantages and disadvantages of a popular type known as the random forest. We argue that random forests can be useful either as a substitute for, or as a complement to, more standard parametric regression modeling. Our discussion of random forests is intuitive, and we illustrate their implementation by analyzing the determinants of divorce with SOEP data for German women who entered a marriage or cohabitation between 1984 and 2015. The algorithm is able to rank divorce determinants according to their importance, highlighting the most powerful ones, which in our data are partners' overall life satisfaction, their age, and certain personality traits (i.e., extroversion of the partner and, though with less power, women's conscientiousness, agreeableness, and openness). We are also able to draw partial dependence plots for the main predictors of survival of the relationship.
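
The two outputs mentioned in the abstract, a ranking of predictors by importance and partial dependence curves, can be illustrated with a minimal sketch. It is not the authors' code: it assumes scikit-learn, uses synthetic data rather than SOEP, and the predictor names and the data-generating process are hypothetical, chosen only so that the example runs end to end.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import partial_dependence

# Synthetic stand-in data (NOT SOEP): four hypothetical standardized predictors.
feature_names = ["life_satisfaction", "age", "extraversion", "noise"]
rng = np.random.default_rng(0)
X = rng.normal(size=(2000, len(feature_names)))

# Assumed data-generating process, for illustration only: the outcome (1 = union
# dissolved) depends on the first two predictors; the other two are irrelevant.
logit = -2.0 * X[:, 0] + 0.8 * X[:, 1]
y = (rng.random(2000) < 1.0 / (1.0 + np.exp(-logit))).astype(int)

# Fit a random forest: many trees, each grown on a bootstrap sample with a
# random subset of predictors considered at every split.
forest = RandomForestClassifier(n_estimators=200, min_samples_leaf=5, random_state=0)
forest.fit(X, y)

# Impurity-based importances (they sum to 1), sorted from most to least important.
importances = forest.feature_importances_
ranking = sorted(zip(feature_names, importances), key=lambda p: p[1], reverse=True)
for name, score in ranking:
    print(f"{name:18s} {score:.3f}")

# Partial dependence of the predicted dissolution probability on the first
# predictor, averaged over the observed values of the other predictors.
pd_result = partial_dependence(
    forest, X, features=[0], kind="average", grid_resolution=20
)
pd_curve = pd_result["average"][0]
print(np.round(pd_curve, 3))
```

Impurity-based importances are only one possible measure; permutation importance (`sklearn.inspection.permutation_importance`) is a common alternative when predictors differ in scale or number of distinct values.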