Repositori Digital de la UPF

Invisible signals: detecting potential selection bias in AI-based resume screening

Automated resume screening tools are now widely used in hiring processes, offering the promise of efficiency and fairness by reducing human bias. Yet recent studies have shown that these systems can still behave unfairly by picking up on subtle linguistic cues that reveal sensitive personal information. This thesis explores whether transformer-based models can infer protected attributes (gender, perceived origin, religion, or sexual orientation) from resume text, even when this information is not stated directly. To investigate this, the study analyzes a real-world dataset of over 900 resumes. Each document was cleaned and its words categorized into semantic groups, such as occupation-related, location-related, and skill-related words, as well as proper nouns. The main method is a series of lexical ablation experiments: for each demographic attribute, twelve experiments were run, each including or excluding different word categories. These were combined with a lexical shift analysis using Shifterator to identify which specific words most influenced the model’s predictions. The results show that models can reliably infer gender and perceived origin. Occupation-related terms were mainly predictive of gender, while geographic references acted as near-direct cues for perceived origin. Attempts to predict religion and sexual orientation failed, however, likely due to limited linguistic cues and imbalanced data. Notably, even individual words such as gendered job titles (e.g., “waitress”) or place names were enough to act as unintended signals. These findings raise important concerns about fairness in algorithmic hiring. The fact that AI models can detect protected attributes even in anonymized resumes suggests that bias may persist through indirect linguistic patterns. This highlights the need for stronger audits, more transparent systems, and proactive bias-mitigation strategies, such as masking certain word types or applying debiasing techniques during training, and it calls for caution when relying on AI-driven tools in hiring. Overall, this thesis contributes to the field of algorithmic fairness by presenting a practical framework for identifying and understanding hidden bias in resume screening. It shows that removing obvious identifiers is not enough; fairness also depends on understanding how language itself can reveal sensitive information.
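
The core manipulation in a lexical ablation experiment is masking an entire word category before the text reaches the classifier, then checking whether the protected attribute can still be recovered. The following is a minimal sketch of one such ablation step; the category lexicons, mask token, and example sentence are hypothetical placeholders, not the thesis's actual word lists or model pipeline.

    import re

    # Hypothetical category lexicons; the thesis derives its own word lists.
    CATEGORY_LEXICONS = {
        "occupation": {"waitress", "nurse", "engineer", "mechanic"},
        "location": {"barcelona", "madrid", "lagos", "mumbai"},
        "skill": {"python", "excel", "welding", "translation"},
    }

    def ablate(text, excluded_categories, mask_token="[MASK]"):
        """Replace every token in an excluded category with a mask so the
        downstream classifier cannot rely on that word category."""
        banned = set().union(*(CATEGORY_LEXICONS[c] for c in excluded_categories))
        tokens = re.findall(r"\w+|\W+", text)  # keep punctuation and spacing
        return "".join(mask_token if t.lower() in banned else t for t in tokens)

    print(ablate("Worked as a waitress in Barcelona using Excel.",
                 {"occupation", "location"}))
    # -> Worked as a [MASK] in [MASK] using Excel.

Training and evaluating the attribute classifier on each masked variant, across all inclusion/exclusion combinations, is what the twelve per-attribute experiments systematize.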
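
Shifterator, the library named in the abstract, builds word-shift graphs that rank which words drive the difference between two texts. A minimal sketch of the kind of proportion shift described might look as follows, with the group labels and token counts invented purely for illustration.

    from collections import Counter
    import shifterator as sh

    # Invented token counts for resumes from two demographic groups.
    freq_a = Counter(["nurse", "waitress", "receptionist", "teacher"])
    freq_b = Counter(["engineer", "driver", "mechanic", "developer"])

    # Rank the words whose relative frequency differs most between groups.
    shift = sh.ProportionShift(type2freq_1=freq_a, type2freq_2=freq_b)
    shift.get_shift_graph(top_n=20, system_names=["Group A", "Group B"])

Words that dominate such a shift (for instance, gendered job titles or place names) are exactly the kind of indirect signals the thesis flags as unintended cues to protected attributes.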

(2025) Buyreu Real, Pau