Gender bias in Language and Vision datasets and models has the potential to perpetuate harmful stereotypes and discrimination. We analyze gender bias in two Language and Vision datasets. Consistent with prior work, we find that both datasets underrepresent women, which promotes their invisibilization. Moreover, we hypothesize and find that a bias affects human naming choices for people playing sports: speakers produce names indicating the sport (e.g. 'tennis player' or 'surfer') more often when it is a man or a boy participating in the sport than when it is a woman or a girl, with an average of 46% vs. 35% of sports-related names for each gender. A computational model trained on these naming data reproduces the bias. We argue that both the data and the model result in representational harm against women.