Investigating Markers and Drivers of Gender Bias in Machine Translations

Barclay, Peter; Sami, Ashkan

doi:10.1109/SANER60148.2024.00054

Authors

Dr Peter Barclay P.Barclay@napier.ac.uk
Lecturer

Prof Ashkan Sami A.Sami@napier.ac.uk
Professor

Abstract

Implicit gender bias in Large Language Models (LLMs) is a well-documented problem, and implications of gender introduced into automatic translations can perpetuate real-world biases. However, some LLMs use heuristics or post-processing to mask such bias, making investigation difficult. Here, we examine bias in LLMs via back-translation, using the DeepL translation API to investigate the bias evinced when repeatedly translating a set of 56 Software Engineering tasks used in a previous study. Each statement starts with ‘she’, and is translated first into a ‘genderless’ intermediate language then back into English; we then examine pronoun choice in the back-translated texts. We expand prior research in the following ways: (1) by comparing results across five intermediate languages, namely Finnish, Indonesian, Estonian, Turkish and Hungarian; (2) by proposing a novel metric for assessing the variation in gender implied in the repeated translations, avoiding the over-interpretation of individual pronouns apparent in earlier work; (3) by investigating sentence features that drive bias; and (4) by comparing results from three time-lapsed datasets to establish the reproducibility of the approach. We found that some languages display similar patterns of pronoun use, falling into three loose groups, but that patterns vary between groups; this underlines the need to work with multiple languages. We also identify the main verb appearing in a sentence as a likely significant driver of implied gender in the translations. Moreover, we see a good level of replicability in the results, and establish that our variation metric proves robust despite an obvious change in the behaviour of the DeepL translation API during the course of the study. These results show that the back-translation method can provide further insights into bias in language models.
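
The back-translation procedure described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' code, and it does not reproduce their variation metric, which the abstract does not define. The names `back_translate`, `leading_pronoun`, `pronoun_counts` and `toy_translate` are hypothetical, and `toy_translate` is a hand-written stand-in for a translation service: its output is invented for illustration only and says nothing about how DeepL behaves. A real run would replace it with a call to a translation API.

```python
import re
from collections import Counter
from typing import Callable, Iterable

# Hypothetical signature: translate(text, source_lang, target_lang) -> str
Translate = Callable[[str, str, str], str]

PRONOUNS = ("she", "he", "they")


def leading_pronoun(sentence: str) -> str:
    """Return the first word if it is a tracked pronoun, else 'other'."""
    match = re.match(r"\s*([A-Za-z]+)", sentence)
    word = match.group(1).lower() if match else ""
    return word if word in PRONOUNS else "other"


def back_translate(sentence: str, pivot: str, translate: Translate) -> str:
    """Translate English -> pivot language -> English."""
    intermediate = translate(sentence, "EN", pivot)
    return translate(intermediate, pivot, "EN")


def pronoun_counts(sentences: Iterable[str], pivot: str,
                   translate: Translate, repeats: int = 1) -> Counter:
    """Count the leading pronoun of each back-translated sentence."""
    counts: Counter = Counter()
    for sentence in sentences:
        for _ in range(repeats):
            result = back_translate(sentence, pivot, translate)
            counts[leading_pronoun(result)] += 1
    return counts


def toy_translate(text: str, source_lang: str, target_lang: str) -> str:
    """Stand-in for a real translation service (ASSUMPTION, not real output)."""
    if target_lang != "EN":
        # Mimic a genderless language: replace the pronoun with a neutral token.
        return re.sub(r"^(She|He)\b", "PRON", text)
    # On the way back, the toy picks a pronoun from the verb alone.
    words = text.split()
    verb = words[1] if len(words) > 1 else ""
    return re.sub(r"^PRON\b", "He" if verb == "debugs" else "She", text)


# Invented example sentences, not the 56 tasks used in the study.
sentences = ["She debugs the code.", "She writes the documentation."]
counts = pronoun_counts(sentences, "FI", toy_translate)
print(dict(counts))
```

Passing the translator in as a function keeps the pronoun counting separate from any particular service, so the same counting code could be rerun across several intermediate languages or at different dates.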

Citation

Barclay, P., & Sami, A. (2024, March). Investigating Markers and Drivers of Gender Bias in Machine Translations. Presented at IEEE International Conference on Software Analysis, Evolution and Reengineering, Rovaniemi, Finland

Presentation Conference Type	Conference Paper (published)
Conference Name	IEEE International Conference on Software Analysis, Evolution and Reengineering
Start Date	Mar 12, 2024
End Date	Mar 15, 2024
Acceptance Date	Dec 16, 2023
Online Publication Date	Jul 16, 2024
Publication Date	2024
Deposit Date	Apr 30, 2024
Publicly Available Date	Jul 16, 2024
Publisher	Institute of Electrical and Electronics Engineers
Peer Reviewed	Peer Reviewed
Pages	455-464
Series ISSN	2640-7574
Book Title	2024 IEEE International Conference on Software Analysis, Evolution and Reengineering (SANER)
ISBN	9798350330670
DOI	https://doi.org/10.1109/SANER60148.2024.00054
Keywords	back-translation, machine translation, large language model, gender bias
Public URL	http://researchrepository.napier.ac.uk/Output/3608153
Publisher URL	https://conf.researchr.org/home/saner-2024

Files

Investigating Markers And Drivers Of Gender Bias In Machine Translations (accepted version) (219 Kb)
PDF

You might also like

Teallach — a flexible user-interface development environment for object database applications (2003)
Journal Article

A problem in querying recursive patterns with OQL (2002)
Preprint / Working Paper

Interoperable Services for Federations of Database Systems (2002)
Presentation / Conference Contribution

A dual-level presentation model for developing user-interfaces. (2000)
Presentation / Conference Contribution

The Prometheus database for taxonomy (2000)
Presentation / Conference Contribution
