Scientific article
Open access

Using genderize.io to infer the gender of first names: how to improve the accuracy of the inference

Contributors: Seboe, Paul
Published in: Journal of the Medical Library Association, vol. 109, no. 4, p. 609-612
Publication date: 2021-11-22
First online date: 2021-11-22

Objective: We recently showed that genderize.io is not a sufficiently powerful gender detection tool due to a large number of nonclassifications. In the present study, we aimed to assess whether the accuracy of inference by genderize.io can be improved by manipulating the first names in the database.

Methods: We used a database containing the first names, surnames, and gender of 6,131 physicians practicing in a multicultural country (Switzerland). We uploaded the original CSV file (file #1), the file obtained after removing all diacritic marks, such as accents and cedilla (file #2), and the file obtained after removing all diacritic marks and retaining only the first term of the compound first names (file #3). For each file, we computed three performance metrics: proportion of misclassifications (errorCodedWithoutNA), proportion of nonclassifications (naCoded), and proportion of misclassifications and nonclassifications (errorCoded).

Results: naCoded, which was high for file #1 (16.4%), was reduced after data manipulation (file #2: 11.7%, file #3: 0.4%). As the increase in the number of misclassifications was small, the overall performance of genderize.io (i.e., errorCoded) improved, especially for file #3 (file #1: 17.7%, file #2: 13.0%, and file #3: 2.3%).

Conclusions: A relatively simple manipulation of the data improved the accuracy of gender inference by genderize.io. We recommend using genderize.io only with files that were modified in this way.
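The two name manipulations and the three metrics described in the abstract can be sketched in Python as follows. This is an illustrative sketch only, not the author's code: the function names are hypothetical, compound first names are assumed to be separated by whitespace or hyphens, and the denominators (all names for errorCoded and naCoded, classified names only for errorCodedWithoutNA) are assumptions, since the abstract does not state them. A nonclassification is represented here as None.

```python
import re
import unicodedata


def strip_diacritics(name):
    """Remove combining marks (accents, cedilla, etc.) from a name (file #2 step)."""
    # NFKD splits e.g. "ç" into "c" plus a combining cedilla, which is then dropped.
    # Letters without a Unicode decomposition (e.g. "ø") are left unchanged.
    decomposed = unicodedata.normalize("NFKD", name)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def first_term(name):
    """Keep only the first term of a compound first name (file #3 step).

    Assumption: terms are separated by whitespace or hyphens.
    """
    return re.split(r"[\s-]+", name.strip())[0]


def performance(true_genders, inferred_genders):
    """Compute the three metrics; None in inferred_genders means 'not classified'.

    Assumed denominators: all names for errorCoded and naCoded,
    classified names only for errorCodedWithoutNA.
    """
    total = len(true_genders)
    if total == 0 or total != len(inferred_genders):
        raise ValueError("inputs must be non-empty and of equal length")
    n_na = sum(1 for g in inferred_genders if g is None)
    n_wrong = sum(
        1
        for t, g in zip(true_genders, inferred_genders)
        if g is not None and g != t
    )
    n_classified = total - n_na
    return {
        "errorCoded": (n_wrong + n_na) / total,
        "errorCodedWithoutNA": (
            n_wrong / n_classified if n_classified else float("nan")
        ),
        "naCoded": n_na / total,
    }
```

Under these assumptions, first_term(strip_diacritics("Jean-François")) yields "Jean", which is the form that would be submitted for file #3. The inferred genders themselves would come from genderize.io; that call is deliberately left out of the sketch.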

Keywords
  • Accuracy
  • Gender determination
  • Genderize.io
  • Misclassification
  • Name
  • Name-to-gender
  • Performance
Citation (ISO format)
SEBOE, Paul. Using genderize.io to infer the gender of first names: how to improve the accuracy of the inference. In: Journal of the Medical Library Association, 2021, vol. 109, n° 4, p. 609–612. doi: 10.5195/jmla.2021.1252
Main files (1)
Article (Published version)
ISSN of the journal: 1536-5050

Technical information

Creation: 11/24/2021 9:31:00 AM
First validation: 11/24/2021 9:31:00 AM
Update time: 03/16/2023 2:11:28 AM
Status update: 03/16/2023 2:11:27 AM
Last indexation: 02/12/2024 12:17:22 PM
All rights reserved by Archive ouverte UNIGE and the University of Geneva