Scientific article
OA Policy
English

How accurate are gender detection tools in predicting the gender for Chinese names? A study with 20,000 given names in Pinyin format

Contributors: Sebo, Paul
Published in: Journal of the Medical Library Association, vol. 110, no. 2, p. 205–211
Errata
  • The PDF version of this article reflected an older version of the article with an incorrect URL for reference 17, while the HTML version was correct. The PDF has been updated to the correct version.
  • DOI : 10.5195/jmla.2022.1544
  • PMID : 35521042
Publication date: 2022-04-01
Abstract

Objective: We recently showed that the gender detection tools NamSor, Gender API, and Wiki-Gendersort accurately predicted the gender of individuals with Western given names. Here, we aimed to evaluate the performance of these tools with Chinese given names in Pinyin format.

Methods: We constructed two datasets for the purpose of the study. File #1 was created by randomly drawing 20,000 names from a gender-labeled database of 52,414 Chinese given names in Pinyin format. File #2, which contained 9,077 names, was created by removing from File #1 all the unisex names we could identify (i.e., names listed in the database as both male and female). For both files, we recorded the number of correct classifications (correct gender assigned to a name), misclassifications (wrong gender assigned to a name), and nonclassifications (no gender assigned). We then calculated errorCoded, the proportion of names that were either misclassified or not classified.
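The two steps described in the Methods, building File #2 and computing errorCoded, can be sketched in Python. This is a minimal illustration under stated assumptions, not the study's actual code: each name is assumed to be a (name, gender) pair with gender "m" or "f", a tool's prediction is assumed to be "m", "f", or None (no gender assigned), and the helper names `remove_unisex` and `error_coded` as well as the toy names are hypothetical. A name is treated as unisex if the full database lists it with both labels, and errorCoded is computed as (misclassifications + nonclassifications) / total names.

```python
from collections import defaultdict
from typing import Optional


def remove_unisex(sample: list[tuple[str, str]],
                  database: list[tuple[str, str]]) -> list[tuple[str, str]]:
    """Hypothetical helper: drop sampled names that the database labels both male and female."""
    labels: dict[str, set[str]] = defaultdict(set)
    for name, gender in database:
        labels[name].add(gender)
    # A name is unisex if the full database carries both labels for it.
    unisex = {name for name, genders in labels.items() if genders >= {"m", "f"}}
    return [(name, gender) for name, gender in sample if name not in unisex]


def error_coded(true_genders: list[str], predicted: list[Optional[str]]) -> float:
    """Hypothetical helper: share of names misclassified (wrong label) or not classified (None)."""
    if len(true_genders) != len(predicted):
        raise ValueError("inputs must have the same length")
    if not true_genders:
        raise ValueError("at least one name is required")
    misclassified = sum(1 for t, p in zip(true_genders, predicted)
                        if p is not None and p != t)
    nonclassified = sum(1 for p in predicted if p is None)
    # Every name is correct, misclassified, or nonclassified, so the total is len(true_genders).
    return (misclassified + nonclassified) / len(true_genders)


if __name__ == "__main__":
    # Toy, hypothetical data: "Li" appears with both labels, so it is removed.
    database = [("Li", "m"), ("Li", "f"), ("Wei", "m"), ("Mei", "f")]
    sample = [("Li", "m"), ("Wei", "m"), ("Mei", "f")]
    print(remove_unisex(sample, database))       # [('Wei', 'm'), ('Mei', 'f')]
    print(error_coded(["m", "f"], ["m", None]))  # 0.5
```

Unisex status is checked against the whole database rather than the sample, mirroring the abstract's wording that unisex names are "those that were listed in the database as both male and female names."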

Results: For File #1, errorCoded was 53% for NamSor, 65% for Gender API, and 90% for Wiki-Gendersort. For File #2, errorCoded was 43% for NamSor, 66% for Gender API, and 94% for Wiki-Gendersort.

Conclusion: We found that all three gender detection tools inaccurately predicted the gender of individuals with Chinese given names in Pinyin format and therefore should not be used in this population.

Keywords
  • Chinese
  • Accuracy
  • Gender detection
  • Misclassification
  • Name
  • Name-to-gender
  • Performance
Citation (ISO format)
SEBO, Paul. How accurate are gender detection tools in predicting the gender for Chinese names? A study with 20,000 given names in Pinyin format. In: Journal of the Medical Library Association, 2022, vol. 110, n° 2, p. 205–211. doi: 10.5195/jmla.2022.1289
Identifiers
ISSN of the journal: 1536-5050

Technical information

Creation: 03/05/2022 05:21:00
First validation: 03/05/2022 05:21:00
Update time: 27/04/2023 08:22:49
Status update: 27/04/2023 08:22:49
Last indexation: 01/10/2024 21:31:17
All rights reserved by Archive ouverte UNIGE and the University of Geneva