Statistics, Department of

The R Journal
Date of this Version
8-2016
Document Type
Article
Citation
The R Journal (August 2016) 8(1); Editor: Michael Lawrence
Abstract
In recent years, there has been increased interest in methods for gender prediction based on first names that employ various open data sources. Applications of these methods range from bibliometric studies to customizing commercial offers for web users. Analyses of gender disparities in science based on such methods are published in the most prestigious journals, although they could be improved by choosing the most suitable prediction method with optimal parameters and by performing validation studies using the best data source for a given purpose. There is also a need to monitor and report how well a given prediction method works in comparison to others. In this paper, the author recommends a set of tools (including one dedicated to gender prediction, the R package called genderizeR), data sources (including the genderize.io API), and metrics that can be fully reproduced and tested in order to choose the optimal approach for different gender analyses.
Included in
Numerical Analysis and Scientific Computing Commons, Programming Languages and Compilers Commons
Comments
Copyright 2016, The R Foundation. Open access material. License: CC BY 3.0 Unported