FixMyStreet is an online issue reporting website that allows users to report issues in their area to the relevant level of government. This paper uses user-declared gender and automatically-derived gender for anonymous FixMyStreet users to validate the finding in Solymosi, Bowers and Fujiyama (2017) that different kinds of reports in FixMyStreet.com have differently gendered reporting patterns. It investigates patterns of anonymization and the accuracy of automatically-derived gender and confirms the pattern of gendered reporting (where driving related reports are more likely to be made by men and walking related reports more likely to be made by women), as well as identifying a gender difference in user preference for anonymity.
This analysis is a short replication of the Solymosi, Bowers and Fujiyama (2017) analysis of FixMyStreet – a 311-like web platform built by mySociety that runs in a variety of jurisdictions around the world. This platform allows users to make a report on a map, which based on location and type will be routed to the relevant authority. FixMyStreet.com is the UK implementation of this platform and has had over 1.4 million problems reported since 2007.
The original paper used scraped data from FixMyStreet.com to build a dataset of issues reported. Where the name of the reporter was available on the public website, a gender recognition process was used to derive gender from first name information.
This revealed a number of interesting findings about the distribution in space and time of FixMyStreet reporting. Of key interest is the finding that certain categories of reports were gendered, with some, such as potholes, being more likely to be reported by men and others, such as litter, being more likely to be reported by women.
The authors suggest a possible theory based on where people are encountering problems to explain this difference, where:
[M]en are more likely to report in categories related to driving (potholes and road problems), whereas women report more in categories related to walking (parks, dead animals, dog fouling, litter). (p. 954)
However, there are two possible problems with the methodology used to reach these results. The first potential issue is the unknown accuracy of deriving gender from name on this dataset. The second issue is that if men and women are choosing to be anonymous on the website at different rates, this will affect the kinds of reports that are assigned to different genders in the analysis.
This paper examines these concerns using internal FixMyStreet data unavailable to the original researchers - with the conclusion that the finding of gendered categorization is robust against these difficulties with the method.
This study uses gender automatically derived from both anonymous and non-anonymous FixMyStreet users to examine if there is a difference in the rate of anonymization between genders. This uses the Python package 'gender_detector' with the 'uk' dataset - with an additional function to catch uses of gendered titles ('Mr', 'Ms'), or to use the second 'word' of the name where a non-gendered title is detected.
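The title handling described above can be sketched as follows. This is a minimal illustration rather than the code used in the study: the title lists, the function names and the small `TOY_LOOKUP` dictionary are assumptions made for the example, with the dictionary standing in for the 'gender_detector' lookup against its 'uk' dataset.

```python
# Titles that reveal gender directly, and titles that do not.
# Both lists are illustrative assumptions, not the study's actual lists.
GENDERED_TITLES = {"mr": "male", "mrs": "female", "ms": "female", "miss": "female"}
NEUTRAL_TITLES = {"dr", "prof", "cllr", "rev"}

# Hypothetical stand-in for the name corpus; a real run would query
# the gender_detector package instead of this dictionary.
TOY_LOOKUP = {"alice": "female", "sarah": "female", "john": "male", "david": "male"}


def lookup_first_name(word):
    """Return 'male', 'female' or 'unknown' for a single first name."""
    return TOY_LOOKUP.get(word.lower(), "unknown")


def derive_gender(full_name, lookup=lookup_first_name):
    """Derive gender from a display name, checking for a title first."""
    words = [w.strip(".,").lower() for w in full_name.split()]
    words = [w for w in words if w]
    if not words:
        return "unknown"
    first = words[0]
    if first in GENDERED_TITLES:
        # A gendered title settles the question without a name lookup.
        return GENDERED_TITLES[first]
    if first in NEUTRAL_TITLES:
        # Non-gendered title: fall back to the second word of the name.
        return lookup(words[1]) if len(words) > 1 else "unknown"
    return lookup(first)
```

Passing the lookup as a parameter keeps the title logic separate from the name corpus, so the same wrapper could be pointed at a different corpus without changes.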
Using Nyanzu (2018)'s B category system for FixMyStreet reports (which groups FixMyStreet reports into 29 meta-categories), this study then uses a chi-square test to check whether certain categories are more likely to be reported by users who also choose to be anonymous.
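As a sketch of this kind of test, the following computes a chi-square statistic and per-cell residuals for a table of categories against two groups (for example anonymous and named reports). It assumes the residuals are Pearson residuals, (observed - expected) / sqrt(expected); the paper does not state which definition was used, and the function names are illustrative.

```python
from math import sqrt


def chi_square_with_residuals(table):
    """table maps category -> (count_group_a, count_group_b).

    Returns (chi2, dof, residuals), where residuals maps
    category -> (residual_a, residual_b), each computed as
    (observed - expected) / sqrt(expected).
    """
    col_totals = [sum(row[i] for row in table.values()) for i in range(2)]
    n = sum(col_totals)
    chi2 = 0.0
    residuals = {}
    for category, row in table.items():
        row_total = sum(row)
        cell_residuals = []
        for observed, col_total in zip(row, col_totals):
            # Expected count if category and group were independent.
            expected = row_total * col_total / n
            chi2 += (observed - expected) ** 2 / expected
            cell_residuals.append((observed - expected) / sqrt(expected))
        residuals[category] = tuple(cell_residuals)
    # (rows - 1) * (columns - 1), with two columns.
    dof = len(table) - 1
    return chi2, dof, residuals


def notable(residuals, threshold=2.0):
    """Keep only categories with a residual beyond +/- threshold."""
    return {c: r for c, r in residuals.items()
            if any(abs(x) > threshold for x in r)}
```

With 29 categories and two groups this gives 28 degrees of freedom. The p-value is omitted here because it requires the chi-square distribution, which is not in the Python standard library.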
This study will also validate the accuracy of automatic assignment of gender using data collected by an internal experiment that asked users to disclose their gender. A variant of the home page was shown at random to visitors, and users were (optionally) asked their gender when making a report. This data collection feature was continued for a number of months after the experiment, meaning gender information was collected for 41,300 reports made between April 2016 and March 2017, the majority of which were made by users who were not shown the variant design and had a consistent user experience. Using this self-declared data, this paper will examine the accuracy and false-positive rate of automatic gender detection.
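One way such a comparison could be computed is sketched below. The definitions are assumptions made for illustration: "correct %" is taken over reports where a gender was assigned (unknowns are counted separately), and "false positive %" for a label is the share of reports given that label whose declared gender was the other one. The paper may define these differently.

```python
from collections import Counter


def accuracy_by_declared(pairs):
    """Summarise (declared, derived) gender pairs.

    derived may be 'male', 'female' or 'unknown'.
    """
    counts = Counter(pairs)
    summary = {}
    for declared, other in (("male", "female"), ("female", "male")):
        correct = counts[(declared, declared)]
        wrong = counts[(declared, other)]
        assigned = correct + wrong
        summary[declared] = {
            "correct": correct,
            "wrong": wrong,
            "unknown": counts[(declared, "unknown")],
            # Accuracy among reports where a gender was assigned at all.
            "correct_pct": 100 * correct / assigned if assigned else None,
        }
    for label, other in (("male", "female"), ("female", "male")):
        # Everyone given this derived label, whatever they declared.
        labelled = counts[(label, label)] + counts[(other, label)]
        summary[label]["false_positive_pct"] = (
            100 * counts[(other, label)] / labelled if labelled else None
        )
    return summary
```

Reporting unknowns separately keeps the coverage of the method (how many names it can classify) distinct from its accuracy on the names it does classify.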
Using both the wider set of derived gender and the results of declared gender, a chi-square analysis was run to validate the finding of gendered categories. While the original experiment on FixMyStreet.com that gathered the gender data found no significant difference between the original and variant home pages, this analysis only uses data collected without the 'variant' homepage for consistency (36,931 reports).
Privacy and ethics issues
This research made use of internal access to data not publicly available concerning the names of users who have chosen to be anonymous on FixMyStreet.com. This information was converted to derived gender immediately after extraction from the database and no personal information was stored by the researchers.
There is a more general question about the validity of using derived gender information to draw conclusions about gendered behaviours. Just as it would not be appropriate (or necessarily accurate) to record gender based on an observation of a person, it is similarly difficult to derive gender from name alone without explicitly asking for this information. This research proceeds on the basis that while it would be inappropriate to take action on an individual level on the basis of derived gender because of the risk of inaccuracy - for the purposes of aggregate analysis it is an invaluable tool in allowing demographics of datasets to be examined where gender information was not or could not be explicitly recorded.
Table 1 shows the accuracy when comparing self-declared to automatically derived gender:
|self-declared gender||male from name||female from name||unknown||correct %||false positive %|
This shows that categorization is broadly accurate for this dataset for men and women (90% accurate for men and 80% for women) - but with it being more likely to miscategorize women as men than vice-versa.
Table 2 examines whether men and women decide to report anonymously at different rates. This shows that women choose to be anonymous at a rate 10 percentage points higher than men (0.71 against 0.61; validated with an independent t-test, unequal variance).
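An independent t-test with unequal variance (Welch's test) on a binary anonymous/not-anonymous flag can be sketched as below. The function name and the toy inputs are assumptions for illustration; the study's own data and tooling are not shown in this paper.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(sample_a, sample_b):
    """Welch's t statistic and degrees of freedom for two independent samples.

    Each sample is a sequence of numbers, here 1 for an anonymous
    report and 0 for a named one.
    """
    n_a, n_b = len(sample_a), len(sample_b)
    # Squared standard error of each group's mean (sample variance / n).
    se2_a = variance(sample_a) / n_a
    se2_b = variance(sample_b) / n_b
    t = (mean(sample_a) - mean(sample_b)) / sqrt(se2_a + se2_b)
    # Welch-Satterthwaite approximation to the degrees of freedom.
    dof = (se2_a + se2_b) ** 2 / (
        se2_a ** 2 / (n_a - 1) + se2_b ** 2 / (n_b - 1)
    )
    return t, dof


# Toy data only: 7 of 10 reports anonymous in one group, 6 of 10 in the other.
women = [1] * 7 + [0] * 3
men = [1] * 6 + [0] * 4
t_stat, dof = welch_t(women, men)
```

With samples this small the difference would not be significant; at the scale of the full dataset the same difference in rates gives a much larger t statistic because the standard errors shrink with sample size.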
Table 3 presents results from a chi-square test - χ²(28, n = 923,809) = 8279.70, p < 0.0001 - to examine whether different categories show different rates of anonymization, showing only categories where the standardized residuals were greater than 2 or less than -2.
8 out of 29 categories had standardized residuals greater than 2 or less than -2. This finds that categories such as Parking, Abandoned Vehicles and Overgrown Vegetation are more likely to be reported anonymously, while categories such as Road Surface Defects, Pavement/Footway Defects and Street Lights are less likely to be reported anonymously.
|Men %||Women %||Mean Difference||T score||p-value|
|0.61 [0.61, 0.61]||0.71 [0.71, 0.71]||0.10 [0.10, 0.10]||||||
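[Table 3: categories with standardized residuals greater than 2 or less than -2. Surviving row labels: Right of Way; Pavement/Footway Defects; Road Surface Defects.]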
Using a chi-square test to examine reports by derived gender - χ²(28, n = 923,809) = 14893, p < 0.0001 - Table 4 shows the categories where the standardised residuals were greater than 2 or less than -2, finding 6 of 29 categories that were more likely to be reported by men and 12 of 29 that were more likely to be reported by women. Table 5 shows the same for self-declared gender data - χ²(26, n = 36,931) = 488.58, p < 0.0001 - but with fewer categories of report reaching the threshold for inclusion. This shows that Road Surface Defects, Highways Enquiries and Road Safety are more likely to be reported by men, while Overgrown Vegetation, Dog Fouling and Rubbish are more likely to be reported by women.
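[Table 4: categories by derived gender with standardised residuals greater than 2 or less than -2. Surviving row labels: Road Surface Defects; Pavement/Footway Defects; Right of Way.]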
[Table 5: Replication using self-declared data (only the 'original' variant). Surviving row label: Road Surface Defects.]
The finding of gendered reporting in Solymosi, Bowers and Fujiyama (2017) is validated. This provides further evidence that women are more likely to make reports related to walking (e.g. Rubbish, Overgrown vegetation, Dog fouling) while men are more likely to make reports related to roads (Potholes, Road safety).
The accuracy of derived gender is high when tested against declared gender (98% accuracy for women and 96% for men). While the method is unable to assign a gender for a sizeable minority of users - it can be used to accurately examine the behaviour of those it has identified.
There is a gender difference in deciding to make an anonymous report (women report anonymously at a rate approximately 10 percentage points higher than men) and this will affect analysis of gender in non-anonymous reports (as certain kinds of reports are more likely to be made anonymously). However, this does not seem to interfere with the finding of gendered reporting, as the categories identified broadly follow the same pattern as the original paper.
Using gender derived from anonymous and non-anonymous reports, there are a range of categories that show gender differentials. The general pattern of road reports being more likely to be made by men and 'walking' reports as being more likely to be made by women is validated. While the relative lack of data for self-declared gender leads to fewer categories that stand out as statistically significant, the same pattern can be seen in that dataset.
This analysis has found that gender differences in FixMyStreet reporting are robust against methodological issues of working with the partial non-anonymised FixMyStreet dataset. It has also validated the use of automatically derived gender in FixMyStreet data and identified where anonymous data has to be handled with caution.
Future analysis of data from FixMyStreet (or similar 311 services) that only includes identity information for those who have not opted out of sharing information publicly should be aware that this may introduce an additional skew to the data. The accuracy of automatically derived gender data means that more subtle statistical effects can be investigated than those revealed through user surveys. This is a limited finding, however, in that it depends on a well-developed corpus of gendered names, which would need to be re-validated for corpora using primarily non-English language names.
Thanks to Reka Solymosi for comments on a draft of this paper.