Super contributors and power laws

This blog post is part of a series investigating different demographics and uses of mySociety services. You can read more about this series here.

A common feature of websites and services where users generate data is that a small number of users are responsible for a large percentage of the activity. For instance, 77% of Wikipedia is written by 1% of editors (with most of that done by an even smaller fraction), and on OpenStreetMap 0.01% of users contribute a majority of the information.

This also applies to plenty of offline activities — for instance, half of the 25,000 noise complaints about Heathrow Airport were made by 10 people. People who dedicate significant time to an activity can quickly outpace a much larger group who only use the service once.

For FixMyStreet (where people report issues like littering and potholes to local authorities), the top 0.1% of users made 16% of the reports and the top 10% of users made 62%. Starting from the most prolific users, each tenfold increase in the number of users roughly doubles the number of reports:

  • 418 users (0.1%) account for 224,775 reports (16%)
  • 4,181 users (1%) account for 470,384 reports (33%)
  • 41,814 users (10%) account for 881,481 reports (62%)
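As a rough check on the "ten times the users, roughly twice the reports" pattern, the short Python sketch below recomputes the growth between the three published tiers and expresses each step as a local power-law exponent k (reports growing roughly as users raised to the power k). The exponent framing is an illustrative assumption, not a model fitted to the full FixMyStreet dataset; the only inputs are the figures in the list above, and the name `growth_steps` is hypothetical.

```python
import math

# Published FixMyStreet tiers: (number of top users, reports they made)
TIERS = [
    (418, 224_775),     # top 0.1% of users
    (4_181, 470_384),   # top 1% of users
    (41_814, 881_481),  # top 10% of users
]


def growth_steps(tiers):
    """Compare each tier with the next one.

    Returns a list of (user_ratio, report_ratio, k) tuples, where k is the
    local exponent solving report_ratio == user_ratio ** k, i.e. the slope
    between the two tiers on a log-log plot (an illustrative assumption).
    """
    steps = []
    for (users_a, reports_a), (users_b, reports_b) in zip(tiers, tiers[1:]):
        user_ratio = users_b / users_a
        report_ratio = reports_b / reports_a
        k = math.log(report_ratio) / math.log(user_ratio)
        steps.append((user_ratio, report_ratio, k))
    return steps


if __name__ == "__main__":
    for user_ratio, report_ratio, k in growth_steps(TIERS):
        print(f"users x{user_ratio:.1f} -> reports x{report_ratio:.2f} (k ~ {k:.2f})")
```

Run as-is, it reports growth factors of about 2.09 and 1.87 for the two tenfold steps, corresponding to local exponents of about 0.32 and 0.27. An exponent near log10(2), about 0.30, is what "ten times the users, twice the reports" means on a log-log scale; two steps are far too few to establish that the whole distribution follows a power law.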

This reflects that at every scale in the data, around half of a group's activity comes from its most active 10%. Overall, two-thirds of users made only one report, but the reports made by this large group make up only 20% of the total.
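Both claims in this paragraph can be checked against the published tiers with a little arithmetic. The Python sketch below assumes that the top-10% figures (41,814 users making 62% of reports) are precise enough to back out approximate totals, and that each one-report user contributed exactly one report; the resulting totals are estimates, not counts from the FixMyStreet database, and the constant and function names are hypothetical.

```python
# Published FixMyStreet tiers: (number of users, number of reports)
TOP_0_1_PCT = (418, 224_775)
TOP_1_PCT = (4_181, 470_384)
TOP_10_PCT = (41_814, 881_481)

TOP_10_USER_FRACTION = 0.10       # 41,814 users are 10% of all users
TOP_10_REPORT_FRACTION = 0.62     # ...and made 62% of all reports
ONE_REPORT_USER_FRACTION = 2 / 3  # two-thirds of users made a single report


def estimate_totals():
    """Back out approximate total users and reports from the top-10% tier."""
    total_users = TOP_10_PCT[0] / TOP_10_USER_FRACTION
    total_reports = TOP_10_PCT[1] / TOP_10_REPORT_FRACTION
    return total_users, total_reports


def nested_shares(total_reports):
    """Share of each group's reports made by its own most active tenth."""
    return {
        "top 0.1% within top 1%": TOP_0_1_PCT[1] / TOP_1_PCT[1],
        "top 1% within top 10%": TOP_1_PCT[1] / TOP_10_PCT[1],
        "top 10% within all users": TOP_10_PCT[1] / total_reports,
    }


def one_report_share(total_users, total_reports,
                     fraction=ONE_REPORT_USER_FRACTION):
    """Share of all reports made by users who each made exactly one report."""
    return fraction * total_users / total_reports


if __name__ == "__main__":
    users, reports = estimate_totals()
    print(f"~{users:,.0f} users, ~{reports:,.0f} reports")
    for label, share in nested_shares(reports).items():
        print(f"{label}: {share:.0%}")
    print(f"one-report users: {one_report_share(users, reports):.0%} of reports")
```

Run as-is, this gives roughly 418,140 users and 1.42 million reports, nested shares of about 48%, 53% and 62%, and about 20% of all reports coming from one-report users, in line with the figures quoted in the text.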

This means that different questions can lead you to very different conclusions about the service. If you’re interested in the people who use FixMyStreet, that two-thirds is where most of the action is. If you’re interested in the outcomes of the service, these are mostly driven by a much smaller group of people.

Reka Solymosi (2018) investigated the behaviour of the top 1% of reporters and found that they tended to report a wide range of categories: only “16 of the 415 contributors reported only one type of issue. The other 399 reported issues in more than one category”, with an average of six categories each. Their reports also tended to cover a wide area: “there were only six people who reported in only one neighborhood [LSOA], fewer than the number of people who reported in only one category. The other 409 contributors all reported in at least two neighborhoods”. Solymosi identifies four clusters of these super contributors:

  • Traditional guardians – report in a small number of neighbourhoods, but represent the largest number of users.
  • Large-neighbourhood guardians – report in a larger number of connected neighbourhoods.
  • Super-neighbourhood guardians – report in a high number of connected neighbourhoods; this is the largest group.
  • Neighbourhood-agnostic guardians – report in disconnected areas.

Collectively, this can have a wide impact: 18% of LSOAs (Lower Layer Super Output Areas) in England have at least one report from a user who has made more than 100 reports, a group of only around 900 people.

Looking at the general picture through the Explorer minisite, it’s not just that serial reporters report widely; certain kinds of reports are more likely to be made by users who are reporting more issues:

Incivilities, rubbish, road safety and bus stop damage are all categories more likely to be reported by users who have made 50+ reports. While users who make lots of reports tend to make reports across a few categories, they are often specialised in their output.

59% of flyposting reports, 57% of graffiti reports and 52% of litter reports are made by users who have reported more than 50 times.

It’s important to remember that these aren’t hard divides. Single-report users are less likely to report potholes than serial reporters, but it is also true that one in five people who report only one issue report a pothole.
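The percentages in the last few paragraphs answer two different questions. “59% of flyposting reports come from users who have reported more than 50 times” is a share of a category, while “one in five single-report users report a pothole” is a share of a group. The Python sketch below shows the difference on a small table of report counts. The counts, group labels and function names are all hypothetical, invented for illustration; they are not FixMyStreet data.

```python
# HYPOTHETICAL report counts by (user group, category); invented for
# illustration only, not FixMyStreet data.
COUNTS = {
    ("single-report", "pothole"): 200,
    ("single-report", "flyposting"): 10,
    ("single-report", "other"): 790,
    ("serial", "pothole"): 300,
    ("serial", "flyposting"): 100,
    ("serial", "other"): 600,
}


def share_of_category_from_group(counts, category, group):
    """Fraction of all reports in `category` that were made by `group`."""
    in_category = sum(n for (g, c), n in counts.items() if c == category)
    return counts.get((group, category), 0) / in_category


def share_of_group_in_category(counts, group, category):
    """Fraction of `group`'s reports that fall in `category`.

    For single-report users this is also the fraction of *people* in the
    group who reported that category, since each made exactly one report.
    """
    by_group = sum(n for (g, c), n in counts.items() if g == group)
    return counts.get((group, category), 0) / by_group


if __name__ == "__main__":
    print(share_of_category_from_group(COUNTS, "flyposting", "serial"))
    print(share_of_group_in_category(COUNTS, "single-report", "pothole"))
    print(share_of_group_in_category(COUNTS, "serial", "pothole"))
```

With these invented counts, serial reporters make about 91% of flyposting reports and potholes are a larger share of their reports (30%) than of single-report users' reports, yet 20% of single-report users still report a pothole. A category can be dominated by one group while remaining common in another.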

For the bundle model of understanding FixMyStreet, thinking about this group of super contributors is important, because they represent a minority of users, yet generate most of the value and impact of the site.

But this comes with a cost. People living in the same area as super contributors benefit from their efforts, but where these super contributors have different concerns or priorities from the area as a whole, this might shift the outcomes of the service.

As Muki Haklay argues:

“The specific background and interests of high contributors will, by necessity, impact on the type of data that is recorded. This is especially important in VGI [volunteered geographic information] projects where the details of what to record are left to the participants.”

Where resources are allocated on the basis of data generated by a service, the behaviour of this small group can have an outsized effect. Future blog posts in this series will explore what this looks like in practice.