Gender, age, and ambiguity of selfies on Instagram

Gender and Age Distributions of Selfies research update by Mehrdad Yazdani, Research Scientist, Software Studies Initiative.

Who are the people behind selfies? Are they mostly young? Do women take selfies more often than men? Do these patterns depend on geographic location? We looked at over 4,500 selfies from six cities to get a sense of the age groups and genders represented. (Analysis and visualizations of our findings for 3200 images from five cities from this dataset are available on

We did this by first downloading a random sample of 140,000 images from all Instagram photos shared by people in the central areas of six global cities during one whole week (Dec 4-12, 2013). Our random sample of Instagram photographs includes:

  • 30,000 images from Tokyo
  • 30,000 images from New York
  • 20,000 images from Bangkok
  • 20,000 images from Berlin
  • 20,000 images from Moscow
  • 20,000 images from Sao Paulo
Next we have to figure out which of these images are actually selfies. We define a selfie as a photograph that you take of yourself. Since this definition leaves a great deal of room for ambiguity, we ask several people to judge each image and look for a consensus. We use Amazon's Mechanical Turk service to find human reviewers (who are paid!) to review each image. At least 3 reviewers look at each of the 140,000 images to find all the selfies. Naturally, in some cases the reviewers disagree about whether an image is a genuine selfie. To resolve these disagreements, we use a simple majority vote (the mode of the votes, in statistics jargon) to make the final call on whether the image is a selfie.
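
The majority-vote step can be sketched in a few lines of Python. This is only an illustration, not the code used in the study: the function name majority_vote and the tie handling (returning None) are assumptions, since the post does not say what happens when the votes are evenly split.

```python
from collections import Counter


def majority_vote(votes):
    """Return the most common vote (the mode), or None if there is no clear winner.

    Returning None for an empty list or a tie is an assumption;
    the post does not describe how such cases were handled.
    """
    ranked = Counter(votes).most_common()
    if not ranked:
        return None
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return None
    return ranked[0][0]


# Three hypothetical reviewers judging one image
print(majority_vote(["selfie", "selfie", "not selfie"]))
```

With an odd number of reviewers and only two possible answers a tie cannot occur, which may be one reason to ask at least 3 people per image.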

[Figure: selfie rate by city]

What you see above is what we call the "selfie rate": the percentage of the 140,000 images we collected that our reviewers from Mechanical Turk identified as selfies. What is most striking about this figure is that, contrary to popular belief, selfies are not plastered all over Instagram. In fact, Sao Paulo's selfie rate clocks in at just under 5%! Tokyo, on the other hand, has a much lower selfie rate, a hair above 1%.
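
As a small illustration of the selfie rate, the sketch below computes the percentage of images labelled as selfies for each city. The function name selfie_rate, the city names, and the labels are all made up for the example; they are not data from the study.

```python
def selfie_rate(is_selfie):
    """Percentage of images labelled as selfies, given a list of booleans."""
    if not is_selfie:
        raise ValueError("need at least one labelled image")
    return 100.0 * sum(is_selfie) / len(is_selfie)


# Toy labels only (hypothetical cities, not the study's data)
toy_labels = {
    "city_a": [True] + [False] * 19,
    "city_b": [False] * 50,
}
rates = {city: selfie_rate(labels) for city, labels in toy_labels.items()}
print(rates)
```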

But we don't stop at finding the selfies in our set of images. If a reviewer thinks that an image is indeed a selfie, he or she also takes a best guess at the gender and age of the person in it. Again, these reviewers use Mechanical Turk on a regular basis, so an image-tagging task like this is on par with their expertise. The graph above shows that the gender distribution of the selfies is heavily skewed towards females. Moscow in particular has a disproportionately large share of female selfies. In fact, a selfie from Moscow is 4 times less likely to be male than female (with a 95% confidence interval between 3.3 and 5.3).
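
The post does not say how the ratio and its confidence interval were computed. One common approach, shown here purely as an assumption, is to take the odds of female to male selfies and put a Wald interval on the log odds. The function name and the counts below are hypothetical.

```python
import math


def odds_with_wald_ci(n_female, n_male, z=1.96):
    """Odds of female vs. male with an approximate 95% Wald interval.

    The standard error of the log odds is sqrt(1/n_female + 1/n_male).
    This is an assumed method, not necessarily the one used in the study.
    """
    odds = n_female / n_male
    se = math.sqrt(1.0 / n_female + 1.0 / n_male)
    lo = math.exp(math.log(odds) - z * se)
    hi = math.exp(math.log(odds) + z * se)
    return odds, lo, hi


# Toy counts only, not the Moscow data
odds, lo, hi = odds_with_wald_ci(80, 20)
print(odds, lo, hi)
```

The interval is symmetric on the log scale, so it comes out asymmetric around the odds themselves, much like the 3.3 to 5.3 range quoted above.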

However, it is not fair to assume that gender is a binary factor that we can neatly divide into "male" or "female." Is there a way for us to measure the ambiguity of a selfie's gender? Answering such a question is extremely difficult, but let's take a data science approach (read: hack). We will assume that if it is difficult to classify a selfie's gender as "male" or "female," then our reviewers from Mechanical Turk will have a harder time making a decision. Since we have multiple reviewers (at least 3), there will be more disagreements when it is truly difficult to determine the selfie's gender. Let's assign a confidence score between 0 and 1 to the collective agreement of the reviewers on the gender of the selfie. What follows are the averages of this gender confidence score for the different cities:

[Figure: average gender confidence by city and gender]

We see some very interesting patterns emerging from this figure. Over the entire population, the reviewers are fairly confident (over 95%) of a selfie's gender. However, in every city the average gender confidence for males is lower than that for females. In the case of Berlin, this difference may very well be insignificant and due to chance, but for the other cities we see a much wider gap in confidence. In Sao Paulo and Moscow especially, the reviewers are much more confident at detecting females than in the other cities. One possible interpretation: what makes these cities unique is that women there are unquestionably "female looking" (at least when they take their selfies and post them), so the reviewers' confidence for these female selfies is higher.
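
The post does not spell out how the confidence score is calculated. A minimal sketch, assuming the score is simply the share of reviewers who agree with the most common answer, could look like this (the function name agreement_confidence is hypothetical):

```python
from collections import Counter


def agreement_confidence(votes):
    """Share of votes that went to the most common label.

    Assumed definition: the post only says the score lies between 0 and 1.
    """
    if not votes:
        raise ValueError("need at least one vote")
    top_count = Counter(votes).most_common(1)[0][1]
    return top_count / len(votes)


# Hypothetical votes from three reviewers on two selfies
print(agreement_confidence(["female", "female", "female"]))
print(agreement_confidence(["female", "female", "male"]))
```

Under this assumed definition the score cannot drop below 0.5 when there are only two labels, so the score used in the study may well be defined or weighted differently.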

We next take a look at the age distributions of the selfies. Here they are organized by city and gender:

[Figure: age distributions of selfies by city and gender]

The most dramatic result here is that in every city the men who take selfies are older than their female counterparts. Bangkok has the youngest selfie enthusiasts, while New Yorkers are the oldest. On a log scale, as the age of the person in a selfie increases, the odds of the selfie being male increase by a factor of 6.7 (with a 95% confidence interval between 4.99 and 9.03). Overall, however, people in their early twenties dominate selfies on Instagram.

As before, we determine the age of the selfie by asking several reviewers to make their best guess. We then estimate the age by taking the median of the reviewers' guesses. As with gender, this can be a very difficult task, and some selfies are harder to judge than others. To measure the level of agreement on a selfie's age, we computed the standard deviation of the reviewers' guesses. A higher standard deviation suggests more disagreement among the reviewers. We refer to this standard deviation as the "disagreement." Below we show average disagreements for each city and gender:

[Figure: average age disagreement by city and gender]

With the exception of Berlin and New York (which have the highest disagreements), age estimates for female selfies have the least disagreement. The difference between the disagreement levels of males and females in Berlin does not appear to be significant. By far, Bangkok has the least disagreement about the age of female selfies among all cities. It is difficult to say why this is the case. We welcome any hypotheses for this finding!
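
The median age estimate and the "disagreement" measure described above can be sketched with Python's standard library. The function name and the guesses are hypothetical, and the use of the sample standard deviation (rather than the population one) is an assumption, since the post does not say which was used.

```python
import statistics


def age_estimate_and_disagreement(guesses):
    """Return (median age, standard deviation of the guesses).

    Needs at least two guesses. The sample standard deviation is an
    assumption; the post does not specify sample vs. population.
    """
    estimate = statistics.median(guesses)
    disagreement = statistics.stdev(guesses)
    return estimate, disagreement


# Three hypothetical reviewers guessing the age of one selfie
estimate, disagreement = age_estimate_and_disagreement([20, 22, 27])
print(estimate, disagreement)
```

The median keeps a single wild guess from dragging the estimate, while the standard deviation still registers that guess as disagreement.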

In summary, our study suggests that selfies are not the dominant imagery shared on Instagram. We have also observed that selfies are extremely popular among females and people in their twenties. We are planning more posts with additional details and more results from our research, so check back to see them.