"A City That Never Sleeps?" - new data and analysis from the "On Broadway" project


To create our interactive installation and web application On Broadway (currently on view at the New York City Public Library), we assembled lots of images and numbers:

661,809 Instagram photos shared along Broadway during six months in 2014;
Twitter posts with images for the same period in 2014;
8,527,198 Foursquare check-ins, 2009-2014;
22 million taxi pickups and drop-offs for all of 2013;
selected indicators from the US Census Bureau for 2013 (latest data available).

On Broadway visualizes some of the patterns in the collected datasets, but there are many other interesting things to discover in this data.

In this first post we discuss temporal patterns of Instagram use in some of the areas of NYC.

These are the areas crossed by Broadway as it runs through all of Manhattan (13 miles). (In a later post we will present an analysis of 10.5 million Instagram images we collected for all of NYC.) Representing the city through a single "slice" (one cross street) simplifies data analysis - instead of dealing with two dimensions of space we only have one (position along Broadway). This also allows for interesting visualizations that do not have to use all-too-familiar maps.

Analyzing patterns of human activity through Instagram

Why should we care about when and where people post on Instagram? Combined information about the locations of posts and their times can give us insights into patterns of human activity. Some areas and time periods will have lots of posts, and some almost none. Of course, not every type of activity will create a strong Instagram signal, but many do (going out with friends, sightseeing, celebrating, civic events, etc.).

For example, in an earlier project (phototrails.net) we analyzed Instagram patterns during two memorial days in Tel Aviv, Israel (Holocaust Memorial Day; Israeli Fallen Soldiers and Victims of Terrorism Remembrance Day). Another project (the-everyday.net) looked at Instagram patterns during the Maidan Revolution in Kyiv, Ukraine (February 2014). In both cases we found that Instagram usage gives us valuable spatial-temporal "maps" of the events, revealing their dynamics and rhythm.

Importantly, Instagram (and other media sharing networks that record location information) gives us much more than simply points in time and space corresponding to the users sharing images. We can also examine the images to understand what people chose to photograph and how. (Both images and their metadata can be downloaded by using the Instagram API. Here are examples of recent research articles that use Instagram data.) This post only discusses time and space information (when and where images were posted); in another post we will examine patterns in the content of the 660,000 images we collected along Broadway.

A sample of Instagram images shared around Broadway and Maiden Lane (this area is close to Wall Street).

A sample of Instagram images shared around Broadway and West 184th Street.

1. Hours of the day

"The City That Never Sleeps" is a popular nickname for New York. But is it true? Analysis of Instagram patterns shows that this common image of New York is not quite correct (at least for the parts crossed by Broadway). Or rather, instead of a full 7-8 hours of sleep, NYC only naps for a couple of hours.

The number of posted Instagram images increases during the morning, reaches its peak during the day, and decreases in the evening. The quietest period is 3am - 5am.

The volume of Instagram posts by hour.
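As a rough illustration of how such an hourly profile can be computed, here is a minimal Python sketch. It is not the code used for the project: the function names are hypothetical, the sample timestamps are made up, and it assumes the timestamps have already been converted to local New York time.

```python
from collections import Counter
from datetime import datetime


def posts_per_hour(timestamps):
    """Return a 24-element list with the number of posts in each hour (0-23)."""
    counts = Counter(ts.hour for ts in timestamps)
    return [counts.get(hour, 0) for hour in range(24)]


def quietest_hours(hourly, k=2):
    """Return the k hours with the fewest posts, in ascending hour order.

    Ties are broken in favor of the earlier hour.
    """
    ranked = sorted(range(24), key=lambda hour: (hourly[hour], hour))
    return sorted(ranked[:k])


# Made-up sample timestamps (NOT project data), assumed to be local time.
sample = [
    datetime(2014, 3, 10, 3, 15),
    datetime(2014, 3, 10, 9, 5),
    datetime(2014, 3, 10, 13, 40),
    datetime(2014, 3, 10, 13, 55),
    datetime(2014, 3, 10, 14, 20),
    datetime(2014, 3, 10, 21, 30),
]

hourly = posts_per_hour(sample)
print(hourly)
print(quietest_hours(hourly))
```

With real data, the resulting 24 numbers are what a bar chart of volume by hour would display.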

Here is an alternative visualization of the same data that shows the differences between times of the day more dramatically. In this visualization, each hour of the day gets its own “clock”:

Data: 190K Instagram images shared along Broadway during weeks 10-15 of 2014.

2. Hours of the day - comparison with other global cities

We can compare Broadway hourly Instagram patterns with the patterns in other global cities: Bangkok, Berlin, Moscow, Sao Paulo.

These plots use data for 20,000 Instagram images shared during exactly the same week (December 5-11, 2013). The graphs show the number of Instagram images shared per hour, averaged over one week. (We collected these images for the selfiecity project, using a central area of the same size in each city.) NYC, Berlin, Moscow and Sao Paulo have similar patterns, but Bangkok and Tokyo differ: there is a peak around lunch time, and then another peak after 7pm.

3. Hours of the day - Broadway 1 vs Broadway 2

Since Broadway crosses some of the most popular areas of NYC such as Times Square, a significant proportion of the Instagram images shared in some areas along Broadway are from tourists. (In this post we don't separate tourists from locals - this will be the subject of a future post.) It is equally important to remember that Broadway crosses areas with different economic and social characteristics. Therefore, while until now we have treated "Broadway" as a single data source, we will now look at temporal differences in Instagram use between its parts.

When we took all the data we collected (Instagram, Twitter, Foursquare, taxi rides) and graphed it along the length of Broadway, we found two completely different parts. It is as though one street connects two different countries. We called them Broadway 1 (from the Financial District up to 110th Street) and Broadway 2 (from 110th Street to 220th Street). The first part has the famous tourist spots, and also much more social media and taxi activity than the second part.

For example, this graph shows the number of Instagram images along the length of Broadway (left to right):

Data: 660K Instagram images, 2/27/2014 - 8/01/2014. "Points" are centers of 100m wide rectangles spaced 30 meters apart along Broadway (713 points covering 13 miles, south to north).
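The caption's numbers can be checked with a little arithmetic, and the binning itself can be sketched in a few lines of Python. This is only a sketch under stated assumptions: we assume each photo has already been projected onto Broadway, giving a distance along the street and a perpendicular offset (both hypothetical inputs), and we assume the 100m width is measured across the street. The function and constant names are ours.

```python
SPACING_M = 30.0   # distance between consecutive points (from the caption)
WIDTH_M = 100.0    # rectangle width (from the caption); ASSUMED to be across the street
N_POINTS = 713     # number of points (from the caption)


def point_index(along_m, offset_m):
    """Map a photo to the index of the nearest point, or None if it is outside the strip.

    along_m:  distance along Broadway from its southern end, in meters (hypothetical input)
    offset_m: signed perpendicular distance from the street centerline, in meters
              (hypothetical input)
    """
    if abs(offset_m) > WIDTH_M / 2:
        return None
    idx = round(along_m / SPACING_M)
    if idx < 0 or idx >= N_POINTS:
        return None
    return idx


def count_by_point(photos):
    """Count photos per point; photos is an iterable of (along_m, offset_m) pairs."""
    counts = [0] * N_POINTS
    for along_m, offset_m in photos:
        idx = point_index(along_m, offset_m)
        if idx is not None:
            counts[idx] += 1
    return counts


# Sanity check on the caption: 713 points spaced 30 m apart span 712 * 30 m.
total_length_miles = (N_POINTS - 1) * SPACING_M / 1609.344
print(round(total_length_miles, 2))
```

The span works out to a little over 13 miles, which agrees with the length stated in the caption.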

The difference in Instagram volume between Broadway 1 and Broadway 2 is immediately obvious, even if we don't take into account a few spikes corresponding to popular tourist photo-taking spots.

(Note the small peaks in some areas of Broadway 2, which may reflect the gentrification of these areas. In a later post we will do a more detailed analysis comparing all neighborhoods crossed by Broadway.)

Averaging all the data we collected for Broadway 1 and Broadway 2 shows that Broadway 1 has 6.83 times as many Instagram images, 3.91 times as many tweets with images, 9.29 times as many taxi drop-offs and 7.9 times as many taxi pick-ups as Broadway 2.
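One plausible way to arrive at ratios like these is to average the per-point counts in each part and divide one average by the other. The sketch below assumes exactly that; whether the project averaged per point or compared totals is not stated here, and the function name, the boundary index and the toy counts are all hypothetical.

```python
from statistics import mean


def part_ratio(counts, boundary):
    """Ratio of the mean count per point south of `boundary` to the mean north of it.

    counts:   per-point counts ordered south to north
    boundary: index of the first point that belongs to the northern part
              (hypothetical; stands in for the point nearest 110th Street)
    """
    south, north = counts[:boundary], counts[boundary:]
    if not south or not north:
        raise ValueError("boundary must split counts into two non-empty parts")
    north_mean = mean(north)
    if north_mean == 0:
        raise ValueError("northern part has no activity; the ratio is undefined")
    return mean(south) / north_mean


# Toy counts (NOT project data): four southern points, four northern points.
toy_counts = [40, 60, 50, 50, 10, 5, 15, 10]
ratio = part_ratio(toy_counts, boundary=4)
print(ratio)
```

Comparing averages per point rather than raw totals keeps the ratio from depending on how many points fall in each part.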

If we calculate household income averages for the two parts (using ACS 2013 census tract data), we find that the average for Broadway 1 is $119,000, while the average for Broadway 2 is $39,380.

There are many reasons why we see much higher activity in Broadway 1: the presence of tourists, more affluent locals, lots of people working in downtown and midtown, etc. Given how much money an average tourist spends during a visit to NYC, economically many tourists have more in common with the people living along Broadway 1 than with those along Broadway 2. So we may expect that while tourists greatly magnify the difference between the two parts of Broadway in social media activity and taxi use, the basic difference would exist even without them. (Proving or disproving this hypothesis will require further data analysis.)

Do Broadway 1 and Broadway 2 have the same temporal patterns?

In Broadway 1 (left graph) afternoon hours clearly dominate. In Broadway 2 (right graph) there is more activity in late evenings.

Note that since Broadway 1 contains most of the Instagram images in our dataset, the left graph is quite similar to the very first graph above that shows activity for all of Broadway. This is an important lesson - often when you are analyzing data representing some phenomenon, the patterns you see actually correspond only to the dominant part of this phenomenon. Other parts may have different patterns, but they remain hidden unless we look at them separately. This is what happens in our case: only when we plotted the data separately for Broadway 1 and Broadway 2 did we realize that these two parts have distinct temporal patterns. (We may speculate that afternoons dominate in Broadway 1 because of tourists and also because of the many people who work in downtown and midtown but go home to other boroughs or upper Manhattan in the evening.)
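The way a dominant part can hide the pattern of a smaller one is easy to reproduce with toy numbers. In the Python sketch below, the counts are invented for illustration (they are not project data), the four time-of-day buckets are a simplification of the 24 hours, and the helper names are ours.

```python
LABELS = ["morning", "afternoon", "evening", "night"]


def shares(counts):
    """Convert raw counts into fractions of the total, so parts of different size compare."""
    total = sum(counts)
    if total == 0:
        raise ValueError("cannot normalize an empty distribution")
    return [c / total for c in counts]


def peak_label(counts):
    """Return the label of the bucket with the highest count."""
    best = max(range(len(counts)), key=lambda i: counts[i])
    return LABELS[best]


# Invented counts: a large part that peaks in the afternoon
# and a much smaller part that peaks in the evening.
big_part = [200, 500, 250, 50]
small_part = [10, 20, 40, 30]
combined = [a + b for a, b in zip(big_part, small_part)]

print(peak_label(combined))    # follows the large part
print(peak_label(big_part))
print(peak_label(small_part))  # visible only when plotted separately
print([round(s, 2) for s in shares(small_part)])
```

Because the small part contributes only a sliver of the total, the combined profile simply mirrors the large part. Normalizing each part to shares before plotting is one way to put the two on an equal footing.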

To check that the temporal difference we are seeing between the two areas is not due to particular days of the week, we plot the data separately for each day. In the following plots, labels 1 to 7 correspond to Monday through Sunday. The first set of plots is for Broadway 1, and the second set is for Broadway 2.
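Splitting the hourly counts by day of the week takes only a few lines. The Python sketch below is not the project's code: the function name and sample timestamps are made up, the timestamps are assumed to be in local time, and we assume the day labels follow ISO numbering, in which 1 is Monday and 7 is Sunday.

```python
from datetime import datetime


def hourly_by_weekday(timestamps):
    """Return {day: [24 hourly counts]} with days numbered 1 (Monday) to 7 (Sunday)."""
    table = {day: [0] * 24 for day in range(1, 8)}
    for ts in timestamps:
        # isoweekday() returns 1 for Monday through 7 for Sunday.
        table[ts.isoweekday()][ts.hour] += 1
    return table


# Made-up sample timestamps (NOT project data), assumed to be local time.
sample = [
    datetime(2014, 3, 10, 9, 0),    # a Monday
    datetime(2014, 3, 10, 14, 30),  # the same Monday
    datetime(2014, 3, 15, 22, 10),  # a Saturday
    datetime(2014, 3, 16, 22, 45),  # a Sunday
]

table = hourly_by_weekday(sample)
for day in range(1, 8):
    print(day, sum(table[day]))
```

Each of the seven lists can then be plotted in the same way as the overall hourly profile.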

Just as plotting data for all 13 miles of Broadway in Manhattan together hides the differences between its two parts, if we split each part into smaller areas, we may expect to find more differences. The advantage of the simplification we used (Broadway 1 vs Broadway 2) is that the differences become bigger and are therefore easier to see. Dividing data into smaller and smaller subsets is a mixed blessing - we may gain in local specificity and interpretability, but the distinctions can become smaller and smaller. Therefore it's useful to both divide and gather - to look at subsets of the data as well as at the data as a whole.

This is the end of our first post reporting analysis of the data we collected and organized for the On Broadway project. More posts will be coming soon!

P.S. We are also working on a paper comparing patterns in our datasets across all of NYC. We hope to release it on arXiv in April or May.