Our new paper "Predicting Social Trends from Non-photographic Images on Twitter" accepted for IEEE 2015 Big Data Conference

A random sample of Twitter images from 2013 labeled by the GoogLeNet deep learning model as web sites and texts. We refer to these images as "image-texts." The category includes screenshots of text chats, other types of texts, and other kinds of non-photographic images. We found that the frequencies of these images are correlated with well-being responses from Gallup surveys, as well as with median housing prices, incomes, and education levels.

Johannes Vermeer. Woman in Blue Reading a Letter. 1663-1664.

Our new paper has been accepted for Big Data and the Humanities workshop at IEEE 2015 Big Data Conference:

Predicting Social Trends from Non-photographic Images on Twitter

Mehrdad Yazdani (California Institute for Telecommunications and Information Technology) and Lev Manovich (The Graduate Center, City University of New York)


Humanists use historical images as sources of information about social norms, behavior, fashion, and other details of particular cultures, places, and periods. Dutch Golden Age paintings, works by the French Impressionists, and 20th-century street photography are just three examples of such images. Normally such visuals directly show the objects of interest, such as social scenes, city streets, or people's dress. But what if masses of images shared on social networks contain information about social trends even when these images do not directly represent the objects of interest? This is the question we investigate in our study.

In the last few years researchers have shown that aggregated characteristics of large volumes of social media are correlated with many socio-economic characteristics and can also predict a range of social trends. Examples include flu trends, the success of movies, and measures of the social well-being of populations. Nearly all such studies focus on text content, such as posts on Twitter and Facebook.

In contrast, we focus on images. We investigate whether features extracted from Tweeted images can predict a number of socio-economic characteristics. Our dataset is one million images shared on Twitter during one year in 20 different U.S. cities. We classify the content of these images using the state-of-the-art convolutional neural network GoogLeNet and then select the largest category, which we call "image-texts" - non-photographic images that are typically screenshots of websites or text-message conversations. We construct two features describing patterns in image-texts: the aggregated sharing rate per year in each city, and the sharing rate per hour over a 24-hour period, aggregated over one year in each city.
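For readers who want a concrete picture of these two features, here is a minimal sketch (not the paper's actual pipeline) of how they could be computed with pandas. The file name and the columns 'city', 'timestamp', and 'is_image_text' are hypothetical stand-ins for the classified Twitter image data.

```python
# Minimal sketch, assuming one row per Tweeted image with hypothetical columns:
#   'city', 'timestamp' (datetime), 'is_image_text' (bool derived from GoogLeNet labels)
import pandas as pd

df = pd.read_csv("tweeted_images.csv", parse_dates=["timestamp"])  # hypothetical file

# Feature 1: aggregated image-text sharing rate per year for each city
yearly_rate = df.groupby("city")["is_image_text"].mean()

# Feature 2: sharing rate per hour over a 24-hour period, aggregated over the year
hourly = (
    df[df["is_image_text"]]
    .assign(hour=lambda d: d["timestamp"].dt.hour)
    .groupby(["city", "hour"])
    .size()
    .unstack(fill_value=0)
)
hourly_rate = hourly.div(hourly.sum(axis=1), axis=0)  # normalize each city to a 24-bin profile
```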

We find that these features are correlated with self-reported social well-being responses from Gallup surveys, and also median housing prices, incomes, and education levels. These results suggest that particular types of social media images can be used to predict social characteristics not readily detectable in images.

The full paper will be available online after the IEEE 2015 Big Data Conference.

Data Drift exhibition co-curated by Lev Manovich opens in Riga (Latvia) 10/05/2015

A screenshot from CULTUREGRAPHY (2014) by Kim Albrecht, one of the projects shown in the DATA DRIFT exhibition

DATA DRIFT exhibition

Exhibition website: http://rixc.org/en/festival/DATA%20DRIFT/

Dates: October 10 – November 22, 2015

Opening: October 9 at 19:00

Venue: kim? Contemporary Art Centre in Riga

The DATA DRIFT exhibition presents a number of works by some of the most influential data designers of our time, as well as by artists who use data as their artistic medium. How can we use the data medium to represent our complex societies, going beyond "most popular" and "most liked"? How can we organize the data drifts that structure our lives to reveal meaning and beauty? (And can we still think of "beauty" given our growing concerns with privacy and the commercial uses of the data we share?) How can we use big data to "make strange," so we can see the past and present as unfamiliar and new?

If painting was the art of the classical era, and photography that of the modern era, data visualization is the medium of our own time. Rather than looking at the outside world and picturing it in interesting ways like modernist artists (Instagram filters already do this well), data designers and artists are capturing and reflecting on the new data realities of our societies.

Curated by Lev MANOVICH, Rasa SMITE and Raitis SMITS, the exhibition will feature artworks and data visualization by SPIN Unit (EU), Moritz STEFANER (DE), Frederic BRODBECK (DE), Kim ALBRECHT (DE), Boris MÜLLER (DE), Marian DÖRK (DE), Benjamin GROSSER (US), Maximilian SCHICH (DE/US), Mauro MARTINO (IT/US), Periscopic (US), Pitch Interactive (US), Smart Citizen Team (ES), Lev MANOVICH / Software Studies Initiative (US), Daniel GODDEMEYER (DE/US), Dominikus BAUR (DE), Mehrdad YAZDANI (US), Alise TIFENTALE (LV/US), Jay CHOW (US), Semiconductor (UK), Rasa SMITE, Raitis SMITS/RIXC (LV), Martins RATNIKS (LV), Kristaps EPNERS (LV).

The exhibition website includes information and images of all shown artworks.

The exhibition opening program includes a public talk at the Renewable Futures conference by exhibition curator Lev MANOVICH, which will take place on October 9 at 17:00 at the Stockholm School of Economics in Riga. The talk will be followed by the official opening of the exhibition at 19:00 at the kim? Contemporary Art Centre.

The DATA DRIFT exhibition is a featured event in this year's RIXC Art Science Festival program, taking place in Riga from October 8 to 10, 2015.

The DATA DRIFT exhibition will be open from October 10 to November 22, 2015 in Riga.

Venue: kim? Contemporary Art Centre gallery, Maskavas street 12/1, Spikeri Creative Quarter, Riga, Latvia.

Opening hours: Mon – closed, Tue 12:00–20:00 (free entrance), Wed–Sun 12:00-18:00.

How to Visualize Colors in Big Image Data: The Slice Histogram

Damon Crockett
San Diego Satellite
Fig 1. One million slices of a satellite image from downtown San Diego, arranged as a hue histogram sorted vertically by saturation.

Slice histogram zoom
Fig 2. Closeup of slice histogram above.


Perhaps foremost among the outputs of our lab are visualizations of Big Image Data, and in particular what we call 'media visualizations' - visualizations of image data whose primary plot elements are the images themselves. Media visualizations are special cases of glyph visualizations, a class of statistical plots that present data points as glyphs - icons that carry information by way of their non-relational characteristics: size, shape, color, and so on.

Glyph visualizations are not at all uncommon. For example, the popular plotting library ggplot in the R language can make glyphs of its scatter points (Figure 3), allowing the user to encode additional features in the visualization.

Fig 3. Glyph scatterplot. Five data dimensions are presented on two axes.

Because traditional scatter points have no non-relational characteristics - they are in fact point locations and not objects at all - they carry information only by their spatial positions. Traditional scatterplots, then, can present only as many informational dimensions as there are plotting axes. If, however, each scatter point is made a glyph with n non-relational characteristics, the dimensionality of the visualization is increased by n.
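To make the dimensionality point concrete, here is a small self-contained sketch in Python with matplotlib (rather than the R ggplot used for Figure 3). The data is synthetic; two positional dimensions go to the axes, and two additional dimensions are encoded in the glyphs' size and color.

```python
# Glyph scatterplot sketch: 2 positional + 2 non-relational (glyph) dimensions
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
x, y = rng.random(100), rng.random(100)        # dimensions 1 and 2: position
size, hue = rng.random(100), rng.random(100)   # dimensions 3 and 4: glyph size and color

plt.scatter(x, y, s=40 + 200 * size, c=hue, cmap="viridis", alpha=0.7)
plt.colorbar(label="dimension 4 (color)")
plt.xlabel("dimension 1")
plt.ylabel("dimension 2")
plt.show()
```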

Media visualizations can therefore be understood as limit cases of glyph visualization, because they preserve, strictly speaking, all of the visual information in the dataset, and for image data, preserving all the visual information is preserving all the information. Of course, in practice this isn't exactly true, since non-visual information (like time and place) will very often figure in the analysis; but typically this data is used to define a subset of the data we're interested in, e.g., 'all the Instagram photos from Manhattan during this year's Fashion Week', and our visualization of that subset then focuses exclusively on the visual information.

In any case, even assuming we care only about visual information, the mere presence of the information in the visualization does not guarantee its being readable for the viewing subject. Important for media visualizations are particular choices about sorting or otherwise organizing the images on the canvas in order to reveal patterns. Our lab has pioneered a suite of techniques for organizing images as plot elements on large digital canvases. These techniques are on display in several of our projects (most notably, Phototrails). We use sorted rectangular montages, Cartesian scatterplots, and perhaps most identifiably, polar scatterplots.


Recently, we've developed a new technique, one that has roots in our Selfiecity project (thanks to Moritz Stefaner): the Image Histogram. The most basic form of the Image Histogram simply gives the distribution of a single feature, visual or non-visual, and uses the images themselves as plot elements.

Like the Image Montage, the Image Histogram gives every image its own place in the plot, and like the Image Plot (scatterplot), the Image Histogram uses an axis to organize data points. We have found this combination of characteristics to be very useful in presenting, e.g., temporal patterns in image data.

But because Image Histograms are media visualizations, we needn't settle for the presentation of a single feature. The images themselves are right there on display, and although the particular choice of histogram (i.e., what gets assigned to the x-axis) dictates the horizontal sorting, the vertical sorting is still up for grabs and can reveal additional patterns in the data. We can therefore think of each histogram bin as a columnar montage that can be sorted as many times as we like. In Figure 4, we can see the results of multiple vertical sorts.

SD vs. Philly
Fig 4. Image Histograms binned by hours over a week, sorted vertically by both brightness and hue.

The Image Histogram is a powerful statistical tool that admits of a great deal of variation, and it is implemented using open-source Python code (we use the Python Imaging Library). Big thanks to Cherie Huang for the original code. All of the Image Histograms shown here were made with a fork of that original code, using Python's pandas library and its lightning-fast operations over data tables.
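For readers who want to see the basic mechanics, here is a minimal sketch of an Image Histogram builder. It is not the lab's original code; the DataFrame columns 'path', 'hour', and 'brightness' are hypothetical stand-ins for the image file list, the binned x-axis feature, and the vertical sort key.

```python
# Minimal Image Histogram sketch: x-axis bins are columns of thumbnails,
# each column sorted vertically by a second feature.
import pandas as pd
from PIL import Image

THUMB = 20  # thumbnail size in pixels

def image_histogram(df, out_path="histogram.png"):
    bins = sorted(df["hour"].unique())
    tallest = df["hour"].value_counts().max()
    canvas = Image.new("RGB", (THUMB * len(bins), THUMB * tallest), "black")
    for col, b in enumerate(bins):
        # each bin is a columnar montage, sorted vertically by brightness
        column = df[df["hour"] == b].sort_values("brightness")
        for row, path in enumerate(column["path"]):
            thumb = Image.open(path).convert("RGB").resize((THUMB, THUMB))
            # stack thumbnails upward from the bottom edge of the canvas
            canvas.paste(thumb, (col * THUMB, canvas.height - (row + 1) * THUMB))
    canvas.save(out_path)
```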


My own work with the lab began around the time we started making Image Histograms, and I confronted a problem: I wanted to make media visualizations that reveal, with great clarity, the color properties of image datasets. Images wear their colors on their sleeves, of course, and so media visualizations do carry color information - all of it, in fact - but again, particular choices about sorting can matter a lot.

Unfortunately, even our very best sorting choices for Image Histograms do not yield crystal clear presentations of color. And the problem is that images typically contain lots of different colors. When you sort them by color, how do you do it? Do you take the mean hue of the whole image? The mode? Do you look at all color dimensions, or just one? How you answer these questions will depend on your goals, of course, but the questions are less important if the images have very low standard deviation of their color properties. The more uniformly-colored the images, the easier it is to sort them by color. But we can't simply enforce uniformity in our data. What we can do is plot slices of images instead of whole images. This general idea is at work in various computer vision algorithms, and it makes good sense: images typically capture scenes, and scenes have parts, so we should at the very least be looking at those parts, whatever else we do. In computer vision (and human vision), the selection of parts can be very sophisticated (in human vision, this is roughly the function of visual attention), but it is computationally expensive. We need a way to get better color visibility without sacrificing computational speed.

I developed a technique that is fast and yields excellent color visibility (Figure 5; closeup in Figure 6). Each image is sliced into some number of equal-sized parts, features are extracted from the parts, and those parts are then plotted as an Image Histogram. The entire process, carried out with 1 million slices, can take less than an hour.
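Here is a sketch of the slicing step under simple assumptions: each image is cut into an n x n grid of equal tiles, and a mean HSV triple is recorded per tile (the actual feature extraction in our pipeline may differ).

```python
# Slice an image into equal tiles and extract simple color features per tile.
import numpy as np
from PIL import Image

def slice_features(path, n=4):
    """Cut an image into n*n equal tiles and return mean (hue, sat, brightness) per tile."""
    hsv = np.asarray(Image.open(path).convert("HSV"), dtype=float) / 255.0
    h, w, _ = hsv.shape
    rows, cols = h // n, w // n
    feats = []
    for i in range(n):
        for j in range(n):
            tile = hsv[i * rows:(i + 1) * rows, j * cols:(j + 1) * cols]
            feats.append(tile.reshape(-1, 3).mean(axis=0))  # mean hue, sat, brightness
    return feats
```

Each tile's features can then be plotted exactly as before, with the tiles themselves as the plot elements of an Image Histogram.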

LA color
Fig 5. Hue histogram sorted vertically by brightness.

LA color closeup
Fig 6. Closeup of slice histogram above.

The number of parts will depend on the kind of images we use. In my own work, I set a criterion value for average standard deviation of hue (~0.1) and then find the minimum number of parts needed to meet the criterion. This approach ensures both that the plots will make a smooth (and consistent) presentation of color and that they will carry as much object content as is possible for that particular source of image data at that particular criterion value (this as opposed to slicing all types of image data into the same number of parts). As you might imagine, the more 'zoomed out' the original image data, the clearer the object content at any particular criterion value. Satellite photos give excellent results, for example (see Figures 1 and 2 at the top of the page).
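A sketch of that criterion search might look like the following. The helper computes per-tile hue standard deviations, and the ~0.1 threshold and the range of grid sizes are the only parameters; both are assumptions for illustration rather than the exact procedure used for the figures above.

```python
# Find the minimum number of tiles per image such that the average
# within-tile standard deviation of hue falls below a criterion value.
import numpy as np
from PIL import Image

def mean_hue_std(path, n):
    hsv = np.asarray(Image.open(path).convert("HSV"), dtype=float) / 255.0
    h, w = hsv.shape[:2]
    rows, cols = h // n, w // n
    stds = [hsv[i*rows:(i+1)*rows, j*cols:(j+1)*cols, 0].std()
            for i in range(n) for j in range(n)]
    return float(np.mean(stds))

def parts_needed(paths, criterion=0.1, max_n=32):
    for n in range(1, max_n + 1):
        avg = np.mean([mean_hue_std(p, n) for p in paths])
        if avg <= criterion:
            return n * n  # number of tiles per image
    return max_n * max_n
```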

The object content is an important product of this technique. Because the plot is, like all media visualizations, composed of the images themselves, it still carries much of the information it would have carried had we used whole images - that is, unless we set the criterion value so low that our slices are uniform fields of color (even single pixels!). This would yield maximal color visibility, of course, but we'd lose all the other information. Choosing a criterion value, then, is choosing the balance between color visibility and object information. As we lower the criterion value, we look at progressively smaller parts of scenes, and gradually, they lose their structure. Just where we choose to stop this degradation will depend on our analytical goals.

It is important to note that both the Image Histogram and the slicing technique are very general-purpose tools, and admit of great variation in their application. And of course, the tools have no particular analytical power without some intelligent selection of data. For example, we might want to compare the color signatures of different sources of image data for the same region (Figure 7).

Slice Histograms Across Image Sources
Fig 7. Slice histograms for four different sources of image data: satellite, Google Streetview, Flickr, and Twitter.

Or, we might want to visualize the colors of a particular concept, like 'autumn' (Figure 8).

New England Autumn
Fig 8. Slice histogram of Flickr images autotagged 'autumn' from New England.

Additionally, in making slice visualizations, it's not essential that the visualization take the form of a histogram. What is essential is that the plot elements have low standard deviation of whichever visual properties we're interested in, and that there be some method of sorting that groups together similar elements. That's it. We can transform these histograms into any sort of plot, or montage, or map we like. In an upcoming post, I'll talk more about my efforts to expand the space of possible forms media visualizations can take.

"Analyzing Big Visual Data" - new slides from Software Studies Initiative (summer 2015)

I put together a new set of slides explaining the theory and methods used in our lab to explore large visual cultural datasets. The slides include visualizations from many of our projects (created by our lab or in collaboration with other researchers and designers). The projects cover social media (millions of Instagram images), contemporary popular culture (one million manga pages), and digitized historical artworks (for example, 5,000 paintings by French Impressionist artists).

You can download the slides in Keynote, Powerpoint, or PDF formats from Dropbox:

Keynote file (189 MB)

Powerpoint file (187 MB)

PDF file (note: does not include video, 75 MB).

Manovich's lectures May - July, 2015

Photo: Lev with members of the Software Studies Initiative in front of the HiperWall (a next-generation visualization system featuring multiple flat screens) at Calit2. Dataset: 3,200 selfies from the Selfiecity project.

May 20 - July 10, 2015:

1. LDV Vision Summit, New York City, 5/20 (keynote)

2. Museum of Contemporary Art of Vojvodina, Novi Sad, 5/25

3. Belgrade Cultural Center, Belgrade, 5/26

4. Technarte 2015, Bilbao, 5/29

5. American Center, Moscow, 6/6

6. Da-Da Architecture School, Naberezhnye Chelny (Russia), 6/7

7. Kazan (Russia), 6/8

8. School of Urban Studies, Higher School of Economics, Moscow, 6/9

9. Strelka Institute, Moscow, 6/10

10. School of Urban Studies, Higher School of Economics, 6/11 (roundtable)

11. Philosophy department, Moscow State University, 6/11

12. The European University, St. Petersburg, 6/15 (roundtable)

13. Digital Methods Summer School, University of Amsterdam, 6/29 (keynote)

14. "Big Data in the Context of Culture & Society," House of Electronic Arts Basel, 7/3 (keynote)

15. Europeana Creative Culture Jam, Vienna, 7/10 (keynote)

Manovich's lecture in St. Petersburg, June 15, 2015

On Monday, June 15, 2015,
as part of the event series of the "Adaptable City" summer school
of the Higher School of Urbanism, with the support of the SREDA Institute of Urbanism and the European University at St. Petersburg (EUSP),

a lecture and discussion will take place on the topic "The Information Landscape of the City"

Speaker: Lev Manovich (The Graduate Center, CUNY; Software Studies Initiative, softwarestudies.com)
Discussion moderator: Diana West (EUSP, STS Center)
Discussion participants:
- Daniyar Yusupov (SREDA Institute / SPbGASU)
- Alexander Boukhanovsky (ITMO University)

The event will take place at 19:00 in the White Hall of the European University at St. Petersburg,
Gagarinskaya St. 3-A.

No advance registration is required.

About the "Adaptable City" summer school:

The Higher School of Urbanism at HSE is launching a series of events devoted to the concept of the adaptable city: sustainable urban development that takes into account today's economic conditions and the social needs of residents, with their rapidly changing patterns of consumption, mobility, and communication.
The adaptable city concept is one of the central themes in the development strategy of the Higher School of Urbanism. We have invited international researchers from different fields to give lectures and take part in discussions; our goal is to jointly examine and make sense of today's social changes within urban space.
The talks will take place at the Higher School of Urbanism, at the city library, and at the Strelka Institute.

The Higher School of Urbanism plans to present its program in Kazan, St. Petersburg, and Riga,
thereby widening the geography of interested students and drawing the attention of specialists from different fields to new approaches to urban development.

The partners of the Higher School of Urbanism summer school in St. Petersburg are:
SREDA Institute of Urbanism (St. Petersburg), sredadesign.org
Center for Science and Technology Studies of the European University at St. Petersburg (http://eu.spb.ru/research-centers/sts)

Speaker: Lev Manovich
A leading specialist in new media theory and the author of five books and more than 130 articles published in 30 languages. Professor in the Visual Arts Department at the University of California, San Diego (UCSD), at the European Graduate School in Switzerland, and at De Montfort University in England. In 2007 he founded the Software Studies Initiative lab at the University of California. Earlier, in 2001, MIT Press published his book The Language of New Media, which went on to become one of the foundational works in the field of new media. He works on applying computational analysis to the study of cultural trends. In his lecture, Lev Manovich will discuss how and why the analysis of user-generated content helps us better understand contemporary society.

Moderator: Diana West
Ph.D. in the history and theory of architecture (Princeton University; dissertation topic: "Cybersovietics: Planning, Design, and the Cybernetics of Soviet Space, 1954-1986"). Author of scholarly publications on cybernetics, the history of art and technology, the organization of space, and the application of cybernetic ideas in urbanism. She has taught at Princeton University, Drexel University, and elsewhere. Member of several international professional associations: the College Art Association (CAA), the Association for Slavic, East European, and Eurasian Studies (ASEEES), the Society of Architectural Historians (SAH), the History of Science Society (HSS), and the Society for the History of Technology (SHOT). She is currently Deputy Director for Projects at the Center for Science and Technology Studies (STS) of the European University at St. Petersburg and a researcher on the project Russian Computer Scientists at Home and Abroad.


Alexander Boukhanovsky, Doctor of Technical Sciences, professor in the Department of Information Systems and director of the Research Institute of Knowledge-Intensive Computer Technologies at the St. Petersburg State University of Information Technologies, Mechanics and Optics (ITMO). A specialist in the computer modeling of complex systems using high-performance computing, with extensive professional experience in developing distributed domain-specific software systems based on intelligent technologies. Author of more than 150 scholarly works.

Daniyar Yusupov, architect and urbanist, lecturer at SPbGASU, a co-founder of the U:lab group and of the expert platform "Open Laboratory: City" (O.L.G.), and an expert at the SREDA Institute of Urbanism; an experienced specialist in interdisciplinary urban projects; author of a number of methods for diagnosing and mapping the condition and development potential of the urban environment, for auditing urban space for the placement of key facilities, and for dynamic strategies of territorial development.

Schedule and topics of Manovich's lectures in Russia, June 6-15, 2015

This is my schedule of lectures in Moscow, St. Petersburg, Naberezhnye Chelny, and Kazan. I will add details (locations and times) as I receive them.

JUNE 6: Lecture in Moscow, at American Center

JUNE 7: Lecture in Naberezhnye Chelny, at Da-Da Architecture School (https://en.wikipedia.org/wiki/Naberezhnye_Chelny)

JUNE 8: Lecture in Kazan

topic: "Analyzing Cities using visual social media"

JUNE 9: Lecture for students at School of Urban Studies, Higher School of Economics (HSE), 7pm

topic: "How to compare one million images? Analysis and visualization of patterns in art, games, comics, cinema, web, print, and user-generated content"

JUNE 10: public lecture at Strelka (Moscow) - 8pm

topic: "How to Analyze Culture Using Social Networks" / abstract

JUNE 11: roundtable at School of Urban Studies, Higher School of Economics, 3-5pm

topic: "City, Urbanism, Social Media"

JUNE 11: Moscow State University, Faculty of Philosophy, meeting and roundtable with students and faculty, 6-8pm (location will be added)

topic: "Digital Humanities, Computational Social Science, Software Epistemology" / relevant post

JUNE 13: Presentation at electromuseum.ru, Moscow

JUNE 15: roundtable at St. Petersburg, at The European University

topic: Urban Information Landscapes

My visit is organized by the U.S. Embassy in Moscow, the Strelka Institute (Moscow), the School of Urban Studies (HSE), and the Habidatum company. I am very grateful to everybody involved for making this trip possible.

If you want to meet me during my visit to Russia, please contact me via email (my email address is here).

"Visualizing Instagram: selfies, cities, and protests" - lecture by Manovich in Belgrade, 5/26/2015

Interaction with the On Broadway installation, currently on view at the New York Public Library (NYPL).

Visualizing Instagram: selfies, cities, and protests

Belgrade Cultural Centre, Belgrade
May 26, 2015 - 7pm


The explosive growth of social media and cultural content on the web, along with the digitization of historical cultural artifacts, has opened up exciting new possibilities for the analysis of cultural trends, patterns, and histories. Today, thousands of researchers have already published papers analyzing massive cultural datasets in many areas, including social networks, urban data, online video, web site design, fashion photography, popular 20th-century music, 19th-century literature, etc. While most of this work is done by researchers in computer science, a number of very interesting projects have also been created by data designers, media artists, and humanities scholars. Here are selected examples of this work.

In my lecture I will show a number of projects created in our lab (softwarestudies.com) since 2008. They include a comparison of 2.3 million Instagram images from 13 global cities (phototrails.net), an interactive installation exploring Broadway in NYC using 30 million data points and images (on-broadway.nyc), a web tool for comparing selfie photos from 5 cities (selfiecity.net), and analysis and visualizations of one million manga pages and one million artworks from the largest network for "user-generated art" (deviantart.com). I will also talk about our current work in progress: the analysis of 260 million images shared on Twitter worldwide during 2011-2014.

I will discuss how we combine methods from data science, media art, and design, and how the use of big cultural data helps us question our existing assumptions about culture. More details about our research are available on our lab site, softwarestudies.com.

Finally, I will also offer comments about the newly emerging "social physics" that uses big data and computation to study the social. Our spontaneous online actions become a source of behavioral and cognitive data used for commercial and surveillance purposes: improving the results of search engines, customizing recommendations, determining the best images to use in online ads, etc. Science used to focus on nature, with the smartest people going to work in physics, chemistry, astronomy, and biology. Today, the social has become the new object of science, with hundreds of thousands of computer scientists, researchers, and companies mining and mapping data about our behaviors. In this way, humans have become the new "nature" for the sciences. The implications of this grand shift are only beginning to unfold. Will we become the atoms of the "social physics" first dreamed of by the founder of sociology, Auguste Comte, in the middle of the 19th century? Will predictive analytics rule every aspect of our lives? What happens to society and to individuals when they can rationalize all their communication - the way millions of people already use their Twitter and Facebook analytics to tailor their posts to their audiences?

"A City That Never Sleeps?" - new data and analysis from "On Broadway" project

ON BROADWAY from Moritz Stefaner on Vimeo.

To create our interactive installation and web application On Broadway (currently on view at the New York Public Library), we assembled lots of images and numbers:

661,809 Instagram photos shared along Broadway during six months in 2014;
Twitter posts with images for the same period in 2014;
8,527,198 Foursquare check-ins, 2009-2014;
22 million taxi pickups and drop-offs for all of 2013;
selected indicators from US Census Bureau for 2013 (latest data available).

On Broadway visualizes some of the patterns in the collected datasets, but there are many other interesting things to discover in this data.

In this first post we discuss temporal patterns of Instagram use in some areas of NYC.

These are the areas crossed by Broadway as it runs through all of Manhattan (13 miles). (In a later post we will present an analysis of the 10.5 million Instagram images we collected for all of NYC.) Representing the city through a single "slice" (one street cutting across the city) simplifies data analysis: instead of dealing with two dimensions of space, we only have one (position along Broadway). This also allows for interesting visualizations that do not have to use all-too-familiar maps.

Analyzing patterns of human activity through Instagram

Why should we care about when and where people post on Instagram? Combined information about the locations of posts and their times can give us insights into patterns of human activity. Some areas and time periods will have lots of posts, and some almost none. Of course, not every type of activity will create a strong Instagram signal, but many do (going out with friends, sightseeing, celebrating, civic events, etc.).

For example, in an earlier project (phototrails.net) we analyzed Instagram patterns during two memorial days in Tel Aviv, Israel (Holocaust Memorial Day; Israeli Fallen Soldiers and Victims of Terrorism Remembrance Day). Another project (the-everyday.net) looked at Instagram patterns during the Maidan Revolution in Kyiv, Ukraine (February 2014). In both cases we found that Instagram usage gives us valuable spatio-temporal "maps" of the events, revealing their dynamics and rhythm.

Importantly, Instagram (and other media-sharing networks that record location information) gives us much more than points in time and space corresponding to users sharing images. We can also examine the images themselves to understand what people chose to photograph and how. (Both images and their metadata can be downloaded using the Instagram API; here are examples of recent research articles that use Instagram data.) This post only discusses time and space information (when and where images were posted); in another post we will examine patterns in the content of the 660,000 images we collected along Broadway.

A sample of Instagram images shared around Broadway and Maiden Lane (this area is close to Wall Street).

A sample of Instagram images shared around Broadway and West 184th Street.

1. Hours of the day

"The City That Never Sleeps" is a popular nickname for New York. But is it true? Analysis of Instagram patterns shows that this common image of New York is not quite correct (at least for the parts crossed by Broadway). Or rather, instead of full 7-8 hours of sleep, NYC only naps for couple of hours.

The number of posted Instagram images increases during the morning, reaches its peak during the day, and decreases in the evening. The quietest period is 3am - 5am.
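As an illustration, hourly counts like the ones behind this plot can be produced with a few lines of pandas; the file name and the 'timestamp' column below are hypothetical stand-ins for the collected Broadway Instagram data.

```python
# Count Instagram posts per hour of the day (sketch with hypothetical input file).
import pandas as pd
import matplotlib.pyplot as plt

posts = pd.read_csv("broadway_instagram.csv", parse_dates=["timestamp"])  # hypothetical file
per_hour = posts["timestamp"].dt.hour.value_counts().sort_index()

ax = per_hour.plot(kind="bar")
ax.set_xlabel("hour of day")
ax.set_ylabel("number of images")
plt.show()
```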

The volume of Instagram posts by hour.

Here is an alternative visualization of the same data that shows the differences between times of the day more dramatically. In this visualization, each hour of the day gets its own “clock”:

Data: 190K Instagram images shared along Broadway during weeks 10-15, 2014.

2. Hours of the day - comparison with other global cities

We can compare Broadway's hourly Instagram patterns with the patterns in other global cities: Bangkok, Berlin, Moscow, Sao Paulo.

These plots use data for 20,000 Instagram images shared during exactly the same week (December 5-11, 2013). The graphs show the number of Instagram images shared per hour, averaged over one week. (We collected these images for the Selfiecity project, using a central area of the same size in each city.) NYC, Berlin, Moscow, and Sao Paulo have similar patterns, but Bangkok and Tokyo differ: there is a peak around lunch time, and then another peak after 7pm.

3. Hours of the day - Broadway 1 vs Broadway 2:

Since Broadway crosses some of the most popular areas of NYC, such as Times Square, a significant proportion of Instagram images shared in some areas along Broadway come from tourists. (In this post we don't separate tourists from locals; this will be the subject of a future post.) It is equally important to remember that Broadway crosses areas with different economic and social characteristics. Therefore, whereas until now we have treated "Broadway" as a single data source, we will now look at temporal differences in Instagram use between its parts.

When we took all the data we collected (Instagram, Twitter, Foursquare, taxi rides) and graphed it along the length of Broadway, we found two completely different parts. It is as though one street connects two different countries. We called them Broadway 1 (from the Financial District up to 110th Street) and Broadway 2 (from 110th Street to 220th Street). The first part has the famous tourist spots, and also much more social media and taxi activity than the second part.

For example, this graph shows the number of Instagram images along the length of Broadway (left to right):

Data: 660K Instagram images, 2/27/2014 - 8/01/2014. "Points" are the centers of 100m-wide rectangles spaced 30 meters apart along Broadway (713 points covering 13 miles, south to north).
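As a rough illustration of this binning (not necessarily the project's actual pipeline), posts can be assigned to their nearest sample point, assuming both the posts and the 713 Broadway points are given in a projected coordinate system measured in meters.

```python
# Bin geotagged posts to sample points along Broadway (illustrative sketch).
import numpy as np

def bin_to_broadway(post_xy, point_xy, max_dist=50.0):
    """Assign each post to its nearest Broadway sample point (within max_dist meters)
    and return the count of posts per point."""
    counts = np.zeros(len(point_xy), dtype=int)
    for p in post_xy:
        d = np.hypot(point_xy[:, 0] - p[0], point_xy[:, 1] - p[1])
        nearest = d.argmin()
        if d[nearest] <= max_dist:
            counts[nearest] += 1
    return counts
```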

The difference in Instagram volume between Broadway 1 and Broadway 2 is immediately obvious, even if we don't take into account a few spikes corresponding to popular tourist photo-taking spots.

(Note the small peaks in some areas of Broadway 2, which may be reflections of the gentrification of these areas. In a later post we will do a more detailed analysis comparing all neighborhoods crossed by Broadway.)

Averaging all the data we collected for Broadway 1 and Broadway 2 shows that Broadway 1 has 6.83 times more Instagram images, 3.91 times more tweets with images, 9.29 times more taxi drop-offs, and 7.9 times more taxi pick-ups.

If we calculate average household income for the two parts using ACS 2013 census tract data, we find that the average for Broadway 1 is $119,000, while the average for Broadway 2 is $39,380.

There are many reasons why we see much higher activity in Broadway 1: the presence of tourists, more affluent locals, the many people working downtown and in midtown, etc. Given how much money an average tourist spends during a visit to NYC, economically many tourists have more in common with the people living along Broadway 1 than with those living along Broadway 2. So we may expect that while tourists greatly magnify the difference in social media activity and taxi use between the two parts of Broadway, the basic difference would exist even without them. (Proving or disproving this hypothesis will require further data analysis.)

Do Broadway 1 and Broadway 2 have the same temporal patterns?

In Broadway 1 (left graph) afternoon hours clearly dominate. In Broadway 2 (right graph) there is more activity in late evenings.

Note that since Broadway 1 contains most of the Instagram images in our dataset, the left graph is quite similar to the very first graph above that shows activity for all of Broadway. This is an important lesson: often, when you analyze data representing some phenomenon, the patterns you see actually correspond only to the dominant part of that phenomenon. Other parts may have different patterns, but they remain hidden unless we look at them separately. This is what happens in our case: only when we plotted the data separately for Broadway 1 and Broadway 2 did we realize that these two parts have distinct temporal patterns. (We may speculate that afternoons dominate in Broadway 1 because of tourists, and also because of the many people who work downtown and in midtown but go home to other boroughs or upper Manhattan in the evening.)

To check that the temporal difference between the two areas is not due to particular days of the week, we plot the data separately for each day. In the following plots, the labels 1 to 7 correspond to Monday through Sunday. The first set of plots is for Broadway 1, and the second set is for Broadway 2.

Just as plotting the data for all 13 miles of Broadway in Manhattan together hides the differences between its two parts, if we split each part into smaller areas, we may expect to find more differences. The advantage of the simplification we used (Broadway 1 vs Broadway 2) is that the differences become bigger and therefore easier to see. Dividing data into smaller and smaller subsets is a mixed blessing: we may gain in local specificity and interpretability, but the distinctions can become smaller and smaller. Therefore it is useful both to divide and to gather: to look at subsets of the data as well as at the data as a whole.

This is the end of our first post reporting analysis of the data we collected and organized for the On Broadway project. More posts will be coming soon!

P.S. We are also working on a paper comparing patterns in our datasets across all of NYC. We hope to release it on arXiv in April or May.

On Broadway - a new interactive urban data visualization from Selfiecity team

The interactive installation and web application On Broadway represents life in the 21st-century city through a compilation of images and data covering the 13 miles of Broadway that span Manhattan. The result is a new type of city view, created from the activities and media shared by hundreds of thousands of people.

The On Broadway installation is currently on view at the New York Public Library as part of the exhibition Public Eye: 175 Years of Sharing Photography. The exhibition will be open until January 3, 2016. The installation uses a 46-inch multi-touch monitor.

ON BROADWAY from Moritz Stefaner on Vimeo.

video showing interaction with On Broadway

A photo of part of the On Broadway installation at the New York Public Library

Media and web coverage:

Images and data used in the project include:

660,000 Instagram photos shared along Broadway during six months in 2014
Twitter posts with images for the same period
over 8 million Foursquare check-ins (2009-2014)
22 million taxi pickups and drop-offs (2013)
selected economic indicators for the parts of NYC from US Census Bureau (2013).


Daniel Goddemeyer, Moritz Stefaner, Dominikus Baur, Lev Manovich.


Members of Software Studies Initiative: Mehrdad Yazdani, Jay Chow;
Brynn Shepherd and Leah Meisterlin;
PhD students at The Graduate Center, City University of New York (CUNY): Agustin Indaco (Economics), Michelle Morales (Computational Linguistics), Emanuel Moss (Anthropology), Alise Tifentale (Art History).

Interactive application:
The app, which offers a similar experience and functionality to the installation version, is available from the project web site: http://on-broadway.nyc/app/

Creating On Broadway:

Today, companies, government agencies, and other organizations collect massive amounts of data about cities. This data is used in many ways that are invisible to us. At the same time, many cities make some of their datasets publicly available and sponsor hackathons to encourage the creation of useful apps using this data. Our project is supportive of the idea of giving citizens back their data, but it takes a unique approach to this goal. Using the On Broadway interactive interface, citizens can navigate their city made from hundreds of millions of data points and the social media images they have shared.

How can we best represent a "data city"? We did not want to show the data in a conventional way, as graphs and numbers. We also did not want to use another convention for showing spatial data: a map. The result of our explorations is On Broadway: a visually rich, image-centric interface in which numbers play only a secondary role and no maps are used. The project proposes a new visual metaphor for thinking about the city: a vertical stack of image and data layers. There are 13 such layers in the project, all aligned to locations along Broadway. Using our interface (available as an online application and as a version for a large interactive multi-touch screen, currently installed at the New York Public Library), you can see all the data at once, or zoom in and follow Broadway block by block.

Project updates and new research using the datasets we assembled for On Broadway will be published here as blog posts and as articles in academic journals.

A screenshot from the interactive application showing the view of all of Broadway

A screenshot from the interactive application showing a closeup view