MoMA, New York Public Library, Google, and Twitter: our 2014-2015 projects, exhibitions, publications

The last 24 months have been a very busy and productive time for our lab.

In 2014, we created six new projects focusing on the analysis, visualization, and interpretation of large cultural datasets. Four of them were installations commissioned by the New York Public Library, the Google Zeitgeist 2014 conference, the National Taiwan Museum of Fine Arts, and a media festival in Sao Paulo. We were also commissioned to create a visualization for the August 2014 issue of Wired.

In 2015, we are showing our projects in 12 exhibitions in the U.S., Europe, and Asia, including Infosphere at ZKM (Center for Art and Media), the Tallinn Architecture Biennale, Data Traces in Riga, and the West Bund Biennial of Architecture and Contemporary Art in Shanghai. We also created a new Selfiecity London edition for the Big Bang Data exhibition at Somerset House in London.

Our projects were featured in hundreds of publications including The New York Times, The Wall Street Journal, BBC, The Guardian, CNN, PBS, NBC, the LA Times, The Washington Post, the San Francisco Chronicle, The Atlantic, Discovery Channel, National Geographic, Wired, Slate, Fast Company Co.Design, Spiegel, Le Monde, BuzzFeed, and Gizmodo. For the full list of media coverage, see

We are grateful to the institutions that support our work: The Graduate Center, City University of New York; the California Institute for Telecommunications and Information Technology; and The Andrew W. Mellon Foundation.

We were lucky to work with the amazing team of data and media designers: Moritz Stefaner, Daniel Goddemeyer, and Dominikus Baur. Moritz is the creative director and visualization designer for both Selfiecity and On Broadway, and we are very grateful for all his work and dedication to these projects.

Awards and recognition:

Selfiecity received Gold (website category) in the 2014 Information Is Beautiful Awards.

Lev Manovich was included in The Verge 50 (2014), the list of the most interesting people building the future.

Selfiecity was included in six "best visualizations of 2014" lists.


Twitter Data Grant:

Out of 1,300 international teams that applied, only six received a Twitter Data Grant. As part of the grant, Twitter gave us access to every image tweeted worldwide in 2011-2014. We will publish the results of our analysis of this remarkable dataset in 2015-2016.

Culture Analytics Institute

Lev Manovich is a member of the organizing team for the Culture Analytics Institute (March 7 - June 10, 2016, UCLA). 115 leading academic and industry researchers will speak at the four Institute workshops.

Commissioned projects:

On Broadway

An interactive installation that visualizes multiple layers of data and images collected along the 13 miles of Broadway spanning Manhattan. The data includes 660,000 Instagram images, Twitter posts and images, Foursquare check-ins, Google Street View images, 22 million taxi pickups and drop-offs (2013), and economic indicators from the U.S. Census Bureau. Artists: Daniel Goddemeyer, Moritz Stefaner, Dominikus Baur, Lev Manovich. The installation is on view at the New York Public Library as part of the exhibition Public Eye: 175 Years of Sharing Photography, December 12, 2014 - January 3, 2016. An interactive web app is available on the On Broadway website.

Phototrails: Animated

Animated visualizations of 2.3 million Instagram images from 13 global cities, commissioned for the Google Zeitgeist 2014 Conference. Artists: Nadav Hochman, Lev Manovich, Jay Chow. September 14-16, Paradise Valley, Arizona. Original project (2013): Phototrails.


SelfieSaoPaulo

A site-specific project for a media facade in Sao Paulo, Brazil. Instagram selfies from Sao Paulo are animated to reveal patterns in self-representation. One animation presents the photos organized by estimated age, another by gender, and the third by the degree of smile. Artists: Jay Chow, Lev Manovich, and Moritz Stefaner. SelfieSaoPaulo was on view every night on the media facade of the FIESP / SESI building at Alameda das Flores (Avenida Paulista 1313) for the duration of the 2014 SP_Urban Festival in Sao Paulo, Brazil. June 9 – July 7, 2014.

Taipei Phototime

Real-time visualization of Instagram image streams from Taipei and New York. Artists: Lev Manovich and Jay Chow. Taipei Phototime was commissioned for Wonder of Fantasy: 2014 International Techno Art Exhibition at the National Taiwan Museum of Fine Arts, Taipei. May 17 – August 3, 2014.

selfiecity London

Analysis and visualization of Instagram selfies from London. Commissioned for the Big Bang Data exhibition at Somerset House in London. December 3, 2015 – February 28, 2016.

Self-initiated projects:

The Exceptional and the Everyday: 144 hours in Kyiv

The first project to analyze the use of Instagram during a social upheaval. Using computational and data visualization techniques, we explored 13,208 Instagram images shared by 6,165 people in the central area of Kyiv during the 2014 Ukrainian revolution (February 17 - 22, 2014). Team: Lev Manovich, Jay Chow, Alise Tifentale, and Mehrdad Yazdani, with a guest essay by Elizabeth Losh. Released 10/07/2014.


Selfiecity

Analysis and visualization of 3,200 Instagram selfies from five global cities. The project includes an interactive web app, Selfiexploratory. Team: Lev Manovich, Dominikus Baur, Jay Chow, Daniel Goddemeyer, Nadav Hochman, Moritz Stefaner, Alise Tifentale, and Mehrdad Yazdani. Released 02/19/2014.

Publications: new books 2014-2015:

Data Drift: Archiving Media and Data Art in the 21st Century. Editors: Rasa Smite, Raitis Smits, Lev Manovich. (RIXC, 2015)

The Illusions. Digital Original Edition. A BIT of The Language of New Media. Cambridge, Mass.: MIT Press, 2014.

Publications: articles, book chapters, conference papers 2014-2015:

Manovich, Lev. The Science of Culture? Social Computing, Digital Humanities, and Cultural Analytics. In The Datafied Society: Social Research in the Age of Big Data, edited by Mirko Tobias Schaefer and Karin van Es. Amsterdam University Press, forthcoming in 2016.

Manovich, Lev. Exploring Urban Social Media: Selfiecity and On Broadway. In Code and the City, edited by Robert Kitchin and Sung-Yueh Perng. Routledge, forthcoming February 2016.

Manovich, Lev, and Mehrdad Yazdani. Predicting Social Trends from Non-photographic Images on Twitter, Big Humanities Data workshop at IEEE Big Data 2015. Forthcoming in Proceedings of 2015 IEEE International Conference on Big Data.

Heftberger, Adelheid. «Die Verschmelzung von Wissenschaft und Filmchronik». Das Potenzial der reduktionslosen Visualisierung am Beispiel von «Das elfte Jahr» und «Der Mann mit der Kamera» von Dziga Vertov. In La visualisation des données en histoire. Visualisierung von Daten in der Geschichtswissenschaft, edited by Enrico Natale, Christiane Sibille, Nicolas Chachereau, Patrick Kammerer, and Manuel Hiestand. Zürich: Chronos-Verlag, 2015.

Manovich, Lev. Data Science and Computational Art History. International Journal for Digital Art History, Issue 1, 2015, pp. 12-34. Featured article.

Manovich, Lev and Everardo Reyes. “Info-aesthetics,” in 100 Notions for Digital Art, edited by M. Veyrat. Paris: Les Editions de l’Immatériel.

Manovich, Lev. The Language of Media Software. In The Imaginary App, edited by Svitlana Matviyenko and Paul D. Miller (Cambridge, Mass.: The MIT Press, 2014), pp. 189-204.

Manovich, Lev. Software is The Message. Journal of Visual Culture, Volume 13, Issue 1, pp. 79-81.

Reyes, Everardo. Aesthetics of temporal and spatial transformations in environments, Metaverse Creativity, Volume 4, Issue 2, pp. 151-165.

Reyes, Everardo. Explorations in Media Visualization (invited talk). In Extended Proceedings of the 25th ACM Conference on Hypertext and Hypermedia (Hypertext '14), September 2014. New York: ACM Press.

da Silva, Cicero Inacio and Marcio Santos. Analysing big cultural data patterns in 2.200 covers of Veja Magazine. Proceedings of the Digital Humanities Congress 2012. University of Sheffield, 2014.

Hochman, Nadav and Lev Manovich. A View From Above: Exploratory Visualizations of the Thomas Walther Collection, in Object:Photo. Modern Photographs 1909–1949: The Thomas Walther Collection. Museum of Modern Art (MoMA) website.

Manovich, Lev, Jay Chow, Alise Tifentale, Mehrdad Yazdani. The Exceptional and the Everyday: 144 Hours in Kyiv, in Proceedings of 2014 IEEE International Conference on Big Data, 77-84.

Tifentale, Alise and Lev Manovich. Selfiecity: Exploring Photography and Self-Fashioning in Social Media, in Berry, David M. and Michael Dieter, eds, Postdigital Aesthetics: Art, Computation and Design (Palgrave Macmillan: 2015), pp. 109-122.

Hochman, Nadav. The social media image, Big Data and Society, July-December 2014.

Hochman, Nadav, Lev Manovich, and Mehrdad Yazdani. On Hyper-Locality: Performances of Place in Social Media, presented at The International AAAI Conference on Weblogs and Social Media (ICWSM 2014).

Manovich, Lev. Watching the World, Aperture, No. 214 (2014), special issue on “Documentary, Expanded.”

List of 115 speakers participating in the Culture Analytics Institute, 3/7/2016 - 6/10/2016

Update: we are happy to announce the list of 115 leading academic and industry researchers who will be speaking at the four Institute workshops (see below for the full list):

Universities: UCLA, UCSD, UCI, Berkeley, Stanford, USC, CMU, Chicago, Duke, Michigan, UT Austin, MIT, Yale, NYU, Brown, Rutgers, KAIST, CUNY, Cambridge, etc.

Companies and research labs: Google, Facebook, Twitter, AT&T, Microsoft, NYT, NYPL, New York Hall of Science, Australian Centre for the Moving Image, Stamen Design, etc.


The Culture Analytics Institute (March 7 - June 10, 2016) is bringing together over 170 computer science and humanities researchers.

For more information, the schedule, and instructions on applying to be in residence for the whole program or to attend single workshops, see the information below.

Applications will be accepted until Monday, December 7, 2015.

Institute description:

The use of computational and mathematical techniques to analyze cultural content, trends, and patterns is a rapidly developing research area spanning a number of disciplines. The goal of the Culture Analytics program is to present the best research and to promote collaborations. To do this, we are bringing together leading scholars in the social sciences, humanities, applied mathematics, engineering, and computer science working on quantitative culture analysis.

The program is organized by the Institute for Pure and Applied Mathematics (IPAM) at the University of California, Los Angeles (UCLA).

Lead organizers:

Tina Eliassi-Rad (Rutgers), Mauro Maggioni (Duke), Lev Manovich (The Graduate Center, CUNY), Vwani Roychowdhury (UCLA), and Tim Tangherlini (UCLA).


Workshop participants:

1. Culture analytics beyond text: image, music, video, interactivity, and performance. March 21-24, 2016.

Yong-yeol Ahn (Indiana University), Sebastian Ahnert (University of Cambridge), Luis Alvarez (Universidad de Las Palmas de Gran Canaria), Jonathan Berger (Stanford University), Johan Bollen (Indiana University), Lawrence Carin (Duke University), Damiano Cerrone (SPIN Unit – Estonian Academy of Arts), Meeyoung Cha (Korea Advanced Institute of Science and Technology (KAIST)), Edwin Chen (Twitter), Ronald Coifman (Yale University), David Crandall (Indiana University), Kate Elswit, Elena Federovskaya (Rochester Institute of Technology), Marco Iacoboni (University of California, Los Angeles (UCLA)), Yannet Interian (University of San Francisco), Tristan Jehan (Massachusetts Institute of Technology), Lindsay King (Yale University), Lev Manovich (The Graduate Center, CUNY), Daniele Quercia (Yahoo), Miriam Redi (Yahoo), Babak Saleh (Rutgers University), Brian Uzzi (J. L. Kellogg Graduate School of Management).

2. Culture analytics and user-experience design. April 11-15, 2016.

Taylor Arnold (AT&T Labs-Research), Seb Chan (Australian Centre for the Moving Image), Dana Diminescu (Telecom ParisTech and Maison des Sciences de l'Homme), Dan Edelstein (Stanford University), Sara Fabrikant (University of Zurich), Danyel Fisher (Microsoft Research), Mariah Hamel (Plotly), Francis Harvey (Universität Leipzig), Marti Hearst (University of California, Berkeley (UC Berkeley)), Lilly Irani (University of California, San Diego (UCSD)), Anab Jain (Superflux), Isaac Knowles (Indiana University), Isabel Meirelles (OCAD University), Doug Reside (New York Public Library), Eric Rodenbeck (Stamen Design), Carrie Roy (University of Wisconsin-Madison), Dan Russell (Google), Steve Uzzo (New York Hall of Science), Ben Vershbow (New York Public Library), Amanda Visconti (Purdue University).

3. Cultural patterns: multi-scale data-driven models. May 9-13, 2016.

Lada Adamic (University of Michigan), Edoardo Airoldi (Harvard University), Sinan Aral (Massachusetts Institute of Technology), Maria Binz-Scharf (City University of New York (CUNY)), Joshua Blumenstock (University of Washington), Aaron Clauset (University of Colorado Boulder), Peter Dodds (University of Vermont), Tina Eliassi-Rad (Rutgers University), Aram Galstyan (USC Information Sciences Institute), Rayid Ghani (University of Chicago), Sharique Hasan (Stanford University), Eitan Hersh (Yale University), Matt Jackson (Stanford University), Jure Leskovec (Stanford University), Steve Lohr (The New York Times), Filippo Menczer (Indiana University), Mark Newman (University of Michigan), Molly Roberts (University of California, San Diego (UCSD)), Daniel Romero (University of Michigan), Don Rubin (Harvard University), Cynthia Rudin (Massachusetts Institute of Technology), Don Saari (University of California, Irvine (UCI)), Jasjeet Sekhon (University of California, Berkeley (UC Berkeley)), Cosma Shalizi (Carnegie-Mellon University), Limor Shifman (University of Southern California (USC)), Arun Sundararajan (New York University), Timothy Tangherlini (University of California, Los Angeles (UCLA)), Johan Ugander (Stanford University), Hal Varian (Google Inc.).

4. Mathematical analysis of cultural expressive forms: text data. May 23-27, 2016.

Yong-yeol Ahn (Indiana University), Mark Algee-Hewitt (Stanford University), Ricardo Baeza-Yates (Yahoo! Research), David Blei (Columbia University), Tanya Clement (University of Texas at Austin), Brian Croxall (Brown University), Cristian Danescu-Niculescu-Mizil (Cornell University), David Garcia (ETH Zürich), Lise Getoor (University of California, Santa Cruz (UC Santa Cruz)), Ryan Heuser (Stanford University), Natalie Houston (University of Massachusetts Lowell), Matt Jockers (University of Nebraska-Lincoln), Dan Jurafsky (Stanford University), Ralph Kenna (Coventry University), Kristina Lerman (University of Southern California (USC)), Hoyt Long (University of Chicago), Winter Mason (Facebook), David Mimno (Cornell University), Suresh Naidu (Columbia University), Chris Potts (Stanford University), Lisa Rhody (George Mason University), Vwani Roychowdhury (UCLA), Noah Smith (Carnegie-Mellon University), David Smith (Northeastern University), Neel Smith (College of the Holy Cross), Richard Jean So (University of Chicago), Markus Strohmaier (Universität Koblenz-Landau), Timothy Tangherlini (University of California, Los Angeles (UCLA)), Ted Underwood (University of Illinois at Urbana-Champaign), Hannah Wallach (University of Massachusetts Amherst).


Applications for short and long stays and funding:

Applications will be accepted until Monday, December 7, 2015.

Application form:

Please consider applying and let your colleagues and students know about the program. You can come and stay for one of the workshops, or stay for the whole institute period.

Long-term stays:
For longer stays we also provide support for travel and housing, and selected PhD students will also receive a stipend.

Attending a workshop:
Even if you are not presenting and don't have a current research project related to culture analytics, you can still apply to attend any of the workshops. We will provide partial or full housing support for all accepted participants.

We welcome applications from advanced Ph.D. students, post-docs, and junior and senior researchers at any stage of their careers. People in computer science, the social sciences, and the humanities are welcome to apply, as long as they are interested in using quantitative methods for cultural analysis. Supporting the careers of women and minority mathematicians and scientists is an important component of IPAM's mission, and we welcome their applications.

Visualizing High Dimensional Image Clusters in 2D: The Growing Entourage Plot (Part II)

Damon Crockett

continued from Part I

Architecture Crop
Growing Entourage with 50 clusters of Instagram photos machine-tagged under the heading 'architecture'. Cropped.

Architecture Close
Closeup of plot immediately above.

Every image in a given cluster is ranked according to its Euclidean distance (in the original feature space) from the centroid. We can think of the centroid as the 'leader' of an 'entourage', and each image in the cluster is a member of the entourage. The closer they are to the centroid, by the aforementioned ranking, the closer they get to 'stand' near the centroid. Each cluster takes turns adding members of its 'entourage', starting with those closest to the leader. Each added member stands in the open grid space nearest its leader. Local conflicts between entourages are settled by this principle, since added members must occupy open grid squares.
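To make the turn-taking concrete, here is a minimal sketch of the placement step in Python. The names (grow_entourages, centroid_cells, grid_shape) are hypothetical, and the nearest-open-cell search below uses a simple breadth-first walk rather than exact Euclidean distance; the lab's actual code is linked at the end of this post.

```python
# Illustrative sketch of the Growing Entourage placement step.
# `centroid_cells`, `members`, and `grid_shape` are hypothetical inputs.
from collections import deque

def nearest_open_cell(leader, occupied, grid_shape):
    """Breadth-first walk outward from the leader's cell to the nearest free cell."""
    rows, cols = grid_shape
    queue, seen = deque([leader]), {leader}
    while queue:
        r, c = queue.popleft()
        if (r, c) not in occupied:
            return (r, c)
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols and (nr, nc) not in seen:
                seen.add((nr, nc))
                queue.append((nr, nc))
    return None  # the grid is completely full

def grow_entourages(centroid_cells, members, grid_shape):
    """centroid_cells: {cluster_id: (row, col)} from binning the 2D projection.
    members: {cluster_id: [image_id, ...]} sorted by distance to the centroid.
    Returns {(row, col): image_id} placements."""
    occupied = set(centroid_cells.values())       # centroid cells stay empty but reserved
    placements = {}
    queues = {cid: deque(imgs) for cid, imgs in members.items()}
    active = deque(centroid_cells)                # clusters take turns, one member at a time
    while active:
        cid = active.popleft()
        if not queues[cid]:
            continue                              # this entourage is fully placed
        cell = nearest_open_cell(centroid_cells[cid], occupied, grid_shape)
        if cell is None:
            break                                 # no open cells remain anywhere
        occupied.add(cell)
        placements[cell] = queues[cid].popleft()  # closest remaining member stands here
        active.append(cid)                        # re-queue the cluster for its next turn
    return placements
```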

50 image clusters ('entourages'), growing around their centroids, projected to 2D by PCA.

This means that the look of the plot will depend on how we generate the original grid. We might end up with an array of circular clusters in 2D, or we might end up with one large clump of images, with high-ranking members bunched up around their leaders and lower-ranking members scattered in nearby territories.

Activities (wide)
Growing Entourage with wide grid, resulting in relatively isolated circular clusters.

Growing Entourage using same data as above, but with a tighter grid. Some clusters are isolated, some have clumped with neighbors.

This is not, of course, the only way to present clusters on a 2D canvas. It is, however, probably the best way to preserve as much of the complexity of intercluster relations as is possible in 2D. Additionally, it preserves similarity relations among images in the original feature space, something we lose by pure projection methods. Finally, it preserves intracluster relations by giving the semantically closest entourage members the privileged locations nearest their leaders.

The plotting algorithm is written in Python, using the Python Imaging Library (and scikit-learn for projection), and the basic code is here.

Visualizing High Dimensional Image Clusters in 2D: The Growing Entourage Plot (Part I)

Damon Crockett

Activities Crop
A crop from a Growing Entourage plot using Instagram images machine-tagged under the heading 'activities'.

Activities Close
Closeup of above plot showing meeting of two clusters. Empty grid squares are centroid locations.


Since 2007, our lab has been visualizing large collections of cultural images. These visualizations have used either metadata variables such as date and location, or basic image features such as hue, saturation, brightness, number of lines, and texture. In particular, sorting by hue, saturation, and brightness turned out to be very useful for quick exploration of large image collections. More recently, however, we've expanded the scope of our analysis to include the presence and characteristics of faces (e.g., Selfiecity) and now a wide range of object and scene contents. For example, we've just had a paper accepted for the IEEE 2015 Big Data Conference where we used deep learning image classification to analyze the contents of one million Twitter images.

Being able to use the latest computer vision techniques for the analysis of image content is very exciting, but it also brings new challenges. For example, how can we effectively visualize and explore the results of machine classification into many object categories? In this post, I'd like to discuss one particular method we've developed. This method visualizes high-dimensional image clusters using two dimensions. I call this method 'The Growing Entourage Plot'.

The plots in this post are drawn from our current collaboration with Miriam Redi on the clustering and visualization of large collections of Instagram images. Miriam extracted over 1000 image features covering image content (objects and scenes), composition, style, texture, color, and other characteristics. She then computed clusters of images using subsets of these features. Big thanks to the object detection team at Flickr for the object and scene tags (their work is described in this blog post).


In the field of information visualization, a great deal of energy is spent on the problem of how to present high-dimensional data on 2D canvases. There are at least three broad categories of solution: (1) preserve all features; (2) preserve some by selection; and (3) preserve some by redefinition. I'll discuss each of these in turn.

The first approach is simply to try to visualize everything. Such visualizations can be difficult both to design and to read. Media visualizations - those visualizations whose primary plot elements are visual media, like images - are by definition of this type, but additional choices about sorting can make for big differences in readability. The effect of sorting on media visualization is so important, in fact, that any feature not used for sorting is essentially invisible to the viewer.

The second approach is the one we've used most often: select some subset of features and use them for sorting. We might, for example, sort our (infinitely?) high-dimensional image data by only brightness and hue. This is powerful and useful, but it does make invisible very complex sorts of similarity relations between images.

The third approach defines new features that are typically linear combinations of existing features. Principal Components Analysis (PCA) is the standard here, although there are others (e.g., t-SNE). We can present images in, say, a 1000-feature hyperspace by projecting them to two dimensions, dimensions which hadn't previously existed (although, since they are typically linear combinations of existing dimensions, they are not new data). This approach has the advantage that similarity relations between images in 1000D feature space are preserved as well as possible during the projection to 2D, meaning that our visualizations can reveal very complex sorts of similarity between images.
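As a rough, self-contained illustration (not the lab's production pipeline), projecting a high-dimensional feature matrix down to two dimensions with scikit-learn looks like this; the `features` array here is a random stand-in for real image features:

```python
# Illustrative projection of a (n_images x n_features) matrix to 2D.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE

features = np.random.rand(500, 1000)   # stand-in for 1000 real image features

xy_pca = PCA(n_components=2).fit_transform(features)

# t-SNE often separates clusters more sharply than PCA,
# at the cost of distorting global (between-cluster) distances.
xy_tsne = TSNE(n_components=2, perplexity=30).fit_transform(features)
```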

This third approach is becoming quite popular and has been in use in our lab for at least two years now (see, e.g., this Flickr album). In an upcoming post, I'll talk about some methods of projection visualization.


But I'd like here to talk about a different approach to dealing with dimensionality, one that is quite common in data analysis but whose use in information visualization is less common: clustering. Dimensionality reduction algorithms like PCA and t-SNE are powerful and useful but suffer major data loss in most cases. You simply can't preserve all the complexity of 1000D relations after projecting to 2D. Clustering algorithms, however (e.g., k-means), preserve a greater share of the relational data, because they find groups of data points in your original feature space (or whichever subset you choose).

Now of course, there is still the problem of visualizing these clusters. This is particularly difficult for traditional sorts of statistical visualization, because their plot elements carry information only by their spatial positions, and the human visual system can parse a maximum of 3 spatial dimensions. Thus, if we want to see these clusters in their 'natural habitat', so to speak, we're probably out of luck. Additionally, seeing clusters of points, even in a 2D or 3D space, is not particularly illuminating, since clusters, unlike classifier outputs, are not 'classes' at all and have no conventional meaning or significance from the outset. We derive the meaning or significance of a cluster of points in a high-dimensional space from the feature values of those points, and in order to see those features in a plot, we'll have again to confront the problem of presenting high-dimensional data in 2D (or 3D). For these reasons, you don't see many cluster plots (what you'll see sometimes is PCA scatterplots with cluster memberships coded by color).

Media visualizations are at an advantage here. Because media visualizations use images as plot elements, a simple presentation of each cluster is actually quite illuminating. Imagine simply presenting clusters of points, side by side. We learn nothing; these clusters are perfectly meaningless. But if those points are images, we can get a sense of what each cluster means. So, media visualizations are perfect for the presentation of clusters of images.


The question now is how exactly to arrange these clusters on a 2D canvas. As I've said before, a simple presentation of each cluster is helpful. We could, for example, make square montages of each cluster and just leaf through them. But we might want more than this - we might want also to see the relations among the clusters. And now we confront a familiar problem: we have a set of data points in n-D, where n > 2, and we want to see them in 2D. The 'points' are now clusters, but the shape of the problem is exactly the same as before. The Growing Entourage plot is my solution to this problem. It projects cluster centroids to 2D and builds clusters around them by turn-taking and semantic priority.

Food, Drinks, Meals
Growing Entourage with 50 clusters of Instagram photos machine-tagged under the heading 'food/drinks/meals'.

Food/Drinks/Meals Zoomed
Closeup of plot above. Empty grid squares are centroid locations.

We begin with high-dimensional image data and then use an algorithm like k-means to find k clusters in the original feature space. Each cluster has a centroid, given as a point location in the original feature space. We project the centroids to 2D using a dimensionality reduction algorithm, like PCA or t-SNE. We then bin these coordinates to a grid (making sure that no two centroids have the same grid location, which is typically not difficult). Now we have complex similarity relations among cluster centroids, and it remains only to build these clusters of images on the grid at the 2D centroid locations.
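Under the assumption that `features` is the image-by-feature matrix, a minimal sketch of this setup stage might look as follows (illustrative only; the turn-taking growth step itself is sketched in Part II):

```python
# Illustrative setup for the Growing Entourage plot: cluster, project centroids, bin to a grid.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA

k = 50
features = np.random.rand(5000, 1000)            # stand-in for real image features

km = KMeans(n_clusters=k, random_state=0).fit(features)
centroids_2d = PCA(n_components=2).fit_transform(km.cluster_centers_)

# Bin the projected centroid coordinates to an integer grid.
grid_rows, grid_cols = 60, 60
x, y = centroids_2d[:, 0], centroids_2d[:, 1]
r = np.floor((y - y.min()) / (y.ptp() + 1e-9) * (grid_rows - 1)).astype(int)
c = np.floor((x - x.min()) / (x.ptp() + 1e-9) * (grid_cols - 1)).astype(int)
# In practice, colliding centroids would need to be nudged to distinct cells.
centroid_cells = {cid: (int(r[cid]), int(c[cid])) for cid in range(k)}

# Rank each cluster's images by distance to its centroid (closest first).
members = {}
for cid in range(k):
    idx = np.where(km.labels_ == cid)[0]
    dist = np.linalg.norm(features[idx] - km.cluster_centers_[cid], axis=1)
    members[cid] = idx[np.argsort(dist)].tolist()
```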

Nature

Growing Entourage with 50 clusters of Instagram photos machine-tagged under the heading 'nature'.

Nature Zoomed
Closeup of plot above. Empty grid squares are centroid locations.

continue to Part II

new book "Data Drift: Archiving Media and Data Art in the 21st Century" (editors: Rasa Smite, Raitis Smits, Lev Manovich)

data drift copy spread

Data Drift: Archiving Media and Data Art in the 21st Century

Editors: Rasa Smite, Raitis Smits, Lev Manovich
Publisher: RIXC, LiepU MPLab (October 8, 2015)
Language: English
ISBN-10: 9934843439
ISBN-13: 978-9934843433
Paperback: 296 pages
Product Dimensions: 8.4 x 5.8 x 0.7 inches
Available on Amazon

Malraux's imaginary museum has become a reality of our digital age. Every day, a massive amount of new visual, textual, and transactional data is added to this vast archive. How do we decide what bits of contemporary digital culture to preserve, study, and curate, given that its universe is constantly expanding?

The parallel questions of selection, archiving, and curating are important to address with regard to all recent media art forms - from early video, computer, and telematic art to later networked, web-based, software, and data art. Several strategies are already used in media art archiving practice today. However, it is also important to look for new approaches. For instance, data visualization opens up new possibilities for archiving, reinterpreting, and exhibiting artworks, allowing us to look at the past and present from unfamiliar perspectives.

The full-color, 296-page book combines a selection of papers presented at the Media Art Histories: Renew conference, selections from the Save As exhibition, and new texts written for this volume. It also contains the catalog of the Data Drift exhibition of data art and data design (curated by Rasa Smite, Raitis Smits, and Lev Manovich). The book is published in connection with the annual RIXC 2015 festival that took place in Riga, Latvia.


Contributors:

Frieder Nake
Rasa Smite
Raitis Smits
Lev Manovich
Pau Alsina
Clarisse Bardiot
Valentino Catricala
Michael Century
Hanna Barbara Hoelling
Francesca Franco
Esteban Garcia
Rainer Groh
Tjarda de Haan
Franziska Hannss
Chris Hales
Canan Hastik
Vanina Hofman
Nils Jean
Rudi Knoops
Paul Landon
Esther Lapczyna
Laura Leuzzi
Magdalena Anna Nowak
Ianina Prudenko
Georgina Ruff
Arnd Steinmetz
Bernhard Thull
Elio Ugenti
Paul Vogel
Joanna Walewska
Reba Wesdorp
Artemis Willis
Erkki Huhtamo


more photos of Data Traces exhibition curated by Lev MANOVICH, Rasa SMITE and Raitis SMITS



Exhibition website:

Dates: October 10 – November 22, 2015

Venue: kim? Contemporary Art Centre (Riga, Latvia)

Curators: Lev MANOVICH, Rasa SMITE and Raitis SMITS

The exhibition is organized by RIXC, The Center for New Media Culture, Riga (

DATA DRIFT exhibition presents a number of works by some of the most influential data designers of our time, as well as by artists who use data as their artistic medium. How can we use the data medium to represent our complex societies, going beyond "most popular," and "most liked"? How can we organize the data drifts that structure our lives to reveal meaning and beauty? (And can we still think of "beauty" given our growing concerns with privacy and commercial uses of data we share?) How to use big data to "make strange," so we can see past and present as unfamiliar and new?

Artists: SPIN Unit (EU), Moritz STEFANER (DE), Frederic BRODBECK (DE), Kim ALBRECHT (DE), Boris MÜLLER (DE), Marian DÖRK (DE), Benjamin GROSSER (US), Maximilian SCHICH (DE/US), Mauro MARTINO (IT/US), Periscopic (US), Pitch Interactive (US), Smart Citizen Team (ES), Lev MANOVICH / Software Studies Initiative (US), Daniel GODDEMEYER (DE/US), Dominikus BAUR (DE), Mehrdad YAZDANI (US), Alise TIFENTALE (LV/US), Jay CHOW (US), Semiconductor (UK), Rasa SMITE, Raitis SMITS/RIXC (LV), Martins RATNIKS (LV), Kristaps EPNERS (LV).

Exhibition photos: Lev Manovich and Kristine Madjare (


photos of DATA DRIFT exhibition curated by Lev Manovich, Rasa SMITE and Raitis SMITS



Exhibition website:

Dates: October 10 – November 22, 2015

Venue: kim? Contemporary Art Centre (Riga, Latvia)

Curators: Lev MANOVICH, Rasa SMITE and Raitis SMITS

The exhibition is organized by RIXC, The Center for New Media Culture, Riga (

DATA DRIFT exhibition presents a number of works by some of the most influential data designers of our time, as well as by artists who use data as their artistic medium. How can we use the data medium to represent our complex societies, going beyond "most popular," and "most liked"? How can we organize the data drifts that structure our lives to reveal meaning and beauty? (And can we still think of "beauty" given our growing concerns with privacy and commercial uses of data we share?) How to use big data to "make strange," so we can see past and present as unfamiliar and new?

Artists: SPIN Unit (EU), Moritz STEFANER (DE), Frederic BRODBECK (DE), Kim ALBRECHT (DE), Boris MÜLLER (DE), Marian DÖRK (DE), Benjamin GROSSER (US), Maximilian SCHICH (DE/US), Mauro MARTINO (IT/US), Periscopic (US), Pitch Interactive (US), Smart Citizen Team (ES), Lev MANOVICH / Software Studies Initiative (US), Daniel GODDEMEYER (DE/US), Dominikus BAUR (DE), Mehrdad YAZDANI (US), Alise TIFENTALE (LV/US), Jay CHOW (US), Semiconductor (UK), Rasa SMITE, Raitis SMITS/RIXC (LV), Martins RATNIKS (LV), Kristaps EPNERS (LV).

Exhibition photos: Lev Manovich and Kristine Madjare (


Our new paper "Predicting Social Trends from Non-photographic Images on Twitter" accepted for IEEE 2015 Big Data Conference

Random sample of Twitter images from 2013 labeled by the GoogLeNet deep learning model as websites and texts. We refer to these images as "image-texts." The category includes screenshots of text chats, other types of text, and other kinds of non-photographic images. We found that the frequencies of these images are correlated with well-being responses from Gallup surveys, as well as with median housing prices, incomes, and education levels.

Johannes Vermeer. Woman in Blue Reading a Letter. 1663-1664.

Our new paper has been accepted for the Big Data and the Humanities workshop at the IEEE 2015 Big Data Conference:

Predicting Social Trends from Non-photographic Images on Twitter

Mehrdad Yazdani (California Institute for Telecommunication and Information) and Lev Manovich (The Graduate Center, City University of New York)


Humanists use historical images as sources of information about social norms, behavior, fashion, and other details of particular cultures, places, and periods. Dutch Golden Age paintings, works by the French Impressionists, and 20th century street photography are just three examples of such images. Normally such visuals directly show the objects of interest: social scenes, city streets, or people's dress. But what if masses of images shared on social networks contain information about social trends even when these images do not directly represent the objects of interest? This is the question we investigate in our study.

In the last few years, researchers have shown that aggregated characteristics of large volumes of social media are correlated with many socio-economic characteristics and can also predict a range of social trends. Examples include flu trends, the success of movies, and measures of the social well-being of populations. Nearly all such studies focus on text content, such as posts on Twitter and Facebook.

In contrast, we focus on images. We investigate whether features extracted from tweeted images can predict a number of socio-economic characteristics. Our dataset is one million images shared on Twitter during one year in 20 different U.S. cities. We classify the content of these images using the state-of-the-art convolutional neural network GoogLeNet and then select the largest category, which we call "image-texts": non-photographic images that are typically screenshots of websites or text-message conversations. We construct two features describing patterns in image-texts: the aggregated sharing rate per year per city, and the sharing rate per hour over a 24-hour period aggregated over one year in each city.
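As a rough sketch of how the two features could be computed once every image has a predicted label, here is a pandas example; the file and column names are hypothetical, and this is not the paper's actual code:

```python
# Illustrative computation of the two "image-text" features per city.
import pandas as pd

# Hypothetical table: one row per tweeted image, with a predicted CNN label.
df = pd.read_csv("classified_tweet_images.csv", parse_dates=["timestamp"])
df["is_text"] = df["label"] == "image-text"

# Feature 1: yearly sharing rate of image-texts per city
# (fraction of all tweeted images in that city that are image-texts).
yearly_rate = df.groupby("city")["is_text"].mean()

# Feature 2: sharing rate per hour over a 24-hour cycle, aggregated over the year.
hourly_rate = (
    df.assign(hour=df["timestamp"].dt.hour)
      .groupby(["city", "hour"])["is_text"]
      .mean()
      .unstack("hour")        # one row per city, 24 hourly columns
)
```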

We find that these features are correlated with self-reported well-being responses from Gallup surveys, as well as with median housing prices, incomes, and education levels. These results suggest that particular types of social media images can be used to predict social characteristics not readily detectable in the images themselves.

The full paper will be available online after IEEE 2015 Big Data conference.

Data Drift exhibition co-curated by Lev Manovich opens in Riga (Latvia) 10/05/2015

screenshot from CULTUREGRAPHY (2014) by Kim Albrecht, one of the projects shown in Data Drift exhibition

DATA DRIFT exhibition

Exhibition website:

Dates: October 10 – November 22, 2015

Opening: October 9 at 19:00

Venue: kim? Contemporary Art Centre in Riga

DATA DRIFT exhibition presents a number of works by some of the most influential data designers of our time, as well as by artists who use data as their artistic medium. How can we use the data medium to represent our complex societies, going beyond "most popular," and "most liked"? How can we organize the data drifts that structure our lives to reveal meaning and beauty? (And can we still think of "beauty" given our growing concerns with privacy and commercial uses of data we share?) How to use big data to "make strange," so we can see past and present as unfamiliar and new?

If painting was the art of the classical era, and photography that of the modern era, data visualization is the medium of our own time. Rather than looking at the outside world and picturing it in interesting ways like modernist artists (Instagram filters already do this well), data designers and artists are capturing and reflecting on the new data realities of our societies.

Curated by Lev MANOVICH, Rasa SMITE and Raitis SMITS, the exhibition will feature artworks and data visualization by SPIN Unit (EU), Moritz STEFANER (DE), Frederic BRODBECK (DE), Kim ALBRECHT (DE), Boris MÜLLER (DE), Marian DÖRK (DE), Benjamin GROSSER (US), Maximilian SCHICH (DE/US), Mauro MARTINO (IT/US), Periscopic (US), Pitch Interactive (US), Smart Citizen Team (ES), Lev MANOVICH / Software Studies Initiative (US), Daniel GODDEMEYER (DE/US), Dominikus BAUR (DE), Mehrdad YAZDANI (US), Alise TIFENTALE (LV/US), Jay CHOW (US), Semiconductor (UK), Rasa SMITE, Raitis SMITS/RIXC (LV), Martins RATNIKS (LV), Kristaps EPNERS (LV).

The exhibition website includes information and images of all shown artworks.

The exhibition opening program includes a public talk by exhibition curator Lev MANOVICH at the Renewable Futures conference, taking place on October 9 at 17:00 at the Stockholm School of Economics in Riga. The talk will be followed by the official opening of the exhibition at 19:00 at the kim? Contemporary Art Centre.

The DATA DRIFT exhibition is a featured event in this year's RIXC Art Science Festival program, taking place in Riga from October 8 to 10, 2015.

The DATA DRIFT exhibition will be open from October 10 to November 22, 2015 in Riga.

Venue: kim? Contemporary Art Centre gallery, Maskavas street 12/1, Spikeri Creative Quartier, Riga, Latvia.

Opening hours: Mon – closed, Tue 12:00–20:00 (free entrance), Wed–Sun 12:00-18:00.

How to Visualize Colors in Big Image Data: The Slice Histogram

Damon Crockett
San Diego Satellite
Fig 1. One million slices of a satellite image from downtown San Diego, arranged as a hue histogram sorted vertically by saturation.

Slice histogram zoom
Fig 2. Closeup of slice histogram above.


Perhaps foremost among the outputs of our lab are visualizations of Big Image Data, in particular what we call 'media visualizations' - visualizations of image data whose primary plot elements are the images themselves. Media visualizations are special cases of glyph visualizations, a class of statistical plots that present data points as glyphs - icons that carry information by way of their non-relational characteristics: size, shape, color, and so on.

Glyph visualizations are not at all uncommon. For example, the popular R plotting library ggplot2 can render its scatter points as glyphs (Figure 3), allowing the user to encode additional features in the visualization.

Fig 3. Glyph scatterplot. Five data dimensions are presented on two axes.

Because traditional scatter points have no non-relational characteristics - they are in fact point locations and not objects at all - they carry information only by their spatial positions. Traditional scatterplots, then, can present only as many informational dimensions as there are plotting axes. If, however, each scatter point is made a glyph with n non-relational characteristics, the dimensionality of the visualization is increased by n.
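For example, here is a Python analogue of the ggplot idea (matplotlib, purely illustrative with synthetic data): x and y give two dimensions, and marker size and color encode two more.

```python
# Illustrative glyph scatterplot: four data dimensions on two axes.
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
x, y = rng.random(100), rng.random(100)
size_var = rng.random(100) * 300      # third dimension -> marker size
color_var = rng.random(100)           # fourth dimension -> marker color

plt.scatter(x, y, s=size_var, c=color_var, cmap="viridis", alpha=0.7)
plt.colorbar(label="fourth dimension")
plt.xlabel("first dimension")
plt.ylabel("second dimension")
plt.show()
```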

Media visualizations can therefore be understood as limit cases of glyph visualization, because they preserve, strictly speaking, all of the visual information in the dataset, and for image data, preserving all the visual information is preserving all the information. Of course, in practice, this isn't exactly true, since non-visual information (like time and place) will very often figure in the analysis, but typically, this data is used to define a subset of the data we're interested in, e.g., 'all the Instagram photos from Manhattan during this year's Fashion Week', and then our visualization of that subset now focuses exclusively on the visual information.

In any case, even assuming we care only about visual information, the mere presence of the information in the visualization does not guarantee its being readable for the viewing subject. Important for media visualizations are particular choices about sorting or otherwise organizing the images on the canvas in order to reveal patterns. Our lab has pioneered a suite of techniques for organizing images as plot elements on large digital canvases. These techniques are on display in several of our projects (most notably, Phototrails). We use sorted rectangular montages, Cartesian scatterplots, and perhaps most identifiably, polar scatterplots.


Recently, we've pioneered a new technique, one that has roots in our Selfiecity project (thanks to Moritz Stefaner): the Image Histogram. The most basic form of the Image Histogram simply gives the distribution of a single feature, visual or non-visual, and uses the images themselves as plot elements.

Like the Image Montage, the Image Histogram gives every image its own place in the plot, and like the Image Plot (scatterplot), the Image Histogram uses an axis to organize data points. We have found this combination of characteristics to be very useful in presenting, e.g., temporal patterns in image data.

But because Image Histograms are media visualizations, we needn't settle for the presentation of a single feature. The images themselves are right there on display, and although the particular choice of histogram (i.e., what gets assigned to the x-axis) dictates the horizontal sorting, the vertical sorting is still up for grabs and can reveal additional patterns in the data. We can therefore think of each histogram bin as a columnar montage that can be sorted as many times as we like. In Figure 4, we can see the results of multiple vertical sorts.

SD vs. Philly
Fig 4. Image Histograms binned by hours over a week, sorted vertically by both brightness and hue.

The Image Histogram is a powerful statistical tool that admits of a great deal of variation, and is implemented using open source Python code (we use the Python Imaging Library). Big thanks to Cherie Huang for the original code. All of the Image Histograms shown here were made with a fork from that original code, using Python's pandas library and its lightning-fast operations over data tables.
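A stripped-down sketch of the idea, with hypothetical table and column names (not the lab's actual code): bin the images on one feature, sort each bin's column on another, and paste thumbnails onto a canvas with PIL.

```python
# Illustrative Image Histogram: bin images by one feature, sort each column by another.
import pandas as pd
from PIL import Image

# Hypothetical table with columns: filename, bin_feature, sort_feature.
df = pd.read_csv("images.csv")

thumb = 20                                        # thumbnail size in pixels
n_bins = 24
df["bin"] = pd.cut(df["bin_feature"], bins=n_bins, labels=False)

tallest = int(df["bin"].value_counts().max())
canvas = Image.new("RGB", (n_bins * thumb, tallest * thumb), "black")

for b, column in df.groupby("bin"):
    column = column.sort_values("sort_feature")   # the vertical sort within this bin
    for row, fname in enumerate(column["filename"]):
        im = Image.open(fname).resize((thumb, thumb))
        canvas.paste(im, (int(b) * thumb, (tallest - 1 - row) * thumb))  # stack upward

canvas.save("image_histogram.png")
```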


My own work with the lab began around the time we started making Image Histograms, and I confronted a problem: I wanted to make media visualizations that reveal, with great clarity, the color properties of image datasets. Images wear their colors on their sleeves, of course, and so media visualizations do carry color information - all of it, in fact - but again, particular choices about sorting can matter a lot.

Unfortunately, even our very best sorting choices for Image Histograms do not yield crystal clear presentations of color. And the problem is that images typically contain lots of different colors. When you sort them by color, how do you do it? Do you take the mean hue of the whole image? The mode? Do you look at all color dimensions, or just one? How you answer these questions will depend on your goals, of course, but the questions are less important if the images have very low standard deviation of their color properties. The more uniformly-colored the images, the easier it is to sort them by color. But we can't simply enforce uniformity in our data.

What we can do is plot slices of images instead of whole images. This general idea is at work in various computer vision algorithms, and it makes good sense: images typically capture scenes, and scenes have parts, so we should at the very least be looking at those parts, whatever else we do. In computer vision (and human vision), the selection of parts can be very sophisticated (in human vision, this is roughly the function of visual attention), but it is computationally expensive. We need a way to get better color visibility without sacrificing computational speed.

I developed a technique that is fast and yields excellent color visibility (Figure 5; closeup in Figure 6). Each image is sliced into some number of equal-sized parts, features are extracted from the parts, and those parts are then plotted as an Image Histogram. The entire process, carried out with 1 million slices, can take less than an hour.
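A minimal sketch of the slicing step (the helper name and the choice of an n x n grid are my own; note that PIL's HSV channels run 0-255):

```python
# Illustrative slicing: cut an image into an n x n grid of tiles and record
# mean hue, saturation, and brightness (plus hue spread) for each tile.
import numpy as np
from PIL import Image

def slice_features(path, n=4):
    arr = np.asarray(Image.open(path).convert("HSV"), dtype=float)
    h, w = arr.shape[:2]
    rows = []
    for i in range(n):
        for j in range(n):
            tile = arr[i * h // n:(i + 1) * h // n, j * w // n:(j + 1) * w // n]
            rows.append({
                "source": path, "tile": (i, j),
                "hue": tile[..., 0].mean(),
                "sat": tile[..., 1].mean(),
                "brightness": tile[..., 2].mean(),
                "hue_std": tile[..., 0].std(),     # used for the criterion discussed below
            })
    return rows
```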

LA color
Fig 5. Hue histogram sorted vertically by brightness.

LA color closeup
Fig 6. Closeup of slice histogram above.

The number of parts will depend on the kind of images we use. In my own work, I set a criterion value for average standard deviation of hue (~0.1) and then find the minimum number of parts needed to meet the criterion. This approach ensures both that the plots will make a smooth (and consistent) presentation of color and that they will carry as much object content as is possible for that particular source of image data at that particular criterion value (this as opposed to slicing all types of image data into the same number of parts). As you might imagine, the more 'zoomed out' the original image data, the clearer the object content at any particular criterion value. Satellite photos give excellent results, for example (see Figures 1 and 2 at the top of the page).
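Continuing the sketch above, the criterion search could look roughly like this (again hypothetical: it reuses the slice_features() helper and scales PIL's 0-255 hue to 0-1 before comparing to the ~0.1 criterion):

```python
# Illustrative search for the minimum slicing needed to meet the hue-spread criterion.
def min_parts(paths, criterion=0.1, max_n=16):
    for n in range(1, max_n + 1):
        stds = [row["hue_std"] / 255.0            # scale hue spread to 0-1
                for path in paths
                for row in slice_features(path, n=n)]
        if sum(stds) / len(stds) <= criterion:    # average std of hue across all slices
            return n * n                          # number of tiles per image
    return max_n * max_n
```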

The object content is an important product of this technique. Because the plot is, like all media visualizations, composed of the images themselves, it still carries much of the information it would have carried had we used whole images - that is, unless we set the criterion value so low that our slices are uniform fields of color (even single pixels!). This would yield maximal color visibility, of course, but we'd lose all the other information. Choosing a criterion value, then, is choosing the balance between color visibility and object information. As we lower the criterion value, we look at progressively smaller parts of scenes, and gradually, they lose their structure. Just where we choose to stop this degradation will depend on our analytical goals.

It is important to note that both the Image Histogram and the slicing technique are very general-purpose tools and admit of great variation in their application. And of course, the tools have no particular analytical power without some intelligent selection of data. For example, we might want to compare the color signatures of different sources of image data for the same region (Figure 7).

Slice Histograms Across Image Sources
Fig 7. Slice histograms for four different sources of image data: satellite, Google Streetview, Flickr, and Twitter.

Or, we might want to visualize the colors of a particular concept, like 'autumn' (Figure 8).

New England Autumn
Fig 8. Slice histogram of Flickr images autotagged 'autumn' from New England.

Additionally, in making slice visualizations, it's not essential that the visualization take the form of a histogram. What is essential is that the plot elements have low standard deviation of whichever visual properties we're interested in, and that there be some method of sorting that groups together similar elements. That's it. We can transform these histograms into any sort of plot, or montage, or map we like. In an upcoming post, I'll talk more about my efforts to expand the space of possible forms of media visualizations can take.