Application deadlines for workshops at Culture Analytics Institute at UCLA, March-May 2016

The application deadline for financial support to attend Workshop 1 has passed. But you can still apply for financial support to attend Workshops 2, 3, and 4.


1. Workshop I:
Culture Analytics Beyond Text: Image, Music, Video, Interactivity and Performance

MARCH 21 - 24, 2016

We are no longer accepting applications for financial support.
But you can attend by paying a small fee ($25-100 for the whole workshop). Please register here.

2. Workshop II:
Culture Analytics and User Experience Design

APRIL 11 - 15, 2016

The application form is for those requesting financial support to attend the workshop. We urge you to apply early. Applications received by Monday, February 15, 2016 will receive fullest consideration. Successful applicants will be notified as soon as funding decisions are made. If you do not need or want to apply for funding, you may simply register. IPAM will close registration if we reach capacity; for this reason, we encourage you to register early.

3. Workshop III:
Cultural Patterns: Multiscale Data-driven Models

MAY 9 - 13, 2016

The application form is for those requesting financial support to attend the workshop. We urge you to apply early. Applications received by Monday, March 14, 2016 will receive fullest consideration. Questions and supporting documents should be sent to the email below. Successful applicants will be notified as soon as funding decisions are made. If you do not need or want to apply for funding, you may simply register. IPAM will close registration if we reach capacity; for this reason, we encourage you to register early.

4. Workshop IV:
Mathematical Analysis of Cultural Expressive Forms: Text Data

MAY 23 - 27, 2016

The application form is for those requesting financial support to attend the workshop. We urge you to apply early. Applications received by Monday, March 28, 2016 will receive fullest consideration. Questions and supporting documents should be sent to the email below. Successful applicants will be notified as soon as funding decisions are made. If you do not need or want to apply for funding, you may simply register. IPAM will close registration if we reach capacity; for this reason, we encourage you to register early.

Artforum about "Data Drift" exhibition (curated by Lev Manovich, Rasa Smite, Raitis Smits) - article by Martha Buskirk

article about Data Drift in Artforum p1
(high resolution version of page 1)

article about Data Drift in Artforum p2
(high resolution version of page 2)

Data Drift is an exhibition curated by Lev MANOVICH, Rasa SMITE and Raitis SMITS at kim? Contemporary Art Centre in Riga, October 10 – November 22, 2015.

Artforum (February 2016) just published a very nice article about the exhibition written by Martha Buskirk, Professor of Art History and Criticism at Montserrat College of Art in Beverly, Massachusetts.

Since Artforum's content is not available online, and a single print issue costs $10 (in U.S.), we offer the scan of the article above. Addition of URLs, underlining and correction for On Broadway details are ours. Click on the images to see them in high resolution on Flickr and read the text.

Introduction to Data Drift written by Manovich:

DATA DRIFT exhibition showcases works by some of the most influential data designers of our time, as well as by artists who use data as their artistic medium. How can we use the data medium to represent our complex societies, going beyond "most popular," and "most liked"? How can we organize the data drifts that structure our lives to reveal meaning and beauty? How to use big data to "make strange," so we can see past and present as unfamiliar and new?

If painting was the art of the classical era, and photography that of the modern era, data visualization is the medium of our own time. Rather than looking at the outside world and picturing it in interesting ways like modernist artists (Instagram filters already do this well), data designers and artists are capturing and reflecting on the new data realities of our societies.

Full list of projects and artists shown in Data Drift:

COMPUTERS WATCHING MOVIES (Benjamin Grosser, 2013)

ON BROADWAY (Daniel Goddemeyer, Moritz Stefaner, Dominikus Baur, Lev Manovich, 2014)

CINEMETRICS (Frederic Brodbeck, 2011)

CULTUREGRAPHY (Kim Albrecht, Boris Müller, Marian Dörk, 2014)

THE RUN (Kristaps Epners, 2015)

THE EXCEPTIONAL AND THE EVERYDAY: 144 HOURS IN KYIV (Lev Manovich, Mehrdad Yazdani, Alise Tifentale, Jay Chow, 2014)

CHARTING CULTURE (Maximilian Schich, Mauro Martino, 2014)

STADTBILDER (Moritz Stefaner, 2013)


OUT OF SIGHT, OUT OF MIND (Pitch Interactive, 2013)

BAND 9 (Semiconductor, 2015)

SMART CITIZEN (Smart Citizen Team, 2012-present)


TALK TO ME (Rasa Smite, Raitis Smits, Martins Ratniks, 2011–2015)

a new article by Alise Tifentale and Lev Manovich: "Competitive Photography and the Presentation of the Self"

Alise Tifentale and Lev Manovich, Competitive Photography and the Presentation of the Self

Publication information:
"#SELFIE–Imag(in)ing the Self in Digital Media," edited by Jens Ruchatz, Sabine Wirth, and Julia Eckel. Marburg, 2016 (forthcoming).


Many discussions of photography and other types of visual culture, including user-generated content, rely on the professional/amateur distinction. In this article we introduce a different pair of concepts: competitive/non-competitive. We believe that analyzing the history of photography and its present, such as Instagram’s visual universe, using these new concepts allows us to notice phenomena and patterns that the traditional professional/amateur distinction hides.

The analysis of the presentation of self in online digital photography is a case in point. We can now see that the selfie genre is complemented by an “anti-selfie” genre that presents the self in a different way. The two genres correspond to different understandings and uses of Instagram by non-competitive and competitive photographers.

Information about the illustration above:
Examples of Instagram “anti-selfies.” Authors, first row: @vita_century, @vita_century, @deroy_night. Second row: @lavimeer, @merciless_mart, @sex_on_water. Third row: @recklesstonight. The authors' ages range from 14 to early 20s (as explicitly indicated in their profiles or estimated from photos in their galleries). The authors live in Russia or Ukraine. The photos were shared during Summer and Fall 2015.

"On Broadway" datavis installation is shown in Data in the 21st Century exhibition in Rotterdam

Data in the 21st Century

Exhibition and public program at V2_ Institute For The Unstable Media, Eendrachtsstraat 10, 3012 XL, Rotterdam

Exhibition website:

Dates: December 19, 2015 - February 14, 2016

Exhibition Hours: 11:00 TO 18:00

Opening: 19:00h, V2_, Eendrachtsstraat 10, Rotterdam

Data in the 21st Century explores the friction between the unpredictable reality that we live in and the desire to capture it in data.


Kyle McDonald
Daniel Goddemeyer, Moritz Stefaner, Dominikus Baur & Lev Manovich
Martin John Callanan
Timo Arnall
Informal Strategies
PWR Studio
Max Dovey & Manetta Berends

Our project included in the exhibition is On Broadway:

Artists: Daniel Goddemeyer, Moritz Stefaner, Dominikus Baur, Lev Manovich. Other contributors: members of Software Studies Initiative (Mehrdad Yazdani, Jay Chow), Brynn Shepherd and Leah Meisterlin, and PhD students at The Graduate Center, City University of New York (Agustin Indaco, Michelle Morales, Emanuel Moss, Alise Tifentale).

The project uses software tools developed by Software Studies Initiative with the support from The Andrew W. Mellon Foundation and National Endowment for Humanities (NEH).

"On Broadway" project wins Silver award in the competition for best visualizations of 2014

Our 2014-2015 datavis project On Broadway received the Silver Award in the Kantar competition for best visualizations of 2015. Our award is in the Data Visualization Project category.

Our previous project Selfiecity received Gold Award in the same competition in 2014.

Other 2015 winners in the Data Visualization Project category are global leaders in the data visualization field: Giorgia Lupi and Stefanie Posavec (Gold Award, Dear Data) and Nicholas Felton (Bronze Award, 2014 Annual Report).

Selected postcards from Dear Data by Giorgia Lupi and Stefanie Posavec

Screenshot from On Broadway interactive installation and website

Pages from 2014 Annual Report by Nicholas Felton

Lev Manovich's new lecture slides: "The Science of Culture? Theory and examples of computational cultural analysis using big image data"

This is the latest version of my 2015 lecture about some theoretical ideas and our projects visualizing big image data (November 15):

Keynote file, 212 MB.

PDF file, 90 MB (without video)

Here is the slides view as a single image (89 slides):

MoMA, New York Public Library, Google, and Twitter: our 2014-2015 projects, exhibitions, publications

The last 24 months have been a very busy and productive time for our lab.

In 2014, we created six new projects focusing on analysis, visualization, and interpretation of large cultural data. Four of them were installations commissioned by New York Public Library, Google Zeitgeist 2014 conference, National Taiwan Museum of Fine Arts, and a media festival in Sao Paulo. We were also commissioned to create a visualization for the August 2014 issue of Wired.

In 2015, we are showing our projects in 12 exhibitions in the U.S., Europe, and Asia, including Infosphere in ZKM (Center for Art and Media), Tallinn Architecture Biennale, Data Traces in Riga, and West Bund Biennial of Architecture and Contemporary Art in Shanghai. We also created a new selfiecity London edition for Big Bang Data exhibition at Somerset House in London.

Our projects were featured in hundreds of publications including New York Times, Wall Street Journal, BBC, The Guardian, CNN, PBS, NBC, LA Times, Washington Post, San Francisco Chronicle, The Atlantic, Discovery Channel, National Geographic, Wired, Slate, Fast Company Co.Design, Spiegel, Le Monde, Buzzfeed, Gizmodo. For the full list of media coverage, see

We are grateful to the institutions that are supporting our work: The Graduate Center at City University of New York; California Institute for Telecommunications and Information Technology; and The Andrew W. Mellon Foundation.

We were lucky to work with the amazing team of data and media designers: Moritz Stefaner, Daniel Goddemeyer, and Dominikus Baur. Moritz is the creative director and visualization designer for both Selfiecity and On Broadway, and we are very grateful for all his work and dedication to these projects.

Awards and recognition:

Selfiecity received Gold (in website category) from Information Is Beautiful 2014 Awards.

Lev Manovich is included in The Verge 50: the list of the most interesting people building the future (2014).

Selfiecity is included in six best visualizations of 2014 lists.


Twitter Data Grant:

Out of 1300 international teams who applied, only six teams received the Twitter Data Grant. As part of the grant, Twitter gave us access to every tweeted image worldwide for 2011-2014. We will be publishing the results of our analysis of this amazing dataset in 2015-2016.

Culture Analytics Institute

Lev Manovich is a member of the organizing team for Culture Analytics Institute (March 6 - June 10, UCLA). 115 leading academic and industry researchers will be speaking at the 4 Institute workshops.

Commissioned projects:

On Broadway

An interactive installation that visualizes multiple layers of data and images collected along the 13 miles of Broadway spanning Manhattan. Data includes 660,000 Instagram images, Twitter posts and images, Foursquare check-ins, Google Street View images, 22 million taxi pickups and drop-offs (2013), and economic indicators from the U.S. Census Bureau. Artists: Daniel Goddemeyer, Moritz Stefaner, Dominikus Baur, Lev Manovich. The installation is on view at the New York Public Library as part of the exhibition Public Eye: 175 Years of Sharing Photography, December 12, 2014 - January 3, 2016. An interactive web app is available at the On Broadway website.

Phototrails: Animated

Animated visualizations of 2.3 million Instagram images from 13 global cities commissioned for Google Zeitgeist 2014 Conference. Artists: Nadav Hochman, Lev Manovich, Jay Chow. September 14-16, Paradise Valley, Arizona. Original project (2013): Phototrails.


SelfieSaoPaulo

A site-specific project for a media facade in Sao Paulo, Brazil. Instagram selfies from Sao Paulo are animated to reveal patterns in self-representation. One animation presents the photos organized by estimated age, another by gender, and the third by the degree of smile. Artists: Jay Chow, Lev Manovich, and Moritz Stefaner. SelfieSaoPaulo was on view on the media facade, located at the building FIESP / SESI and Alameda das Flores (Avenida Paulista 1313), every night for the duration of the 2014 SP_Urban Festival in Sao Paulo, Brazil. June 9 – July 7, 2014.

Taipei Phototime

Real-time visualization of Instagram image streams from Taipei and New York. Artists: Lev Manovich and Jay Chow. Taipei Phototime was commissioned for Wonder of Fantasy: 2014 International Techno Art Exhibition at the National Taiwan Museum of Fine Arts, Taipei. May 17 – August 3, 2014.

selfiecity London

Analysis and visualization of Instagram selfies from London. Commissioned for Big Bang Data exhibition at Somerset House in London. December 3, 2015 – February 28, 2016.

Self-initiated projects:

The Exceptional and the Everyday: 144 hours in Kyiv

The first project to analyze the use of Instagram during a social upheaval. Using computational and data visualization techniques, we explored 13,208 Instagram images shared by 6,165 people in the central area of Kyiv during the 2014 Ukrainian revolution (February 17 - 22, 2014). Team: Lev Manovich, Jay Chow, Alise Tifentale, and Mehrdad Yazdani, with a guest essay by Elizabeth Losh. Released 10/07/2014.


Selfiecity

Analysis and visualization of 3,200 Instagram selfies from five global cities. The project includes an interactive web app Selfiexploratory. Team: Lev Manovich, Dominikus Baur, Jay Chow, Daniel Goddemeyer, Nadav Hochman, Moritz Stefaner, Alise Tifentale, and Mehrdad Yazdani. Released 02/19/2014.

Publications: new books 2014-2015:

Data Drift: Archiving Media and Data Art in the 21st Century. Editors: Rasa Smite, Raitis Smits, Lev Manovich. (RIXC, 2015)

The Illusions. Digital Original Edition. A BIT of The Language of New Media. Cambridge, Mass.: MIT Press, 2014.

Publications: articles, book chapters, conference papers 2014-2015:

Manovich, Lev. The Science of Culture? Social Computing, Digital Humanities, and Cultural Analytics in The Datafied Society. Social Research in the Age of Big Data, edited by Mirko Tobias Schaefer and Karin van Es. Amsterdam University Press, forthcoming in 2016.

Manovich, Lev. Exploring Urban Social Media: Selfiecity and On Broadway in Code and the City, edited by Robert Kitchin and Sung-Yueh Perng. Routledge, forthcoming February 2016.

Manovich, Lev, and Mehrdad Yazdani. Predicting Social Trends from Non-photographic Images on Twitter, Big Humanities Data workshop at IEEE Big Data 2015. Forthcoming in Proceedings of 2015 IEEE International Conference on Big Data.

Heftberger, Adelheid. «Die Verschmelzung von Wissenschaft und Filmchronik». Das Potenzial der reduktionslosen Visualisierung am Beispiel von «Das elfte Jahr» und «Der Mann mit der Kamera» von Dziga Vertov. In La visualisation des données en histoire. Visualisierung von Daten in der Geschichtswissenschaft, edited by Enrico Natale, Christiane Sibille, Nicolas Chachereau, Patrick Kammerer, and Manuel Hiestand. Zürich: Chronos-Verlag, 2015.

Manovich, Lev. Data Science and Computational Art History. International Journal for Digital Art History, Issue 1, 2015, pp. 12-34. Featured article.

Manovich, Lev and Everardo Reyes. “Info-aesthetics,” in 100 Notions for Digital Art, edited by M. Veyrat. Paris: Les Editions de l’Immatériel.

Manovich, Lev. The Language of Media Software. In The Imaginary App, edited by Svitlana Matviyenko and Paul D. Miller (Cambridge, Mass.: The MIT Press, 2014), pp. 189-204.

Manovich, Lev. Software is The Message. Journal of Visual Culture, Volume 13, Issue 1, pp. 79-81.

Reyes, Everardo. Aesthetics of temporal and spatial transformations in environments, Metaverse Creativity, Volume 4, Issue 2, pp. 151-165.

Reyes, Everardo. Explorations in Media Visualization (invited talk) in Extended Proceedings of the 25th ACM conference on Hypertext and Hypermedia Hypertext'14, September 2014. New York: ACM Press.

da Silva, Cicero Inacio and Marcio Santos. Analysing big cultural data patterns in 2.200 covers of Veja Magazine. Proceedings of the Digital Humanities Congress 2012. University of Sheffield, 2014.

Hochman, Nadav and Lev Manovich. A View From Above: Exploratory Visualizations of the Thomas Walther Collection, in Object:Photo. Modern Photographs 1909–1949: The Thomas Walther Collection. Museum of Modern Art (MOMA), website.

Manovich, Lev, Jay Chow, Alise Tifentale, Mehrdad Yazdani. The Exceptional and the Everyday: 144 Hours in Kyiv, in Proceedings of 2014 IEEE International Conference on Big Data, 77-84.

Tifentale, Alise and Lev Manovich. Selfiecity: Exploring Photography and Self-Fashioning in Social Media, in Berry, David M. and Michael Dieter, eds, Postdigital Aesthetics: Art, Computation and Design (Palgrave Macmillan: 2015), pp. 109-122.

Hochman, Nadav. The social media image, Big Data and Society, July-December 2014.

Hochman, Nadav, Lev Manovich, and Mehrdad Yazdani. On Hyper-Locality: Performances of Place in Social Media, presented at The International AAAI Conference on Weblogs and Social Media (ICWSM 2014).

Manovich, Lev. Watching the World, Aperture, No. 214 (2014), special issue on “Documentary, Expanded.”

List of 115 speakers participating in Culture Analytics Institute, 3/7/2016 - 6/10/2016

Update February 6, 2016: Applications to attend separate workshops are still accepted. See below for details.


Culture Analytics Institute (7 March - 10 June, 2016) is bringing together 200 computer science and humanities researchers.

For more information, schedule, and to apply to be in residence for the whole program or attend single workshops - see information below.

Institute description:

The use of computational and mathematical techniques to analyze cultural content, trends and patterns is a rapidly developing research area spanning a number of disciplines. The goal of the Culture Analytics program is to present the best research and to promote collaborations. To do this, we are bringing together leading scholars in the social sciences, humanities, applied mathematics, engineering, and computer science working on qualitative culture analysis.

The program is organized by Institute for Pure and Applied Mathematics (IPAM), University of California - Los Angeles (UCLA).

Lead organizers:

Tina Eliassi-Rad (Rutgers), Mauro Maggioni (Duke), Lev Manovich (The Graduate Center, CUNY), Vwani Roychowdhury (UCLA), and Tim Tangherlini (UCLA).

115 leading academic and industry researchers will be speaking at the 4 Institute workshops (see below for the full list):

Universities: UCLA, UCSD, UCI, Berkeley, Stanford, USC, CMU, Chicago, Duke, Michigan, UT Austin, MIT, Yale, NYU, Brown, Rutgers, KAIST, CUNY, Cambridge, etc.

Companies and research labs: Google, Facebook, Twitter, AT&T, Microsoft, NYT, NYPL, New York Hall of Science, Australian Centre for the Moving Image, Stamen Design, etc.



Tutorials for Institute participants, March 8-11, 2016.

The final list of tutorials is still being defined; I will post it here when we finalize it. Here is the preliminary list:

Cultural Archives and Libraries
Network Data
Network Data: Social Sciences
Study of Culture at Scale
Computational Approaches to Literature
Computational Approaches to History
Computational Social Science
Natural Language Understanding
Computational Analysis and Visualization of Patterns in Visual Media
Computational Approaches to Music
Global Media Monitoring
Information-Theoretic Methods for Social Systems
Machine Learning for Computational Social Sciences
Analysis of social networks (Twitter, YouTube, Facebook, Instagram, etc)
Data visualization: how to translate quantitative research of cultural datasets into great visuals

Workshop participants:

1. Culture analytics beyond text: image, music, video, interactivity, and performance. March 21-24, 2016.

Yong-yeol Ahn (Indiana University), Sebastian Ahnert (University of Cambridge), Luis Alvarez (Universidad de Las Palmas de Gran Canaria), Jonathan Berger (Stanford University), Johan Bollen (Indiana University), Lawrence Carin (Duke University), Damiano Cerrone (SPIN Unit – Estonian Academy of Arts), Meeyoung Cha (Korea Advanced Institute of Science and Technology (KAIST)), Edwin Chen (Twitter), Ronald Coifman (Yale University), David Crandall (Indiana University), Kate Elswit, Elena Federovskaya (Rochester Institute of Technology), Marco Iacoboni (University of California, Los Angeles (UCLA)), Yannet Interian (University of San Francisco), Tristan Jehan (Massachusetts Institute of Technology), Lindsay King (Yale University), Lev Manovich (The Graduate Center, CUNY), Daniele Quercia (Yahoo), Miriam Redi (Yahoo), Babak Saleh (Rutgers University), Brian Uzzi (J. L. Kellogg Graduate School of Management).

2. Culture analytics and user-experience design. April 11-15, 2016.

Taylor Arnold (AT&T Labs-Research), Seb Chan (Australian Centre for the Moving Image), Dana Diminescu (Telecom ParisTech and Maison des Sciences de l'Homme), Dan Edelstein (Stanford University), Sara Fabrikant (University of Zurich), Danyel Fisher (Microsoft Research), Mariah Hamel (Plotly), Francis Harvey (Universität Leipzig), Marti Hearst (University of California, Berkeley (UC Berkeley)), Lilly Irani (University of California, San Diego (UCSD)), Anab Jain (Superflux), Isaac Knowles (Indiana University), Isabel Meirelles (OCAD University), Doug Reside (New York Public Library), Eric Rodenbeck (Stamen Design), Carrie Roy (University of Wisconsin-Madison), Dan Russell (Google), Steve Uzzo (New York Hall of Science), Ben Vershbow (New York Public Library), Amanda Visconti (Purdue University).

3. Cultural patterns: multi-scale data-driven models. May 9-13, 2016.

Lada Adamic (University of Michigan), Edoardo Airoldi (Harvard University), Sinan Aral (Massachusetts Institute of Technology), Maria Binz-Scharf (City University of New York (CUNY)), Joshua Blumenstock (University of Washington), Aaron Clauset (University of Colorado Boulder), Peter Dodds (University of Vermont), Tina Eliassi-Rad (Rutgers University), Aram Galstyan (USC Information Sciences Institute), Rayid Ghani (University of Chicago), Sharique Hasan (Stanford University), Eitan Hersh (Yale University), Matt Jackson (Stanford University), Jure Leskovec (Stanford University), Steve Lohr (The New York Times), Filippo Menczer (Indiana University), Mark Newman (University of Michigan), Molly Roberts (University of California, San Diego (UCSD)), Daniel Romero (University of Michigan), Don Rubin (Harvard University), Cynthia Rudin (Massachusetts Institute of Technology), Don Saari (University of California, Irvine (UCI)), Jasjeet Sekhon (University of California, Berkeley (UC Berkeley)), Cosma Shalizi (Carnegie-Mellon University), Limor Shifman (University of Southern California (USC)), Arun Sundararajan (New York University), Timothy Tangherlini (University of California, Los Angeles (UCLA)), Johan Ugander (Stanford University), Hal Varian (Google Inc.).

4. Mathematical analysis of cultural expressive forms: text data. May 23-27, 2016.

Yong-yeol Ahn (Indiana University), Mark Algee-Hewitt (Stanford University), Ricardo Baeza-Yates (Yahoo! Research), David Blei (Columbia University), Tanya Clement (University of Texas at Austin), Brian Croxall (Brown University), Cristian Danescu-Niculescu-Mizil (Cornell University), David Garcia (ETH Zürich), Lise Getoor (University of California, Santa Cruz (UC Santa Cruz)), Ryan Heuser (Stanford University), Natalie Houston (University of Massachusetts Lowell), Matt Jockers (University of Nebraska-Lincoln), Dan Jurafsky (Stanford University), Ralph Kenna (Coventry University), Kristina Lerman (University of Southern California (USC)), Hoyt Long (University of Chicago), Winter Mason (Facebook), David Mimno (Cornell University), Suresh Naidu (Columbia University), Chris Potts (Stanford University), Lisa Rhody (George Mason University), Vwani Roychowdhury (UCLA), Noah Smith (Carnegie-Mellon University), David Smith (Northeastern University), Neel Smith (College of the Holy Cross), Richard Jean So (University of Chicago), Markus Strohmaier (Universität Koblenz-Landau), Timothy Tangherlini (University of California, Los Angeles (UCLA)), Ted Underwood (University of Illinois at Urbana-Champaign), Hannah Wallach (University of Massachusetts Amherst).


Applications for short and long stays and funding:

Applications will be accepted until Monday, December 7, 2015.

Application form:

Please consider applying and let your colleagues and students know about the program. You can come and stay for one of the workshops, or stay for the whole institute period.

Long-term stays:
For longer stays we also provide support for travel and housing, and selected PhD students will also receive a stipend.

Attending a workshop:
If you are not presenting and don't have a current research project related to culture analytics, you can still apply to attend any of the workshops. We will provide partial or full housing support for all accepted participants.

We welcome applications from advanced Ph.D. students, post-docs, and junior and senior researchers at any stage of their careers. People in Computer Science, Social Sciences, and Humanities are welcome to apply, as long as you are interested in using quantitative methods for cultural analysis. Supporting the careers of women and minority mathematicians and scientists is an important component of IPAM’s mission and we welcome their applications.

Visualizing High Dimensional Image Clusters in 2D: The Growing Entourage Plot (Part II)

Damon Crockett

continued from Part I

Architecture Crop
Growing Entourage with 50 clusters of Instagram photos machine-tagged under the heading 'architecture'. Cropped.

Architecture Close
Closeup of plot immediately above.

Every image in a given cluster is ranked according to its Euclidean distance (in the original feature space) from the centroid. We can think of the centroid as the 'leader' of an 'entourage', and each image in the cluster is a member of the entourage. The closer they are to the centroid, by the aforementioned ranking, the closer they get to 'stand' near the centroid. Each cluster takes turns adding members of its 'entourage', starting with those closest to the leader. Each added member stands in the open grid space nearest its leader. Local conflicts between entourages are settled by this principle, since added members must occupy open grid squares.
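
The placement rule described above can be sketched in a few lines of Python. This is a minimal illustration and not the plotting code linked from this post: the function names (`rank_by_distance`, `nearest_open_cell`, `grow_entourages`), the brute-force search over the grid, and the tie-breaking order are all assumptions made for the example. It also assumes each leader has already been assigned a distinct grid cell and that leader cells stay empty.

```python
import math
from itertools import zip_longest


def rank_by_distance(points, centroid):
    """Indices of points, sorted by Euclidean distance from the centroid."""
    return sorted(range(len(points)), key=lambda i: math.dist(points[i], centroid))


def nearest_open_cell(leader_cell, occupied, grid_size):
    """Open cell closest to the leader's cell (brute force; None if grid is full)."""
    best, best_d = None, float("inf")
    for x in range(grid_size):
        for y in range(grid_size):
            if (x, y) in occupied:
                continue
            d = math.dist((x, y), leader_cell)
            if d < best_d:  # ties go to the first cell encountered
                best, best_d = (x, y), d
    return best


def grow_entourages(clusters, leader_cells, grid_size):
    """Place cluster members on a grid around their leaders.

    clusters: list of (centroid, points), in the original feature space.
    leader_cells: one (x, y) grid cell per cluster (hypothetical input,
        e.g. derived from a 2D projection of the centroids).
    Returns {(cluster_index, point_index): (x, y)}.
    """
    occupied = set(leader_cells)  # assumption: leader cells are left empty
    queues = [rank_by_distance(points, centroid) for centroid, points in clusters]
    placement = {}
    # Clusters take turns: round r places each cluster's r-th closest member.
    for round_members in zip_longest(*queues):
        for k, member in enumerate(round_members):
            if member is None:  # this cluster has no members left
                continue
            cell = nearest_open_cell(leader_cells[k], occupied, grid_size)
            if cell is None:
                raise ValueError("grid is full")
            occupied.add(cell)
            placement[(k, member)] = cell
    return placement
```

Because a cell is marked occupied as soon as a member is placed, two entourages competing for the same region resolve the conflict by whoever gets there first in the turn order, which matches the principle described above.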

50 image clusters ('entourages'), growing around their centroids, projected to 2D by PCA.

This means that the look of the plot will depend on how we generate the original grid. We might end up with an array of circular clusters in 2D, or we might end up with one large clump of images, with high-ranking members bunched up around their leaders and lower-ranking members scattered in nearby territories.

Activities (wide)
Growing Entourage with wide grid, resulting in relatively isolated circular clusters.

Growing Entourage using same data as above, but with a tighter grid. Some clusters are isolated, some have clumped with neighbors.
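
To make the wide-versus-tight contrast concrete, here is one way the leader positions could be mapped onto a grid. The post does not say how the grid is actually generated, so the function `centroids_to_cells` and its min-max scaling are assumptions for illustration only. The point is that the same projected centroids land far apart on a large grid and close together on a small one.

```python
def centroids_to_cells(coords_2d, grid_size):
    """Scale 2D centroid coordinates onto a grid_size x grid_size integer grid.

    coords_2d: list of (x, y) floats, e.g. centroids after projection to 2D.
    Note: on a tight grid two centroids may round to the same cell; a real
    implementation would need to resolve such collisions.
    """
    xs = [x for x, _ in coords_2d]
    ys = [y for _, y in coords_2d]

    def scale(value, low, high):
        if high == low:  # all centroids share this coordinate
            return grid_size // 2
        return round((value - low) / (high - low) * (grid_size - 1))

    return [
        (scale(x, min(xs), max(xs)), scale(y, min(ys), max(ys)))
        for x, y in coords_2d
    ]
```

With a large `grid_size`, each leader has many free cells around it and its entourage can grow into a roughly circular patch before meeting a neighbor. With a small `grid_size`, neighboring entourages run into each other early and clump.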

This is not, of course, the only way to present clusters on a 2D canvas. It is, however, probably the best way to preserve as much of the complexity of intercluster relations as is possible in 2D. Additionally, it preserves similarity relations among images in the original feature space, something we lose by pure projection methods. Finally, it preserves intracluster relations by giving the semantically closest entourage members the privileged locations nearest their leaders.

The plotting algorithm is written in Python, using the Python Imaging Library (and scikit-learn for projection), and the basic code is here.

Visualizing High Dimensional Image Clusters in 2D: The Growing Entourage Plot (Part I)

Damon Crockett

Activities Crop
A crop from a Growing Entourage plot using Instagram images machine-tagged under the heading 'activities'.

Activities Close
Closeup of above plot showing meeting of two clusters. Empty grid squares are centroid locations.


Since 2007, our lab has been visualizing large collections of cultural images. These visualizations have used either metadata variables such as date and location, or basic image features such as hue, saturation, brightness, number of lines, and texture. In particular, sorting by hue, saturation and brightness turned out to be very useful for quick exploration of large image collections. More recently, however, we've expanded the scope of our analysis to include presence and characteristics of faces (e.g., Selfiecity) and now a wide range of object and scene contents. For example, we've just had a paper accepted for the IEEE 2015 Big Data Conference where we used deep learning image classification to analyze the contents of one million Twitter images.
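
As a small illustration of sorting by basic image features, the sketch below computes a mean hue, saturation, and brightness per image and sorts a collection by one of them. It uses only the Python standard library and takes pixels as plain `(r, g, b)` tuples. The function names and the use of a circular mean for hue are assumptions for this example, not a description of our lab's tools.

```python
import colorsys
import math


def circular_mean(hues):
    """Mean of hue values in [0, 1), treating hue as an angle on a circle."""
    s = sum(math.sin(2 * math.pi * h) for h in hues)
    c = sum(math.cos(2 * math.pi * h) for h in hues)
    return (math.atan2(s, c) / (2 * math.pi)) % 1.0


def image_features(pixels):
    """pixels: list of (r, g, b) tuples with channels in 0-255."""
    hsv = [colorsys.rgb_to_hsv(r / 255, g / 255, b / 255) for r, g, b in pixels]
    return {
        "hue": circular_mean([h for h, _, _ in hsv]),
        "saturation": sum(s for _, s, _ in hsv) / len(hsv),
        "brightness": sum(v for _, _, v in hsv) / len(hsv),
    }


def sort_images(images, feature):
    """images: {name: pixels}. Returns names sorted by the chosen feature."""
    feats = {name: image_features(px) for name, px in images.items()}
    return sorted(images, key=lambda name: feats[name][feature])
```

Hue is averaged as an angle because it wraps around: a plain arithmetic mean of two reddish hues on either side of the wrap point would come out cyan.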

Being able to use the latest computer vision techniques for the analysis of image content is very exciting, but it also brings new challenges. For example, how can we effectively visualize and explore the results of machine classification into many object categories? In this post, I'd like to discuss one particular method we've developed. This method visualizes high-dimensional image clusters using two dimensions. I call this method 'The Growing Entourage Plot'.

The plots in this post are drawn from our current collaboration with Miriam Redi on the clustering and visualization of large collections of Instagram images. Miriam extracted over 1000 image features that include image content (objects and scenes), photo composition, style, texture, color and other characteristics. She then computed clusters of images using subsets of these features. Big thanks to the object detection team at Flickr for the object and scene tags (their work is described in this blog post).


In the field of information visualization, a great deal of energy is spent on the problem of how to present high-dimensional data on 2D canvases. There are at least three broad categories of solution: (1) preserve all features; (2) preserve some by selection; and (3) preserve some by redefinition. I'll discuss each of these in turn.

The first kind of solution is simply to try visualizing everything. Such visualizations can be difficult both to design and to read. Media visualizations - those visualizations whose primary plot elements are visual media, like images - are officially of this type, but additional choices about sorting can make for big differences in readability. The effect of sorting on media visualization is so important, in fact, that any feature not used for sorting is essentially invisible to the viewer.

The second kind of solution is the one we've used most often: select some subset of features and use them for sorting. We might, for example, sort our (infinitely?) high-dimensional image data by only brightness and hue. This is powerful and useful, but it does make invisible very complex sorts of similarity relations between images.
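
Here is a minimal sketch of what sorting by a chosen pair of features might look like when laying images out as a montage. The layout rule (sort everything by one feature, cut into rows, then sort each row by the other feature) and the name `montage_positions` are assumptions for illustration. Whatever the rule, only the two selected features influence where an image ends up.

```python
def montage_positions(features, row_key, col_key, n_cols):
    """Assign each image a (row, col) cell using two selected features.

    features: {image_id: {feature_name: value}} with any number of features.
    Images are sorted by row_key and cut into rows of n_cols; each row is
    then sorted by col_key. All other features are ignored.
    """
    ids = sorted(features, key=lambda i: features[i][row_key])
    positions = {}
    for start in range(0, len(ids), n_cols):
        row = sorted(ids[start:start + n_cols], key=lambda i: features[i][col_key])
        for col, image_id in enumerate(row):
            positions[image_id] = (start // n_cols, col)
    return positions
```

Changing the value of any feature other than `row_key` and `col_key` leaves every position unchanged, which is exactly the sense in which unselected features are invisible in the plot.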

The third kind of solution defines new features that are typically linear combinations of existing features. Principal Components Analysis (PCA) is the standard here, although there are others (e.g., t-SNE, which is nonlinear). We can present images in, say, a 1000-feature hyperspace by projecting them to two dimensions, dimensions which hadn't previously existed (although, since they are typically linear combinations of existing dimensions, they are not new data). This approach has the advantage that similarity relations between images in 1000D feature space are preserved as well as possible during the projection to 2D, meaning that our visualizations can reveal very complex sorts of similarity between images.
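
For readers who want to see the projection step spelled out, here is a minimal PCA sketch using NumPy's singular value decomposition. The function name `pca_project` and its return values are choices made for this example; in practice a library implementation such as scikit-learn's would normally be used.

```python
import numpy as np


def pca_project(X, n_components=2):
    """Project the rows of X onto the top principal components.

    X: array of shape (n_images, n_features).
    Returns (coords, components, explained_ratio), where coords has shape
    (n_images, n_components) and each row of components is one new
    dimension expressed as a linear combination of the original features.
    """
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)  # center each feature
    # Rows of Vt are the principal directions, by decreasing singular value.
    _, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    components = Vt[:n_components]
    explained_ratio = (S ** 2 / np.sum(S ** 2))[:n_components]
    return Xc @ components.T, components, explained_ratio
```

The returned `explained_ratio` gives the share of total variance captured by each kept dimension, a quick way to see how much of the original structure a 2D projection retains.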

This third approach is becoming quite popular and has been in use in our lab for at least 2 years now (see, e.g., this Flickr album). In an upcoming post, I'll talk about some methods of projection visualization.


But I'd like here to talk about a different approach to dealing with dimensionality, one that is quite common in data analysis but whose use in information visualization is less common: clustering. Dimensionality reduction algorithms like PCA and t-SNE are powerful and useful but suffer major data loss in most cases. You simply can't preserve all the complexity of 1000D relations after projecting to 2D. Clustering algorithms, however (e.g., k-means), preserve a greater share of the relational data, because they find groups of data points in your original feature space (or whichever subset you choose).
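For readers who haven't met k-means, here is a minimal sketch of it in plain Python. The `kmeans` and `sq_dist` functions are hypothetical helpers written for this example. One assumption to flag: the starting centroids are chosen by a simple deterministic farthest-first rule so that the example is repeatable, whereas standard implementations usually start from random or k-means++ seeds.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two points of equal length."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, iters=100):
    """Return (labels, centroids) for points clustered in their original feature space."""
    # Assumed initialisation: start at the first point, then repeatedly add
    # the point farthest from all centroids chosen so far.
    centroids = [list(points[0])]
    while len(centroids) < k:
        far = max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        centroids.append(list(far))
    labels = [None] * len(points)
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [min(range(k), key=lambda i: sq_dist(p, centroids[i]))
                      for p in points]
        if new_labels == labels:  # converged, nothing moved
            break
        labels = new_labels
        # Update step: each centroid moves to the mean of its members.
        for i in range(k):
            members = [p for p, lab in zip(points, labels) if lab == i]
            if members:
                centroids[i] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids

# Toy data: two well-separated groups of three points each.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
labels, centroids = kmeans(points, k=2)
```

Note that the grouping happens entirely in the original feature space; nothing is projected, and the centroids are points in that same space.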

Now of course, there is still the problem of visualizing these clusters. This is particularly difficult for traditional sorts of statistical visualization, because their plot elements carry information only by their spatial positions, and the human visual system can parse a maximum of 3 spatial dimensions. Thus, if we want to see these clusters in their 'natural habitat', so to speak, we're probably out of luck. Additionally, seeing clusters of points, even in a 2D or 3D space, is not particularly illuminating, since clusters, unlike classifier outputs, are not 'classes' at all and have no conventional meaning or significance from the outset. We derive the meaning or significance of a cluster of points in a high-dimensional space from the feature values of those points, and in order to see those features in a plot, we'll again have to confront the problem of presenting high-dimensional data in 2D (or 3D). For these reasons, you don't see many cluster plots (what you'll sometimes see is PCA scatterplots with cluster memberships coded by color).

Media visualizations are at an advantage here. Because media visualizations use images as plot elements, a simple presentation of each cluster is actually quite illuminating. Imagine simply presenting clusters of points, side by side. We learn nothing; these clusters are perfectly meaningless. But if those points are images, we can get a sense of what each cluster means. So, media visualizations are perfect for the presentation of clusters of images.


The question now is how exactly to arrange these clusters on a 2D canvas. As I've said before, a simple presentation of each cluster is helpful. We could, for example, make square montages of each cluster and just leaf through them. But we might want more than this - we might want also to see the relations among the clusters. And now we confront a familiar problem: we have a set of data points in n-D, where n > 2, and we want to see them in 2D. The 'points' are now clusters, but the shape of the problem is exactly the same as before. The Growing Entourage plot is my solution to this problem. It projects cluster centroids to 2D and builds clusters around them by turn-taking and semantic priority.

Food, Drinks, Meals
Growing Entourage with 50 clusters of Instagram photos machine-tagged under the heading 'food/drinks/meals'.

Food/Drinks/Meals Zoomed
Closeup of plot above. Empty grid squares are centroid locations.

We begin with high-dimensional image data and then use an algorithm like k-means to find k clusters in the original feature space. Each cluster has a centroid, given as a point location in the original feature space. We project the centroids to 2D using a dimensionality reduction algorithm, like PCA or t-SNE. We then bin these coordinates to a grid (making sure that no two centroids have the same grid location, which is typically not difficult). Now we have complex similarity relations among cluster centroids, and it remains only to build these clusters of images on the grid at the 2D centroid locations.
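The binning step can be sketched as follows, in plain Python. This is an illustration rather than the code behind the plots: `bin_to_grid` is a hypothetical helper, the grid is assumed to be square, and the rule for resolving collisions (move the later centroid to the nearest free cell) is my assumption for this example.

```python
def bin_to_grid(coords, size):
    """Map 2D centroid coordinates to distinct cells of a size x size grid."""
    if len(coords) > size * size:
        raise ValueError("grid is too small to give every centroid its own cell")
    xs = [x for x, _ in coords]
    ys = [y for _, y in coords]

    def scale(value, lo, hi):
        # Rescale to 0..size-1 and snap to the nearest grid index.
        if hi == lo:
            return 0
        return round((value - lo) / (hi - lo) * (size - 1))

    taken = set()
    cells = []
    for x, y in coords:
        cell = (scale(x, min(xs), max(xs)), scale(y, min(ys), max(ys)))
        if cell in taken:
            # Assumed collision rule: move to the nearest cell that is still free.
            free = [(i, j) for i in range(size) for j in range(size)
                    if (i, j) not in taken]
            cell = min(free, key=lambda c: (c[0] - cell[0]) ** 2 + (c[1] - cell[1]) ** 2)
        taken.add(cell)
        cells.append(cell)
    return cells

# Toy 2D centroid coordinates, e.g. the output of a PCA or t-SNE projection.
centroid_coords = [(0.0, 0.0), (0.01, 0.0), (1.0, 1.0)]
centroid_cells = bin_to_grid(centroid_coords, size=4)
```

The first two centroids would land in the same cell after rounding, so the second is nudged to a neighboring free cell; each resulting cell is then reserved as the seed location for one cluster.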

Growing Entourage with 50 clusters of Instagram photos machine-tagged under the heading 'nature'

Nature Zoomed
Closeup of plot above. Empty grid squares are centroid locations.

continue to Part II