Cultural analytics


287 megapixel HIPerSpace supervisualization system, 2009-
The software is developed jointly by Gravity Lab and Software Studies Initiative.

Mapping Time exhibition at Calit2, Fall 2010


Cultural Analytics is the use of computational and visualization methods for the analysis of massive cultural data sets and flows. The concept of cultural analytics was developed by Lev Manovich in 2005 and the term itself was introduced in 2007. At the Software Studies Initiative, we focus on theoretical questions and practical research in one area of cultural analytics: using digital image processing and visualization for the exploratory analysis of large image and video collections.

Our key theoretical goal is to use big cultural data and computational techniques to question our basic cultural concepts and methods. Thus, cultural analytics can also be defined as discovery of new concepts and alternative ways to visualize and understand human cultural history and the present.

Here are some of the questions which drive our work:

- How can work with "big cultural data" help us to question traditional assumptions and concepts in humanities and social sciences?
- What new theoretical concepts and models do we need to deal with the mega-scale of born-digital culture?
- How can we combine computational techniques and analysis of massive cultural data with more traditional humanities methodologies?
- What would a "science of culture" driven by massive data look like, and what will be its limitations?


- How do we explore patterns in massive visual collections which may contain billions of images and video?
- How do we research interactive media processes and experiences (evolution of web design, playing a video game, etc.)?
- How can we best democratize computer vision and digital image analysis techniques so they can be used by researchers and students without technical backgrounds?

To address these challenges, we are developing easy-to-use techniques and software for exploratory media analysis and applying them to progressively larger image and video sets. In addition to digital humanities, these techniques can also be used in cinema studies, game studies, media studies, ethnography, exhibition design, sociology, anthropology, and other fields.


Cultural Analytics projects

Our open source software tools

still visualizations (Flickr)

animated visualizations (YouTube)

selected exhibitions (Flickr)


Style Space: Analysis and Visualization of 1 million Manga pages (06/2010). [key 56.6 MB]. [ppt 17.9 MB]

Cultural Visualization Techniques (10/2009). [key 10.3 MB]. [ppt 1.1 MB]

Learning from Software (11/2009). [key 1 MB]. [ppt 500 KB]

Cultural Analytics: vision (last update: 10/2009). [key 26.3 MB] [ppt 12.8 MB]

Cultural Analytics: case studies (last update: 06/2009). [key 32.4 MB]. [ppt 8.2 MB]


Our lab created a set of open source software tools which cover all the parts of visual media analysis for humanities:

- images and video preparation;

- feature extraction (using digital image analysis);

- interactive visual exploration;

- rendering of high res still and animated visualizations.

Our software works on regular laptops and desktop computers, as well as next-generation large-scale displays such as the HIPerSpace visual supercomputer with a combined resolution of 42,000 x 8000 pixels. (The application for HIPerSpace was developed jointly with Gravity Lab at Calit2.) We use open-source technologies whenever possible in our development. Our core methods can be used by people without technical training - an important consideration for their adoption in humanities. We borrow ideas from information visualization, media design, interfaces of media editing applications, media art, and digital art.

Beginning in Fall 2011, we are releasing fully documented software we developed as open source to make it easier for people in digital humanities to work with large image and video data sets.


To date, we have successfully applied our techniques to films, animations, video games, comics, magazines, books, and other print publications, artworks, photos, and other media content. Examples include all pages of Science and Popular Science magazines (published in 1872-1922), hundreds of hours of videogame recordings, all paintings by van Gogh, and one million manga pages. For details, see the Cultural Analytics projects page; you can also find many more visualizations on Flickr and YouTube.


Our papers - cultural analytics theory and methods for working with big visual data sets


Our past and present collaborators include Museum of Modern Art (NYC), New York Public Library, British Library, Getty Research Institute, Austrian Film Museum, Netherlands Institute for Sound and Image, and other institutions that are interested in using our methodology with their media collections. Since 2009 our work has been shown in 13 art and design exhibitions (Graphic Design Museum, San Diego Museum of Contemporary Art, gallery@calit2, and other venues).


Cultural Analytics research has been supported by The Andrew Mellon Foundation, California Institute for Telecommunications and Information Technology (Calit2), the National Science Foundation (NSF), the National Endowment for the Humanities (NEH), National Energy Research Scientific Computing Center (NERSC), University of California Humanities Research Institute (UCHRI), University of California, San Diego (UCSD), and the Singapore Ministry of Education.


1) We use high resolution interactive visualization to explore large image and video collections.

Exploring 1,000,000 manga pages on HIPerSpace visual supercomputer at Calit2.

2) We use digital image processing to automatically measure properties of visual artifacts. This allows for better descriptions of cultural artifacts and processes than language alone.

James Elkins wrote in 2011: "In the humanities, there really isn't such a thing as quantitative image processing: Only in the sciences and allied fields is the quantification of images important." (Elkins, "Visual Practices across the University", Imagery in the 21st century, Oliver Grau and Thomas Veigl, eds. The MIT Press, 2011. P. 167.) Since 2008, we have been exploring which image processing techniques are best suited for humanities. We also developed free software tools for image processing which can be used by people without technical training (two of these tools are included in the ImagePlot download).
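To make "measuring properties of visual artifacts" concrete, here is a minimal sketch of the kind of simple features we have in mind: average brightness, brightness variation, and average saturation. It is an illustration only, not the code of our tools. The function name `extract_features`, the feature names, and the tiny synthetic "images" are all assumptions made for this example; a real workflow would read pixels from image files with an imaging library.

```python
import colorsys
from statistics import mean, pstdev


def extract_features(pixels):
    """Return a few simple visual features for one image.

    `pixels` is a flat list of (r, g, b) tuples with 0-255 channels
    (a hypothetical input format chosen for this sketch).
    """
    # Convert every pixel to HSV so brightness and saturation are explicit.
    hsv = [colorsys.rgb_to_hsv(r / 255, g / 255, b / 255) for r, g, b in pixels]
    brightness = [v for _, _, v in hsv]
    saturation = [s for _, s, _ in hsv]
    return {
        "brightness_mean": mean(brightness),
        "brightness_std": pstdev(brightness),
        "saturation_mean": mean(saturation),
    }


# Two tiny synthetic "images" (made-up data, four pixels each).
dark_grey = [(40, 40, 40)] * 4
vivid_red = [(255, 0, 0)] * 4

features_grey = extract_features(dark_grey)
features_red = extract_features(vivid_red)
```

Each image is reduced to a short list of numbers, which is what makes it possible to sort, plot, and compare very large collections.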

Instead of 1D timelines which only show discrete dates and divide cultural history into periods (Elkins provides a nice summary of different periodization schemes in art history here), we can plot curves which show patterns of change.
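As a rough sketch of how such a curve could be computed, the example below averages one measured feature per year and then smooths the result with a moving average. The function names, the window size, and the (year, value) measurements are assumptions invented for illustration; they are not taken from any of our datasets.

```python
from collections import defaultdict
from statistics import mean


def yearly_curve(records):
    """Average a feature per year; returns sorted (year, mean_value) pairs."""
    by_year = defaultdict(list)
    for year, value in records:
        by_year[year].append(value)
    return [(year, mean(values)) for year, values in sorted(by_year.items())]


def moving_average(curve, window=3):
    """Smooth the curve with a centered moving average.

    Near the two ends the window is simply truncated.
    """
    half = window // 2
    smoothed = []
    for i, (year, _) in enumerate(curve):
        neighbours = [v for _, v in curve[max(0, i - half): i + half + 1]]
        smoothed.append((year, mean(neighbours)))
    return smoothed


# Hypothetical (year, mean saturation) measurements, one per image.
records = [(1923, 0.25), (1923, 0.75), (1924, 0.5), (1925, 0.75), (1926, 1.0)]
curve = yearly_curve(records)
smooth = moving_average(curve)
```

Plotting `smooth` against the year gives a continuous line of change instead of a sequence of named periods.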

Instead of dividing a cultural field into a small number of discrete categories (genres, styles, mediums, countries, movements, etc.), we can visualize it as a continuous space where differences are mapped into spatial distances.

Mapping Time
4535 Time magazine covers (1923-2009) organized by publication dates.

Kingdom Hearts videogame traversal
Visualization of "Kingdom Hearts" game play: 62.5 hours of game play, in 29 sessions over 20 days.

3) We practice bottom-up cultural analysis ("exploratory data analysis"), question all existing cultural categories and labels, and use computation to create alternative maps of culture.

Normally we use a rather small number of cultural categories ("post-modernism," "manga", "twitter messages," "youtube videos") to describe culture. The use of these categories often implies an assumption that a category corresponds to some attributes shared by all its members. But is this true? Rather than starting with this untested assumption, we want to map massive cultural data sets according to qualities which we can measure automatically. In other words, we want to temporarily forget about "metadata" and instead see what is actually there. Some maps may contain separate clusters, while others may have none. In some cases, the existing categories may correspond to a distinct set of properties shared by a number of artifacts; in other cases, other categories may emerge as a better description of the field.

Manga Style Space
Visualization of one million manga pages organized by visual characteristics challenges our normal ideas about style.

Mondrian. X-axis = pca1. Y-axis = pca2.
128 paintings created by Piet Mondrian between 1904 and 1917 - organized by visual similarity.
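The axes "pca1" and "pca2" in a plot like this are the first two principal components of the measured features. The sketch below shows one way to compute such 2D coordinates using only the Python standard library, with power iteration and deflation standing in for a proper linear algebra routine. It is a simplified illustration under stated assumptions, not the method used for our visualizations: the function names and the four three-feature rows are hypothetical, and power iteration can converge slowly or fail if the starting vector happens to be orthogonal to the component being sought.

```python
import math


def center(rows):
    """Subtract the per-column mean from every row."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    return [[r[j] - means[j] for j in range(d)] for r in rows]


def covariance(rows):
    """Sample covariance matrix of already-centered rows."""
    n, d = len(rows), len(rows[0])
    return [[sum(r[i] * r[j] for r in rows) / (n - 1) for j in range(d)]
            for i in range(d)]


def mat_vec(matrix, vec):
    """Multiply a square matrix by a vector."""
    return [sum(m * v for m, v in zip(row, vec)) for row in matrix]


def top_component(matrix, iterations=500):
    """Approximate the dominant eigenvector and eigenvalue by power iteration."""
    vec = [1.0 + i for i in range(len(matrix))]  # arbitrary non-zero start
    for _ in range(iterations):
        nxt = mat_vec(matrix, vec)
        norm = math.sqrt(sum(x * x for x in nxt))
        if norm == 0:
            break
        vec = [x / norm for x in nxt]
    norm = math.sqrt(sum(x * x for x in vec))
    vec = [x / norm for x in vec]
    value = sum(a * b for a, b in zip(vec, mat_vec(matrix, vec)))
    return vec, value


def pca_2d(rows):
    """Project rows onto their first two principal components."""
    centered = center(rows)
    cov = covariance(centered)
    pc1, val1 = top_component(cov)
    d = len(cov)
    # Deflation: remove the first component so the next one becomes dominant.
    deflated = [[cov[i][j] - val1 * pc1[i] * pc1[j] for j in range(d)]
                for i in range(d)]
    pc2, _ = top_component(deflated)
    return [(sum(a * b for a, b in zip(r, pc1)),
             sum(a * b for a, b in zip(r, pc2))) for r in centered]


# Hypothetical features per image: (brightness, saturation, contrast).
rows = [[0.1, 0.2, 0.1], [0.2, 0.1, 0.2], [0.8, 0.9, 0.7], [0.9, 0.8, 0.9]]
coords = pca_2d(rows)
```

In the resulting map, images with similar measurements land close together, regardless of the category labels attached to them.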


- we offer a new way for museum visitors (both online and physical, if there is an installation in the museum) to connect to the collections;

- visualizations which show an entire collection organized by different criteria complement the currently dominant search paradigm;

- visitors can discover patterns across all of a museum's holdings or particular collections - actively making new discoveries themselves as opposed to only being recipients of expert knowledge;

- visitors can discover related images using a variety of criteria;

- visitors can discover images by other artists similar to works they already like;

- visitors can navigate through collections in many additional ways (in contrast, a physical installation allows only one way to go through the exhibits);

- our techniques are scalable - from large super high resolution displays to desktops to tablets and mobile phones.
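As an illustration of how "discovering similar images" might work once every image has been reduced to a few measured features, the sketch below returns the nearest neighbours of a chosen work by Euclidean distance. The function `nearest_images`, the image ids, and the two-number feature vectors are all hypothetical and chosen only for this example; other distance measures and features are equally possible.

```python
import math


def nearest_images(query_id, features, k=2):
    """Return the k image ids closest to query_id by Euclidean distance."""
    query = features[query_id]
    distances = [
        (math.dist(query, vector), image_id)
        for image_id, vector in features.items()
        if image_id != query_id  # never return the query image itself
    ]
    return [image_id for _, image_id in sorted(distances)[:k]]


# Hypothetical feature vectors: (mean brightness, mean saturation).
features = {
    "painting_a": (0.20, 0.30),
    "painting_b": (0.25, 0.35),
    "painting_c": (0.90, 0.10),
    "painting_d": (0.22, 0.28),
}
similar = nearest_images("painting_a", features)
```

Because the comparison uses only measured properties, the neighbours can come from any artist or period in the collection.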


Cultural analytics shares many ideas and approaches with visual analytics ("the science of analytical reasoning facilitated by visual interactive interfaces") and visual data analysis, defined as follows:

"Visual data analysis blends highly advanced computational methods with sophisticated graphics engines to tap the extraordinary ability of humans to see patterns and structure in even the most complex visual presentations. Currently applied to massive, heterogeneous, and dynamic datasets, such as those generated in studies of astrophysical, fluidic, biological, and other complex processes, the techniques have become sophisticated enough to allow the interactive manipulation of variables in real time. Ultra high-resolution displays allow teams of researchers to zoom in to examine specific aspects of the renderings, or to navigate along interesting visual pathways, following their intuitions and even hunches to see where they may lead. New research is now beginning to apply these sorts of tools to the social sciences and humanities as well, and the techniques offer considerable promise in helping us understand complex social processes like learning, political and organizational change, and the diffusion of knowledge."


Multimedia Information Retrieval and other related fields in Computer Science (such as content-based image search and multimedia mining) aim to develop techniques for automatic extraction of semantic information from media, and use this information for search, data mining, interaction and other applications. The algorithms being developed in these fields are much more sophisticated and complex than the simple techniques used in our lab for cultural analytics.

Our use of much simpler techniques is motivated by a few reasons. First, our goal is to gradually introduce art historians, researchers in media studies, communication, and game studies, and other people in humanities and social sciences to the combined image processing and media visualization approach, so they can do it themselves, without always having to apply for big grants to work with computer scientists.

Second, our goal is to explore how these very simple techniques can work with a variety of media collections from all fields of humanities, leading to interesting cultural interpretations of both well-known datasets (for instance, all Vincent van Gogh paintings) and user-generated media about which we may not know much before starting to explore it.

Third, our goal is exploratory visualization of media collections, which differs from the more common goals of computer science research such as classification, search, or recommendation systems. While we recognize that building such systems is important, we are not interested in searching for something we already think is important, or in automatically classifying cultural data into a small number of already existing categories. Like the exploratory data analysis method in statistics which inspired us, we focus on the use of multiple visualizations of the same media dataset - which in our experience is well supported by the use of simple visual features.

For the examples of state-of-the-art Multimedia Information Retrieval, see the publications and demos of Intelligent Systems Lab Amsterdam, University of Amsterdam.


Computational analysis of massive cultural and social data sets and data flows is used widely in media and web industries. It structures the contemporary media universe, cultural production and consumption, and cultural memory. Search engines, spam detection, Netflix and Amazon recommendations, Flickr "interesting" photo rankings, movie success predictions, tools such as Google Books Ngram Viewer, Insights for Search, Search by Image, and numerous other applications and services all rely on media analytics. This work is carried out in media industries and in academia by researchers in data mining, social computing, media computing, music information retrieval, computational linguistics, and other areas of computer science.

As humanities and social science researchers start to apply computational techniques to large data sets in their fields (see Digging Into Data 2011 competition), many questions arise. What are the new possibilities for studying culture and society made possible by "big data"? Do humanists and social scientists need to develop their own methodologies for working with big data? What is "data" in the case of interactive media? How can new computational methods be combined with more established humanities approaches and theories? Is it possible to study massive media sets without in-depth technical knowledge?

In addition to our practical work on digital humanities projects, we at Software Studies Initiative (established 2007) are equally interested in exploring such larger questions. We believe that they can only be productively addressed using a "software studies" approach, i.e. an in-depth understanding of the software technologies behind cultural analytics.


The idea of quantitative analysis and visualization of massive cultural visual datasets in the humanities context was originally proposed by Lev Manovich in 2005. The formation of Software Studies Initiative in 2007 made it possible to begin practical research in cultural analytics.

What will happen when more humanists start using interactive supervisualizations as a standard tool in their work, the way many scientists do already? If slides made art history possible, and if a movie projector and video recorder enabled film studies, what new cultural disciplines may emerge out of the use of interactive visualization and data analysis of massive cultural data sets?

Our key goals:
- Better represent the complexity, diversity, variability, and uniqueness of cultural processes and artifacts.
- Create much more inclusive cultural histories and analysis - ideally taking into account all available cultural objects created in a particular cultural area and time period ("art history without names").
- Use computational analysis and visualization to characterize analog dimensions of visual cultural artifacts which natural languages can't describe adequately (such as motion or rhythm).
- Develop techniques to describe the characteristics of cultural processes which until now have received little or no attention (examples: gradual historical changes in visual culture over long periods; temporal changes in visual form in the career of an artist).
- Create visualization and interaction techniques and interfaces for exploration of cultural data which operate across multiple scales (think Google Earth) - from the details of the structure of an individual cultural artifact or process (such as a single shot in a film) to massive cultural data sets and flows (such as all films made in the 20th century).

The cultural analytics paradigm is related to culturomics, introduced by another research team in 2010. However, while culturomics is focused on historical text data ("digitize and analyze data about culture on extremely large scales: all books, all newspapers, all manuscripts, etc."), cultural analytics is more general - its ultimate goal is to analyze both all existing human cultural records in all media and contemporary born-digital cultural data and cultural and social flows.


Our work is closely aligned with the vision of digital humanities put forward by the Office of Digital Humanities at the National Endowment for the Humanities (the U.S. federal agency which funds humanities research). The joint NEH/NSF Digging into Data competition (2009) description opens with these questions: “How does the notion of scale affect humanities and social science research? Now that scholars have access to huge repositories of digitized data -- far more than they could read in a lifetime -- what does that mean for research?” The same questions guide our research.

The following is our statement responding to these questions (from our Digging Into Data 2011 Competition application):

"Digitization of massive amounts of cultural artifacts, the rise of born-digital and social media, and the progress in computational tools that can process massive amounts of data makes possible a fundamentally new approach to the humanities. Scholars no longer have to choose between data size and data depth. They can study exact trajectories formed by billions of cultural expressions and conversations in space and time, zooming into particular cultural texts and zooming out to see larger patterns.

New super-visualization technologies specifically designed for research purposes allow interactive exploration of massive media collections which may contain tens of thousands of hours of video and millions of still images. Researchers can quickly generate new questions and hypotheses and immediately test them. This means that researchers can quickly explore many research questions within a fraction of the time previously needed to ask just one question.

Computational analysis and visualization of large cultural data sets allows the detailed analysis of gradual historical patterns that may only manifest themselves over tens of thousands of artifacts created over number of years. Rather than describing the history of any media collection in terms of discrete parts (years, decades, periods, etc.), we can begin to see it as a set of curves, each showing how a particular dimension of form, content, and reception changes over time. In a similar fashion, we can supplement existing data classification with new categories that group together artifacts which share some common characteristics. For instance, rather than only dividing television news programs according to producers, air dates and times, or ratings, we can generate many new programs clusters based on patterns in rhetorical strategies, semantics, and visual form. In another example, we can analyze millions of examples of contemporary graphic design, web design, motion graphics, experience design and other recently developed cultural fields to create their maps which would reveal if they have any stylistic and content clusters."

Updated 1/2014