Innovative interfaces to museum collections

Experimental interfaces to museum collections (from my 2011 digital humanities++ class syllabus):

image grid: SFMOMA Artscope

heat map: Natural Science Museum of Barcelona collection heat map

2D timeline: Powerhouse Museum Australian Dress Register

3D navigation: Google Art Project

interactive surface: Microsoft Surface Museum Collection Portal

remix: LACMA collection remix (no longer online)

Mondrian vs Rothko: footprints and evolution in style space

Mondrian 1905-1917. Rothko 1938-1953. X=brightness mean. Y=saturation mean.
Data: 128 paintings by Piet Mondrian (1905-1917); 123 paintings by Mark Rothko (1938-1953).
Mapping: The two image plots are placed side by side. In each plot: X-axis: brightness mean; Y-axis: saturation mean.


visualizations and text: Lev Manovich, 2009-2011.

original concept: Chanda Carey. Mondrian images scanning: Xiaoda Wang.

Keywords: Mondrian, Rothko, manga, comics, visualization, cultural analytics, software studies, digital humanities, Calit2, UCSD, HIPerSpace, Lev Manovich, big data

All visualizations were created with the free ImagePlot software developed by the Software Studies Initiative. The Mondrian image set and the data used in the visualizations are distributed with ImagePlot. The visualizations use measurements of the images' visual characteristics obtained with the tool included with ImagePlot.

Download ImagePlot 1.1


The two visualizations above compare a similar number of paintings by Piet Mondrian and Mark Rothko. They demonstrate how image plots (scatter plots with images superimposed over points) can be used to compare multiple cultural image sets. In this case, the goal is to compare a similar number of paintings by the two artists produced over comparable time periods. (Note that the image sets used in these visualizations do not contain all paintings created by the two artists during the selected periods; once we have complete image sets, we will update the visualizations and our interpretations.)

We have selected particular periods in the career of each artist which are structurally similar. At the beginning of the period, each artist was imitating his predecessors and contemporaries. By the end, each artist had developed an original visual language - a unique cultural "brand". In between, each gradually moved from figurative representation to pure abstraction.

Mondrian's paintings illustrating the changes in his art during the selected period. Left: 1905; middle: 1909; right: 1917.

The left visualization above shows 128 paintings by Mondrian; the right shows 123 paintings by Rothko. The paintings are organized according to their brightness mean (X-axis) and saturation mean (Y-axis).
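To make the two axes concrete, here is a minimal sketch of how a brightness mean and a saturation mean could be computed for one image. It is not the measurement tool that ships with ImagePlot; it assumes that "brightness" means the V channel of the HSV color model and that an image is given as a plain list of 8-bit RGB pixels. The function name and the two tiny sample "paintings" are hypothetical.

```python
import colorsys
from statistics import mean

def brightness_saturation_mean(pixels):
    """Return (brightness mean, saturation mean) for an iterable of 8-bit RGB pixels.

    Assumption: brightness is taken to be the V channel of HSV, scaled to 0..1.
    """
    hsv = [colorsys.rgb_to_hsv(r / 255, g / 255, b / 255) for r, g, b in pixels]
    brightness = mean(v for _, _, v in hsv)
    saturation = mean(s for _, s, _ in hsv)
    return brightness, saturation

# Two tiny hypothetical "paintings", each a flat list of RGB pixels.
paintings = {
    "grey_study": [(200, 200, 200), (180, 180, 180)],
    "red_field": [(255, 0, 0), (200, 0, 0)],
}

# Each painting becomes one (x, y) point in brightness/saturation space.
coords = {name: brightness_saturation_mean(px) for name, px in paintings.items()}
```

In this sketch the grey image lands at the bottom of the plot (saturation 0) and the red one at the top (saturation 1), which is how the Y-axis separates muted from intense paintings.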

Projecting sets of paintings of these two artists into the same coordinate space reveals their comparative "footprints" - the parts of the space of visual possibilities they explored. We can see the relative distributions of their works - the more dense and the more sparse areas, the presence or absence of clusters, the outliers, etc.

The visualizations also show how Mark Rothko - the abstract artist of the generation which followed Mondrian - was exploring the parts of brightness/saturation space which Mondrian did not reach (highly saturated and bright paintings in the upper right corner, and desaturated dark paintings in the left part).

Another interesting pattern revealed by the visualization is that all paintings by each artist are sufficiently different from each other – no two occupy the same point in brightness/saturation space. This makes sense given modern art's ideology of unique original works – if we were to map works from earlier centuries, when it was common for artists to make copies of successful works which were considered to be equally valuable, we might expect to see a different pattern. However, what could not be predicted is that the distances between neighbouring paintings are similar to each other – i.e., while each image occupies its own unique position, it is not very far from its neighbours.
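One way to check this observation numerically, rather than by eye, is to compute each painting's distance to its nearest neighbour in the plot. The sketch below assumes each painting has already been reduced to an (x, y) pair of brightness mean and saturation mean; the sample points are made up for illustration and are not our measurements.

```python
import math

def nearest_neighbour_distances(points):
    """For each (x, y) point, return the Euclidean distance to its closest other point."""
    distances = []
    for i, p in enumerate(points):
        others = (q for j, q in enumerate(points) if j != i)
        distances.append(min(math.dist(p, q) for q in others))
    return distances

# Hypothetical (brightness mean, saturation mean) coordinates.
points = [(0.0, 0.0), (0.3, 0.4), (0.6, 0.8), (0.6, 0.0)]
nn = nearest_neighbour_distances(points)

# No distance of zero means no two paintings share a position;
# a narrow range of values means neighbours are evenly spaced.
spread = max(nn) - min(nn)
```

A histogram of these distances over a full image set would show whether the "unique but not far apart" pattern holds for all paintings or only for most of them.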

To see the evolution of Mondrian and Rothko on the brightness/saturation dimensions during the comparable time periods, we can visualize the paintings as colored circles. The color indicates the position of each painting within the time period, running from blue to red. To make the patterns even easier to see, we also vary the size of the circles from smallest to largest. (Note: to create this visualization, we modified the ImagePlot code to vary the size of points.)
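The mapping from time to color and size can be sketched as follows. This is an illustration, not the modified ImagePlot code: it assumes the earliest painting gets the bluest and smallest circle, that colors are interpolated linearly in RGB, and that the size limits (4 and 20) are arbitrary placeholder values.

```python
def time_to_style(index, count, min_size=4.0, max_size=20.0):
    """Map a painting's position in the chronological sequence to (RGB color, circle size).

    index: 0-based position of the painting when sorted by date.
    count: total number of paintings in the sequence.
    """
    t = index / (count - 1) if count > 1 else 0.0  # 0.0 = earliest, 1.0 = latest
    color = (round(255 * t), 0, round(255 * (1 - t)))  # blue fades out, red fades in
    size = min_size + t * (max_size - min_size)
    return color, size

# Styles for a hypothetical sequence of five paintings.
styles = [time_to_style(i, 5) for i in range(5)]
```

Using two visual channels (color and size) for the same variable is redundant on purpose: where circles overlap and colors are hard to read, the size still shows which works are later.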

X-axis = brightness mean.
Y-axis = saturation mean.

Mondrian 1905-1917. Rothko 1938-1953. blue to red

This visualization reveals another interesting pattern. Rothko starts his explorations in the late 1930s-1940s in the same part of brightness/saturation space where Mondrian arrives by 1917 - the high brightness/low saturation area (the bottom right corner of the plot). But as he develops, he is able to move beyond the areas already “marked” by his European predecessors such as Mondrian.

Another way to represent the development of an artist's visual language over time in two dimensions is by adding animation. The following animated visualization plots Mondrian's paintings (left) next to Rothko's paintings (right). Each of the plots uses the same measurements for the X-axis and Y-axis as the visualization above (X-axis = brightness mean; Y-axis = saturation mean). The paintings are plotted according to their dates. We selected the same number of paintings for the two artists, so the two animations finish at the same time.

Here is another example of the use of animation to show patterns in time. Mondrian's paintings are plotted sequentially according to the year when they were painted; the year is shown in the upper left corner. (Note: since we don't know the exact dates of the paintings, the particular order used to render the paintings within each year is not important.) The visualization uses a standard statistical technique called PCA (Principal Component Analysis) to project 60 different visual features calculated over each of Mondrian's paintings into new dimensions (PCA_1 is mapped to X, and PCA_2 is mapped to Y). As in the previous visualizations, this animation maps visual similarity into distance. However, distance now encodes similarity not along a single visual dimension such as brightness or saturation, but along dozens of dimensions combined together.
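For readers who have not used PCA, here is a small self-contained sketch of the projection step. It is not the code we used: it works on four made-up features for five hypothetical paintings instead of 60 features for 128, and it finds the first two principal components with plain power iteration and deflation, which is only one of several ways to compute them. All function names are illustrative.

```python
import math
import random

def top_component(cov, iterations=500, seed=0):
    """Approximate the dominant eigenvector of a symmetric matrix by power iteration."""
    d = len(cov)
    rng = random.Random(seed)
    v = [rng.uniform(-1, 1) for _ in range(d)]
    norm = math.sqrt(sum(x * x for x in v))
    v = [x / norm for x in v]
    for _ in range(iterations):
        w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0:  # no variance left in any direction
            break
        v = [x / norm for x in w]
    eigenvalue = sum(v[a] * sum(cov[a][b] * v[b] for b in range(d)) for a in range(d))
    return v, eigenvalue

def pca_2d(rows):
    """Project feature vectors onto their first two principal components."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    cov = [[sum(c[a] * c[b] for c in centered) / (n - 1) for b in range(d)]
           for a in range(d)]
    pc1, lam1 = top_component(cov)
    # Deflate: remove the variance explained by the first component,
    # so the next power iteration converges to the second component.
    deflated = [[cov[a][b] - lam1 * pc1[a] * pc1[b] for b in range(d)]
                for a in range(d)]
    pc2, _ = top_component(deflated, seed=1)
    return [(sum(c[j] * pc1[j] for j in range(d)),
             sum(c[j] * pc2[j] for j in range(d))) for c in centered]

# Hypothetical feature vectors: one row per painting, one column per feature.
features = [
    [0.2, 0.1, 0.5, 0.9],
    [0.4, 0.2, 0.4, 0.7],
    [0.6, 0.3, 0.6, 0.5],
    [0.8, 0.4, 0.3, 0.3],
    [1.0, 0.5, 0.5, 0.1],
]
xy = pca_2d(features)  # one (PCA_1, PCA_2) pair per painting
```

Note that the sign of each component is arbitrary, so a PCA plot may come out mirrored from one run or tool to another; the distances between images, which are what the visualization relies on, stay the same.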

To let you better see how PCA organizes images by visual similarity, here is a high-resolution visualization showing all 128 Mondrian images:


Digital image processing allows us to measure images on hundreds of other visual dimensions: colors, textures, lines, shapes, etc. In computer science, such measurements are often called "image features."

We can map images into a space defined by any combination of these features. For example, the following visualization of Mondrian's paintings maps average saturation to the X-axis and average hue to the Y-axis. (The X coordinate of an image is the median of all its pixels' saturation values; the Y coordinate is the median of their hue values.)

Although an average of all pixels' hues may seem like a strange concept, this feature measurement turns out to be quite meaningful: it reveals that almost all of the 128 Mondrian paintings created between 1905 and 1917 fall into two groups: those dominated by yellow and orange (bottom) and those dominated by blue and violet (top).
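As an illustration of this measurement, the sketch below computes the median saturation and median hue of an image given as a list of 8-bit RGB pixels. It is an assumption-laden stand-in for our actual tool: hue is expressed in degrees (0-360) of the HSV color wheel, and the two sample pixel lists are invented.

```python
import colorsys
from statistics import median

def median_saturation_hue(pixels):
    """Return (median saturation in 0..1, median hue in degrees) for 8-bit RGB pixels."""
    hsv = [colorsys.rgb_to_hsv(r / 255, g / 255, b / 255) for r, g, b in pixels]
    sat = median(s for _, s, _ in hsv)
    hue = median(h for h, _, _ in hsv) * 360
    return sat, hue

# Hypothetical pixel samples from a yellow/orange image and a blue/violet image.
yellowish = [(255, 255, 0), (255, 128, 0), (255, 200, 0)]
bluish = [(0, 0, 255), (128, 0, 255), (0, 64, 255)]

sat_y, hue_y = median_saturation_hue(yellowish)
sat_b, hue_b = median_saturation_hue(bluish)
```

Yellow and orange sit at low hue angles and blue and violet at high ones, which is why the two groups separate vertically. One caveat: hue is circular (red sits at both 0 and 360 degrees), so a plain median like this one is only reliable for images whose hues do not straddle that wrap-around point.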


We can also create a similar visualization for Mark Rothko's paintings and compare the two visualizations. Mondrian's paintings are on the left; Rothko's paintings are on the right. Surprisingly, Rothko's works created during a period of a similar duration (Mondrian: 1905-1917; Rothko: 1939-1952) turn out to form rather similar clusters in saturation/hue space.



Ryan Andrews wrote a very thoughtful and detailed response to our Mondrian/Rothko visualizations.

Additional visualizations of paintings by Mondrian and Rothko

Exploring Rothko on 287 megapixel HIPerSpace visualization supercomputer | images | video

How does the notion of scale affect humanities?


This month we finished our application for the Digging Into Data 2011 competition. One of the questions we were supposed to address was "How does the notion of scale affect humanities?"

Here is what I wrote:

"Digitization of massive amounts of cultural artifacts, the rise of born-digital and social media, and the progress in computational tools that can process massive amounts of data make possible a fundamentally new approach to the humanities. Scholars no longer have to choose between data size and data depth. They can study exact trajectories formed by billions of cultural expressions and conversations in space and time, zooming into particular cultural texts and zooming out to see larger patterns.

New super-visualization technologies specifically designed for research purposes allow interactive exploration of massive media collections which may contain tens of thousands of hours of video and millions of still images. Researchers can quickly generate new questions and hypotheses and immediately test them. This means that researchers can quickly explore many research questions within a fraction of the time previously needed to ask just one question.

Computational analysis and visualization of large cultural data sets allow the detailed analysis of gradual historical patterns that may only manifest themselves over tens of thousands of artifacts created over a number of years. Rather than describing the history of any media collection in terms of discrete parts (years, decades, periods, etc.), we can begin to see it as a set of curves, each showing how a particular dimension of form, content, and reception changes over time. In a similar fashion, we can supplement existing data classification with new categories that group together artifacts which share some common characteristics. For instance, rather than only dividing television news programs according to producers, air dates and times, or ratings, we can generate many new program clusters based on patterns in rhetorical strategies, semantics, and visual form. In another example, we can analyze millions of examples of contemporary graphic design, web design, motion graphics, experience design and other recently developed cultural fields to create maps which would reveal whether they have any stylistic and content clusters."

digital humanities, cultural analytics, software studies

Exploring one million manga pages on the 287 megapixel HIPerSpace

I wrote this new description of what we do in our lab and how this relates to "cultural analytics," "software studies" and "digital humanities." Comments are welcome - email me at manovich at ucsd dot edu. Also, if you come across some interesting and easy to grab cultural data sets which we should analyze, let me know.

Thanks all,

Cultural Analytics is the term we coined to describe computational analysis of massive cultural and social data sets and data flows. Over the last 10-15 years, cultural analytics came to structure the contemporary media universe, cultural production and consumption, and cultural memory. Search engines, spam detection, Netflix and Amazon recommendations, Flickr "interesting" photo rankings, movie success predictions, tools such as Google Books Ngram Viewer, Insights for Search, and Search by Image, and numerous other applications and services all rely on cultural analytics. This work is carried out in media industries and in academia by researchers in data mining, social computing, media computing, music information retrieval, computational linguistics, and other areas of computer science.

As humanities and social science researchers start to apply computational techniques to large data sets in their fields (see the Digging Into Data 2011 competition), many questions arise. What are the new possibilities for studying culture and society made possible by "big data"? Do humanists and social scientists need to develop their own methodologies for working with big data? What is "data" in the case of interactive media? How can new computational methods be combined with more established humanities approaches and theories? Is it possible to study massive media sets without in-depth technical knowledge?

Our research at the Software Studies Initiative (established 2007) focuses on exploring such questions. We believe that they can only be productively addressed using a "software studies" approach, i.e. an in-depth understanding of the software technologies behind cultural analytics.

Our second focus is the development of methods and intuitive visual techniques for exploration and analysis of large visual data sets including images, video, and digital media. (In fact, we are the first digital humanities lab to specifically target visual media analytics.)

We created a set of open source software tools which cover all the parts of such work: data preparation, feature extraction (using digital image analysis), interactive visual exploration, and rendering of high-resolution still and animated visualizations. Our software works on regular laptops and desktop computers, as well as on next-generation large-scale displays such as the HIPerSpace visual supercomputer, with a combined resolution of 42,000 x 8000 pixels. We use open-source technologies whenever possible in our development.

To date, we have successfully applied our techniques to films, animations, video games, comics, print publications, artworks, photos, and other media content. Examples include all pages of Science and Popular Science magazines (published in 1872-1922), hundreds of hours of videogame recordings, all paintings by van Gogh, and one million manga pages. For details, see the Cultural Analytics projects page; you can also find many more visualizations on Flickr and YouTube.

Our past and present collaborators include the Getty Museum, Wikimedia, the Austrian Film Museum, Magnum Photos, the Netherlands Institute for Sound and Image, and other institutions that are interested in using our methodology with their media collections. Since 2009 our work has been shown in 10 exhibitions (Graphic Design Museum, San Diego Museum of Contemporary Art, gallery@calit2, and other venues).

Cultural Analytics research is supported by the National Science Foundation (NSF), the National Endowment for the Humanities (NEH), National Energy Research Scientific Computing Center (NERSC), University of California Humanities Research Institute (UCHRI), University of California, San Diego (UCSD), California Institute for Telecommunications and Information Technology (Calit2), and the Singapore Ministry of Education.

In summer 2011, we are releasing fully documented tools we developed as open source to make it easier for others in digital humanities to work with image data.

ImagePlot tool for exploring large image and video collections - coming this summer

We are finishing documentation and getting the last bugs out of ImagePlot, the free open-source software tool which we have been using in our lab to explore patterns in large image and video data sets.

You can find hundreds of examples of visualizations created with ImagePlot on Flickr.