Style space: How to compare image sets and follow their evolution (part 4)

This is part 4 of a four-part article.
Part 1, Part 2, Part 3.

Text: Lev Manovich.
All visualizations are created with free open-source ImagePlot software developed by Software Studies Initiative. The distribution also includes a set of 776 images of van Gogh paintings, and the tools that were used to measure their image properties.


I introduced the concept of "style space" to suggest that the "style" of a particular set of cultural artifacts is not a distinct point or line in the space of possible expressions. Instead, it is an area in this space.

In joining the words "style" and "space" together, I wanted to evoke this image of an extension, a range of possibilities. Rather than imagining an artist's development as a road moving through a hilly landscape, let's think of it instead as a cloud that gradually shifts above this landscape over time. This cloud may have different densities in different regions, and its shape may also change as the artist develops. And just as with real clouds, our cloud can't suddenly jump from one area to another; in the overwhelming majority of cases, cultural evolution proceeds through gradual, slow adjustments. (While we may expect to find some special cases of sudden changes, so far all the cultural data sets we have looked at in our lab display gradual changes.)

For example, consider this visualization of 776 paintings by Vincent van Gogh we have in our data set. (We are distributing this image set along with ImagePlot software used to make all visualizations in this article.)

X-axis = dates (year and month).
Y-axis = median brightness.


Regardless of what period we may select - spring 1887, summer 1888, all paintings done in Paris (4/1886 - 3/1888), all paintings done in Arles (3/1888 - 3/1889), etc. - their median brightness values cover a significant range.

The visualizations above use the brightness median, but the same holds true for any visual feature: brushstroke character, shapes, contrast, composition, etc. For example, here is a visualization that uses median saturation:

X-axis = year and month.
Y-axis = median saturation.
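The two median features used in these plots can be approximated in a few lines. The following sketch is not the actual ImagePlot/ImageJ measurement pipeline, just an illustration assuming 8-bit RGB pixel arrays and standard luma/HSV definitions:

```python
import numpy as np

def median_brightness(rgb):
    """Median brightness of an H x W x 3 RGB array (values 0-255),
    using the Rec. 601 luma weights as a stand-in for perceived brightness."""
    gray = 0.299 * rgb[..., 0] + 0.587 * rgb[..., 1] + 0.114 * rgb[..., 2]
    return float(np.median(gray))

def median_saturation(rgb):
    """Median HSV-style saturation (0-1) of an RGB array."""
    mx = rgb.max(axis=-1).astype(float)
    mn = rgb.min(axis=-1).astype(float)
    # For nonzero pixels mx >= 1, so dividing by max(mx, 1) equals (mx-mn)/mx
    # while avoiding division by zero for black pixels.
    sat = np.where(mx > 0, (mx - mn) / np.maximum(mx, 1), 0.0)
    return float(np.median(sat))
```

A pure white image then measures as maximally bright and fully desaturated, while a pure red image is fully saturated.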


Earlier we used the metaphor of a cloud to describe a style. We can actually visualize this cloud if we increase the size of the images in a visualization and use transparency:

X-axis = year and month.
Y-axis = median brightness.


Of course, we are not limited to tracking values of single features. Here are two visualizations that compare van Gogh's Paris paintings to his Arles paintings using two features: median brightness and median saturation:

X-axis = median brightness.
Y-axis = median saturation.
Paris period (4/1886 - 3/1888): 199 paintings.
Arles period (3/1888 - 3/1889): 161 paintings.


These visualizations show that with regard to the two features used (median brightness and median saturation), the difference between the two periods is only relative, rather than absolute. The center of the "cloud" of Arles paintings is displaced to the left (brighter) and to the top (more saturated) in comparison to the "cloud" of Paris paintings; it is also smaller, indicating less variability in the Arles paintings. But large parts of the two clouds overlap, i.e. they cover the same area of the style space.
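The displacement and size of such "clouds" can also be quantified directly. A minimal sketch (the data below is hypothetical, not the actual measurements) compares the centroids and spreads of two periods:

```python
import numpy as np

def compare_clouds(a, b):
    """Compare two style 'clouds': each an array with one
    (median brightness, median saturation) row per painting."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return {
        "centroid_shift": b.mean(axis=0) - a.mean(axis=0),  # how far the center moved
        "spread_a": a.std(axis=0),  # variability within the first period
        "spread_b": b.std(axis=0),  # variability within the second period
    }

# Hypothetical feature values for a handful of paintings per period:
paris = np.array([[100.0, 0.30], [120.0, 0.40], [110.0, 0.35]])
arles = np.array([[140.0, 0.50], [150.0, 0.60], [145.0, 0.55]])
result = compare_clouds(paris, arles)
```

A positive centroid shift with a smaller spread would correspond to the pattern described above: the Arles cloud moves and tightens, but the point sets can still overlap.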

To summarize this discussion:

1) Values of visual features that characterize "style" within a particular time period typically cover a range.

2) The values typically shift over time in a gradual manner. This means that in any new "period" we may expect to see some works that have feature values that did not occur before, but also works with feature values that were already present previously (i.e., these are works in the "old style").

For instance, if you look at the lowest band of images in the visualization above which uses median brightness, you will notice that between 8/1884 and 9/1885 van Gogh produced many really dark paintings. You may expect that after he moves to Paris where, to quote the Van Gogh Museum web site, "His palette becomes brighter," these dark works will disappear, but this is not true. Along with very light paintings similar in values to the Impressionists' works produced around the same time, van Gogh still sometimes makes paintings which are as dark as the ones he favored in 1884-1885. And then later, in Arles, he still occasionally "regresses" to his dark style. The same applies to the highest vertical band, where van Gogh's lightest paintings lie. While most of these works were done after 1886, a few can also be found earlier.

This can be clearly seen in the following histograms of median brightness values of van Gogh's paintings divided into three periods that correspond to the places where van Gogh lived and worked (this is a common way to divide an artist's work - it is used both on the Wikipedia page about van Gogh and on the Van Gogh Museum web site). Each histogram shows the distribution of brightness values; the values are arranged in increasing brightness left to right.

Top histogram: Etten, Drenthe, The Hague, Nuenen, Antwerp. 11/1881 - 4/1886. 196 images.
Middle histogram: Paris. 4/1886 - 3/1888. 199 images.
Bottom histogram: Arles. 3/1888 - 4/1889. 161 images.
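Such per-period histograms can be sketched as follows (the records below are hypothetical; the binning of median-brightness values into 0-255 bins is an assumption about how the plots were made):

```python
import numpy as np

def period_histograms(records, bins=20, value_range=(0, 256)):
    """Bin median-brightness values into one histogram per period.
    `records` is a sequence of (period_name, value) pairs."""
    periods = {}
    for period, value in records:
        periods.setdefault(period, []).append(value)
    # Shared bins and range make the histograms directly comparable.
    return {p: np.histogram(v, bins=bins, range=value_range)[0]
            for p, v in periods.items()}
```

Plotting the three resulting count arrays side by side would reproduce the layout described above: one distribution per place, brightness increasing left to right.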


We can use various techniques to characterize the movement of feature values over time. For instance, we can fit a line or a curve through all the points.

Data: 776 images of van Gogh paintings, 1881-1890.
X-axis: paintings dates (year and month).
Y-axis: median brightness.
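Fitting such a trend curve can be sketched with a simple polynomial fit (the values below are hypothetical; dates are expressed as years since 1880 for numerical stability):

```python
import numpy as np

# Hypothetical (date, median brightness) series; dates are years since 1880.
dates = np.array([2.0, 4.5, 6.3, 8.2, 10.0])
brightness = np.array([60.0, 55.0, 110.0, 150.0, 160.0])

# Fit a low-degree polynomial through all the points as a smooth trend curve.
coeffs = np.polyfit(dates, brightness, deg=2)
trend = np.poly1d(coeffs)

# The curve is an idealized description of the change: it need not pass
# through the actual feature value of any particular painting.
```

Evaluating `trend` at any date gives the idealized brightness level at that moment, which can then be drawn over the scatter of individual paintings.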


The following plots show fit curves for seven features of van Gogh's paintings (brightness median, saturation median, hue median, brightness standard deviation, saturation standard deviation, hue standard deviation, number of shapes) plotted on the Y-axis against painting dates (X-axis):



Or, we can divide the paintings into temporal periods (months, seasons, etc.) and calculate measures of central tendency and variability for each period. (Mean and median are popular measures of central tendency; standard deviation is the most popular measure of variability.) This will tell us both how the center of a style "cloud" shifts over time, and also how wide or narrow it is in any period. Here are these measures for a few features; the "periods" correspond to the places where van Gogh worked. (Note that our data set contains 776 images; it is estimated that van Gogh produced a total of about 900 paintings.)

776 van Gogh paintings - selected features averages
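The per-period measures just described can be sketched with the standard library alone (the records below are hypothetical, not the lab's actual data or code):

```python
from statistics import mean, median, stdev

def period_stats(records):
    """Central tendency and variability of a feature per period.
    `records` is a sequence of (period_name, value) pairs."""
    groups = {}
    for period, value in records:
        groups.setdefault(period, []).append(value)
    return {p: {"mean": mean(v), "median": median(v), "std": stdev(v)}
            for p, v in groups.items()}

# Hypothetical median-brightness values for two places:
stats = period_stats([("Paris", 100.0), ("Paris", 120.0),
                      ("Arles", 150.0), ("Arles", 150.0)])
```

The mean or median per place locates the center of each "cloud"; the standard deviation describes how wide or narrow it is.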

These and similar techniques allow us to describe the overall patterns of change. However, all such descriptions are "constructions" - idealized representations of real processes. The values through which a fit curve passes, or the mean values for places, may not correspond to the actual feature values of any particular painting.

Only if we select a single painting for each period can we draw a definite "real" line through them. But this procedure reduces an artist's work to a few "masterpieces," disregarding the rest. (Of course, this is often how art functions today: if you search for "Vincent van Gogh" using Google Image Search, you will see hundreds of images of the same few paintings, and very few images of all his other paintings.)

P.S. To be clear: a set of values of particular features does not completely describe a style. First of all, even dozens of features may not capture all stylistic dimensions. Second, in my view a style is also defined by a set of associations between feature values. That is, certain feature choices are likely to occur together. For instance, in modernist graphic design of the 1920s-1950s, simple geometric forms, diagonal compositions, black and red colors, and sans serif fonts all go together. In Mondrian's later paintings, rectangular forms go along with white, black, and primary colors. This article does not deal with this aspect of style definition.

Networks and Network Analysis in the Humanities: 3 day conference at UCLA

Networks and Network Analysis for the Humanities

October 20 - 22, 2011
314 Royce Hall

Lev Manovich is presenting on Friday, October 21, 11:45-12:15.

Data Mining (a lecture by Lev Manovich, Copenhagen, November 11)

The Culture of Ubiquitous Information
Seminar 4 :
Invisibility and Unawareness: Ethico-political Implications of
Embeddedness and Surveillance Culture

Copenhagen, November 9 - 11, 2011

Friday, November 11, 10am
Danish Architectural Center, Strandgade 27, 1401 Copenhagen K

Lev Manovich

Data mining

Data mining is the application of statistical and artificial intelligence methods for computational analysis of large data sets and data streams (including surveillance data). Along with machine learning, it is the key intellectual technology used in our software societies to understand information, derive knowledge and make decisions. While it is important for all citizens of these societies to understand the basic principles involved in these technologies, it becomes even more important for digital humanists who are beginning to adopt computation for the analysis of large cultural data sets.

In my presentation I will discuss the key ideas underlying data mining and machine learning and their relations to the history of ideas about the nature of categories (classical view, family resemblance, prototype theory). I will argue that while in industry these technologies are commonly used to reify existing social and cultural categories, humanists should use them in the opposite way: to question these categories. I will show how this can work in practice using the example of one of the projects in our lab: the analysis and visualization of a 100 GB data set of one million manga (Japanese comics) pages.

two talks about Cultural Analytics at Mobility Shifts NYC, October 14-15

Mobility Shifts NYC conference full schedule

Friday, October 14

10:00 am - 12:30 pm
Wollman Hall, Eugene Lang Building, 65 West 11th St., 5th floor

Lev Manovich (Software Studies Initiative, UC San Diego)

Data Literacy and Cultural Analytics

The joint availability of numerous large data sets on the web and free tools for data scraping, cleaning, analyzing and visualizing enable potentially anybody to become a citizen data miner. But how do we enable this in practice? What are the necessary elements of “data literacy”? How do we inspire students in traditionally non-quantitative fields (art history, film and media studies, literary studies, etc.) to start playing with big data?

One of the limitations of the existing popular data analysis and visualization tools is that they are designed to work with numbers and texts - but not images and video. To close this gap, in 2007 we established the Software Studies Initiative at the University of California, San Diego. The lab's focus is on the development of new visualization methods particularly suited for media teaching and research. In my presentation I will show a sample of our projects including visualizations of art, film, animation, video games, magazines, comics, manga, and graphic design. Our image sets range from 4535 covers of Time magazine to 320,000 Flickr images from the "ArtNow" and "Graphic Design" groups, and one million manga pages.

In September 2011 we released ImagePlot, a free software tool that visualizes collections of images and video of any size. I will discuss how we use ImagePlot in classes with both undergraduate and graduate students to create collaborative projects which reveal unexpected cultural trends and also make us question our existing concepts for understanding visual culture and media.

Saturday, October 15

1:30-4:30 pm: Progressive Digital Pedagogy: Remix, Collaboration, Crowdsourcing
Wollman Hall, Eugene Lang Building 65 West 11th St., 5th floor

Elizabeth Losh (Sixth College, UC San Diego)

In recent years progressive digital pedagogy has borrowed from five major aspects of the popular culture developing around computational media: 1) remix practice, 2) multimodality, 3) accelerated response, 4) crowdsourcing, and 5) narrowcasting. Yet for many years the conventional classroom pedagogy around teaching "current events" has remained unchanged: it still generally focuses on having learners mechanically cut out recent news stories produced by traditional print journalists, with little attention to how the news is made, how it remixes sources, how it appeals to particular audiences, or how particular patterns of visual imagery and verbal rhetoric could be analyzed critically. This talk focuses on recent work by the Cultural Analytics group of the Software Studies Initiative at U.C. San Diego and shows how media visualization and crowdsourcing could be used in educational contexts with large publicly accessible libraries of digitized news and smaller archives of government public information videos.

"Art Now" and "Graphic Design" Flickr groups (340,000 images)

Researchers: Todd Margolis, Jeremy Douglass, Tara Zepel, Lev Manovich.

Project start date: July 2011.

The availability of massive amounts of cultural content online allows us to start asking lots of interesting questions which were unthinkable before social media. For example: what is the "shape" of user-generated art? If we look at a large enough sample of the art images people upload, will we see a number of distinct styles? Or will we see endless variability? Are there any significant differences between images which are labeled by their creators as "art" and images from other areas of visual culture (graphic design, motion graphics, etc.)?

For this project we downloaded all images from two large Flickr groups: Art Now (approx. 170,000 images) and Graphic Design (also approx. 170,000 images).

Art Now (169,681 items)
A group for displaying, fostering awareness and discussing the emerging relevant art and artists of today.

Graphic Design (177,700 items)
Anything from drawings you did in Paint to photoshoped images. If you made it, put it in the pool.

(These numbers are for the images which were available in these groups when we did our download in August 2011; as group members continue to add new images, the numbers continue to grow).

We used our custom software to process the images on the Macs in our lab, extracting 400 features from each image. The features characterize images' tones, colors, shapes, lines, texture, and other visual characteristics. The next step is to visualize two image sets according to these features.

Our initial research questions are:

- what kinds of clusters (based on visual characteristics) can be found in each group?

- in what ways do the two groups overlap, and how do they differ?

- are there specific properties of the images made with computer graphics? Are these properties the same for the two sets?

Glitch Art Flickr Group

Todd Margolis and Tracy Cornish. 2011.

4000 images from Flickr Glitch Art Group.
X-axis = brightness mean. Y-axis = entropy.
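Entropy here is presumably the Shannon entropy of pixel values, a rough measure of visual "noisiness" that suits glitched imagery. A minimal sketch of such a measure for an 8-bit grayscale array (an illustration, not the actual Cultural Analytics code):

```python
import numpy as np

def image_entropy(gray):
    """Shannon entropy (in bits) of the pixel-value distribution
    of an 8-bit grayscale array: 0 for a flat image, up to 8 for
    a uniform spread over all 256 gray levels."""
    counts = np.bincount(np.asarray(gray, dtype=np.uint8).ravel(), minlength=256)
    p = counts[counts > 0] / counts.sum()   # probabilities of occurring gray levels
    return float(-(p * np.log2(p)).sum())
```

A solid-color image scores 0 bits, while heavily glitched, noisy images score high, which is what makes the measure a plausible Y-axis for this visualization.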

Computer glitches are completely random, unpredictable, and unexpected failures of digital systems. They are the result of approximated values and computational compensations for inaccessible information. Unlike bugs or faulty programming, which can be traced back to errors in code, glitches are fleeting and are often the result of untraceable truncated data streams or rounded values. They interfere with the notion of perfect digital reproduction, and remind us of the constructed and transient nature of information.

Glitching is an expanding genre in the visual arts, cultural theory, electronic music, and gaming, referring to the practice of exploiting glitches. These short-lived faults are part of our contemporary experience - they are inextricably linked to our engagement with digital technology and information transfer. As contemporary cultural indicators, glitches have the potential to be employed as an entry point into the critique of post-digital culture.

The Glitch Art group on Flickr has over 3000 members with over 7000 images. Using Todd Margolis’ Flickr Harvester in combination with Cultural Analytics tools developed by the Software Studies Initiative at UCSD, Tracy Cornish has been investigating emergent patterns in this repository based on Flickr metadata as well as image features.