new article: "Media Visualization: Visual Techniques for Exploring Large Media Collections"

Thank you, everybody, for your birthday wishes!

DOWNLOAD my new article:

Lev Manovich. Media Visualization: Visual Techniques for Exploring Large Media Collections.

The article presents the theory and techniques of media visualization used in our lab, with analysis of examples.


Lev Manovich and Jeremy Douglass discuss media visualizations of one million manga pages at the opening panel for the Mapping Time exhibition, gallery@calit2.

Our new software tools described in the article:

To learn how to use montage and slice visualization techniques with your datasets, use this doc:
Visualizing image and video collections: tutorials

Also check out our ImagePlot (released 9/2011):

Visualizing image and video collections: examples, tutorials, software, theory
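For intuition, the core of the image plot technique is a mapping from per-image feature measurements (for example, brightness or saturation) to positions on a canvas, where the images themselves are then drawn. The sketch below is a minimal illustration in plain Python; the function name and the choice of features are mine, not ImagePlot's actual interface.

```python
# Sketch of the image plot idea: each image becomes a point on a canvas,
# positioned by two measured features (e.g., x = mean brightness,
# y = mean saturation). Feature choices here are illustrative.

def plot_positions(features, width, height):
    """Map (fx, fy) feature pairs onto pixel coordinates of a canvas.

    Each feature axis is rescaled so the minimum value lands at 0
    and the maximum at the far edge of the canvas.
    """
    xs = [f[0] for f in features]
    ys = [f[1] for f in features]

    def scale(v, lo, hi, size):
        # Guard against a degenerate axis where all values are equal.
        return 0 if hi == lo else int((v - lo) / (hi - lo) * (size - 1))

    return [(scale(fx, min(xs), max(xs), width),
             scale(fy, min(ys), max(ys), height))
            for fx, fy in features]
```

In a real image plot, a thumbnail of each image would be pasted at its computed position, so clusters and outliers in the feature space become directly visible.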

You can also download this 28-page PDF document from our server. It illustrates three key techniques used in our lab to explore large image and video sets: montage, slice, and image plot (released 3/30/2012).

ImageMontage (released 3/30/2012)
ImageSlice (released 3/30/2012)
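The montage and slice operations can be sketched in a few lines of plain Python. In this illustration, images are stood in for by 2D lists of grayscale values, and the function names are mine - the actual tools are described in the documents linked above.

```python
# Minimal sketch of the montage and slice techniques.
# Images are represented as 2D lists of grayscale pixel values;
# the real tools operate on actual image files.

def montage(images, columns):
    """Arrange equally sized images into a grid, left to right, top to bottom."""
    h = len(images[0])
    rows = []
    for start in range(0, len(images), columns):
        band = images[start:start + columns]  # one horizontal band of the grid
        for y in range(h):
            row = []
            for img in band:
                row.extend(img[y])  # place images side by side
            rows.append(row)
    return rows

def slice_column(images, x):
    """Take the same vertical column from every image and place them side by side."""
    h = len(images[0])
    return [[img[y][x] for img in images] for y in range(h)]
```

A montage reveals global patterns across the whole set at once; a slice compresses each image to a single column, which makes gradual change across a sorted sequence (for instance, pages ordered by date) visible in one compact image.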

Lev Manovich. Media Visualization: Visual Techniques for Exploring Large Media Collections (released 6/2/2011; updated 3/31/2012). An article about the theory and methods of media visualization, with analysis of examples.

More information about our digital humanities projects:

How and Why Study Big Cultural Data - Lev Manovich's lecture, March 2012

Data Mining and Visualization for the Humanities symposium, NYU, March 19, 2012

Data Mining and Visualization for the Humanities

Monday, March 19, 2012 1:00 PM - 6:30 PM

NYU Open House, 528 La Guardia Place, NYC


1:00-1:15 Opening Remarks
Mara Mills, Assistant Professor of Media, Culture, and Communication (NYU)

"Visualizing Social and Semantic Space in Brazilian Literature, 1840s-1890s"
Zephyr Frank, Associate Professor of Latin American History and
Director of the Spatial History Project, Stanford University

2:30-2:45 Break

"Text as Data"
Mark H. Hansen, Professor of Statistics, University of California-Los Angeles

4:00-4:15 Break

"Why and How Study Big Cultural Data?"
Lev Manovich, Professor of Visual Arts and Director of the Software Studies Initiative, University of California - San Diego

5:30-6:00 Open Discussion
Moderated by Lisa Gitelman, Associate Professor of English and of Media, Culture, and Communication (NYU)

6:00-6:30 Reception

Computational humanities vs. digital humanities

Bluefin Labs analyzes 5 billion online comments and 2.5 million minutes of TV every month.
This visualization shows the relations between the Gatorade brand and male viewers of different TV shows.
Source: Bluefin Mines Social Media To Improve TV Analytics, Fast Company, 11-07-2011.

Echonest offers information on 30 million songs and 1.5 million music artists.

Paul Butler's visualization of a sample of 10 million friendships from Facebook, using the company's data warehouse.
Source: Paul Butler, Visualizing Friendships.

In an article called "Computational Social Science" (Science, vol. 323, 6 February 2009), leading researchers in network analysis, computational linguistics, social computing, and other fields that now work with large data write:

"The capacity to collect and analyze massive amounts of data has transformed such fields as
biology and physics. But the emergence of a data-driven 'computational social science' has been much slower. Leading journals in economics, sociology, and political science show little evidence of this field. But computational social science is occurring in Internet companies such as Google and Yahoo, and in government agencies such as the U.S. National Security Agency. Computational social science could become the exclusive domain of private companies and government agencies. Alternatively, there might emerge a privileged set of academic researchers presiding over private data from which they produce papers that cannot be critiqued or replicated. Neither scenario will serve the long-term public interest of accumulating, verifying, and disseminating knowledge."

Substitute the word "humanities" for "social science" in the above paragraph, and it describes perfectly the issues involved in large-scale analysis of cultural data. Today digital humanities scholars mostly work with archives of digitized historical artifacts created by libraries and universities with funding from the NEH and other institutions. These archives and their analysis are very important - but this work does not engage with the massive amounts of cultural content, and people's conversations and opinions about it, which exist on social media platforms, personal and professional web sites, and elsewhere on the web. This data offers us unprecedented opportunities to understand cultural processes and their dynamics, and to develop new concepts and models which can also be used to better understand the past. (In our lab, we refer to the computational analysis of large contemporary cultural data as cultural analytics.)

Contemporary media and web industries depend on the analysis of this data. This analysis enables search, recommendations, video fingerprinting, identification of trending topics, and other crucial functions of their services. Because of its scale and technical sophistication, perhaps we should call it "computational humanities." The players in computational humanities are Google, Facebook, YouTube, Bluefin Labs, Echonest, and other companies which analyze social media signals (blogs, Twitter, etc.) and the content of media on social networks. They do not usually ask theoretical questions that can be directly related to the humanities, but the types of analysis they perform and the techniques they use can easily be extended to ask such questions.

The questions posed in the paragraph I quoted above are directly applicable to "computational humanities." We can ask: Will computational humanities remain the exclusive domain of private companies and government agencies? Will we see a privileged set of academic researchers presiding over private data from which they produce computational humanities papers that cannot be critiqued or replicated?

These questions are essential for the future of the humanities. In this respect, the NEH/NSF Digging Into Data competitions are very important, as they push humanists to think on the scale of computational humanities and to collaborate with computer scientists. To quote from the description of the 2011 competition:

"The idea behind the Digging into Data Challenge is to address how "big data" changes the research landscape for the humanities and social sciences. Now that we have massive databases of materials used by scholars in the humanities and social sciences -- ranging from digitized books, newspapers, and music to transactional data like web searches, sensor data or cell phone records -- what new, computationally-based research methods might we apply?"

In our lab, we hope to contribute towards bridging the gap between "digital humanities" and "computational humanities." Our data sets range from small historical datasets - for instance, 7,000-year-old stone arrowheads, and the paintings of Piet Mondrian and Mark Rothko - to large-scale contemporary user-generated content, such as 1,000,000 manga pages or 1,000,000 images from deviantArt (the most popular social network for user-generated art). We also write papers for both humanities and computer science audiences. All our work is collaborative, involving students in digital art, media art, and computer science. And although our largest image sets are still tiny in comparison to the data analyzed by the companies I mentioned above, they are much bigger than what humanists and social scientists usually work with. The new visualization tools we have developed already allow you to explore patterns across 1,000,000 images, and we are gradually scaling them up.

Visualizing newspapers history: The Hawaiian Star, 5930 front pages, 1893-1912

The Hawaiian Star, 5930 front pages, 1893-1912 (Vimeo)

Last September I met with Leslie Johnston (Chief of Repository Development at the Library of Congress). We discussed how my lab and the students in my classes can start working on visualizing significant digital archives available through the Library's web site.

We both agreed that the digitized archive of American newspapers created by the Library in partnership with the National Endowment for the Humanities is a good place to start. Currently the archive contains 4,776,214 pages, and it continues to grow. The pages are digitized at 400 dpi resolution.

A group of UCSD undergraduate students taking my Fall 2011 class on big cultural data, visualization, and digital image processing (current syllabus) figured out how to download high-resolution newspaper page images and metadata using the Library's API, and started working on visualizing a number of newspapers. We will be putting up a page with this project's results soon. Today we are releasing one of the animated visualizations created by UCSD undergraduate student Cyrus Kiani (embedded at the top of the post).
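As a rough sketch of the kind of URL construction such a download workflow involves: the Library's Chronicling America collection exposes issue metadata and page scans at predictable paths. The endpoint patterns below are my assumptions for illustration, not the students' actual code, and the sample LCCN identifier is a placeholder.

```python
# Hedged sketch of building request URLs for newspaper pages in the
# Library of Congress Chronicling America collection. The URL patterns
# are assumptions for illustration; check the API docs before relying on them.

BASE = "https://chroniclingamerica.loc.gov"

def issue_url(lccn, date):
    """JSON metadata for one issue of a newspaper (date as YYYY-MM-DD)."""
    return f"{BASE}/lccn/{lccn}/{date}/ed-1.json"

def front_page_url(lccn, date):
    """High-resolution JP2 scan of the front page (sequence 1) of an issue."""
    return f"{BASE}/lccn/{lccn}/{date}/ed-1/seq-1.jp2"

# Hypothetical usage: iterate over a list of issue dates for one newspaper
# (identified by its LCCN) and collect the front-page image URLs.
def front_page_urls(lccn, dates):
    return [front_page_url(lccn, d) for d in dates]
```

From here, a script would fetch each URL (for example with urllib), save the scans, and downscale them into frames for an animation such as Kiani's.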

Kiani's animation uses 5930 front pages from The Hawaiian Star covering the 1893-1912 period. This period is particularly important for the development of modern visual communication: the development of abstract art, which leads to modern graphic design; the introduction of image-oriented magazines such as Vogue; the new medium of cinema; the invention of the phototelegraph, the first telefax machine able to scan any two-dimensional image; etc.

The animation of 5930 front pages of a single newspaper published during these 20 years makes visible, for the first time, how the visual design of modern print media changes over time, in search of a form appropriate to the new conditions of reception and the new rhythm of modern life.

Here are Kiani's other visualizations and analyses of this data set (click on the image to view the original version, 3000 x 10650 pixels, on Flickr):

The Hawaiian Star, 5930 front pages, 1893-1912 (width = 1000 pixels)

Visit our site to see our other digital humanities projects, and to download the free software tools we have developed for visualizing large image archives.

The state of Wikipedia infographic

Jen Rhee's infographic: how Wikipedia redefines how people do research.

My favorite data point: 56% of students will halt research if little information is found on Wikipedia.


more museums put their collections online

A screenshot of SFMOMA ArtScope interface to their image collection developed by Stamen Design.

When we started our lab in 2007, we expected that within a few years massive sets of images of artworks (and other areas of visual culture) would start to become available from major cultural institutions. So we focused on developing methods and techniques for their analysis (what we call cultural analytics) while waiting for these collections to become available. The wait is almost over - these collections are here. (Unfortunately, right now museum interfaces often tell web visitors how many "objects" they have in their database, but not how many images, so some of the numbers below are approximate.)


MoMA - currently 33593 images online

Brooklyn Museum - probably 97,000 images online (their interface says that they have that many records but does not explain how many images they have)

SFMOMA - all 6,038 works in the collection are online

BBC Your Painting - currently already 110,000 images online from UK national collections, with the aim to reach 200,000

Cleveland Museum of Art - currently 65,000 images online

Whitney Museum, NYC - not clear how many images are online, but it looks like a lot

Powerhouse Museum, Sydney - somewhere between 98,000 and 170,000 images online (their interface does not explain how many images they have)

More and more museums offer their data via APIs:

List of museums offering API

The first museum API was developed some time ago by the Powerhouse Museum - currently it offers access to 90,000 thumbnails.

new Vroom scalable display is ready for cultural analytics research

A new super-high-resolution scalable display has been installed in the Calit2 Vroom. The system consists of 32 individual displays arranged in a 4 x 8 configuration.

The very thin bezels of the individual screens make the system suitable for exhibitions and for adoption in museums and other cultural venues. To test the capabilities of this new system, we loaded our complete image set of 4535 Time magazine covers (1923-2009). Using our software, we can sort the image set according to different criteria. We can also select any single image or group of images and enlarge it for a closer look. We are looking forward to using the new system for our cultural analytics projects.
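The sorting step can be sketched minimally: measure a feature for each image and order the set by it. The example below uses mean brightness as the criterion and represents images as 2D lists of grayscale values; it is an illustration of the idea, not our display software.

```python
# Sketch of sorting an image set by a measured feature (mean brightness),
# the kind of criterion used when arranging a set of covers in a grid.
# Images are 2D lists of grayscale values; real code would load pixel data.

def mean_brightness(img):
    """Average grayscale value over all pixels of one image."""
    pixels = [p for row in img for p in row]
    return sum(pixels) / len(pixels)

def sort_by_brightness(images):
    """Return the image set ordered from darkest to brightest."""
    return sorted(images, key=mean_brightness)
```

Swapping in a different feature function (saturation, hue, amount of edges, publication date from metadata) reorders the same set along a different cultural dimension, which is exactly what makes interactive re-sorting on a large display useful.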