Exploring one million manga pages on the 287 megapixel HIPerSpace

I wrote this new description of what we do in our lab and how it relates to "cultural analytics," "software studies" and "digital humanities." Comments are welcome - email me at manovich at ucsd dot edu. Also, if you come across interesting, easy-to-grab cultural data sets that we should analyze, let me know.

Thanks all,

Cultural Analytics is the term we coined to describe the computational analysis of massive cultural and social data sets and data flows. Over the last 10-15 years, cultural analytics has come to structure the contemporary media universe, cultural production and consumption, and cultural memory. Search engines, spam detection, Netflix and Amazon recommendations, Flickr "interesting" photo rankings, movie success predictions, tools such as Google Books Ngram Viewer, Insights for Search, and Search by Image, and numerous other applications and services all rely on cultural analytics. This work is carried out in media industries and in academia by researchers in data mining, social computing, media computing, music information retrieval, computational linguistics, and other areas of computer science.

As humanities and social science researchers start to apply computational techniques to large data sets in their fields (see the Digging Into Data 2011 competition), many questions arise. What new possibilities for studying culture and society does "big data" open up? Do humanists and social scientists need to develop their own methodologies for working with big data? What is "data" in the case of interactive media? How can new computational methods be combined with more established humanities approaches and theories? Is it possible to study massive media sets without in-depth technical knowledge?

Our research at the Software Studies Initiative (established 2007) focuses on exploring such questions. We believe that they can only be productively addressed using a "software studies" approach, i.e. an in-depth understanding of the software technologies behind cultural analytics.

Our second focus is the development of methods and intuitive visual techniques for exploration and analysis of large visual data sets including images, video, and digital media. (In fact, we are the first digital humanities lab to specifically target visual media analytics.)

We created a set of open-source software tools which cover all parts of such work: data preparation, feature extraction (using digital image analysis), interactive visual exploration, and rendering of high-resolution still and animated visualizations. Our software works on regular laptops and desktop computers, as well as on next-generation large-scale displays such as the HIPerSpace visual supercomputer with a combined resolution of 42,000 x 8000 pixels. We use open-source technologies whenever possible in our development.
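To give a rough sense of what "feature extraction" means here, the following is a minimal Python sketch, not our actual tools. It assumes each image has already been decoded into a flat list of RGB pixels, and it computes three simple features (mean brightness, brightness spread, mean saturation) that could later be used to sort or plot a collection. The function name `extract_features`, the chosen features, and the two tiny sample "pages" are hypothetical and exist only for illustration.

```python
import colorsys
from statistics import mean, pstdev


def extract_features(pixels):
    """Return simple visual features for one image.

    `pixels` is a flat list of (r, g, b) tuples with channels in 0..255.
    The features chosen here are illustrative assumptions only.
    """
    # Convert every pixel to HSV; colorsys expects channels in 0..1.
    hsv = [colorsys.rgb_to_hsv(r / 255, g / 255, b / 255) for r, g, b in pixels]
    brightness = [v for _, _, v in hsv]
    saturation = [s for _, s, _ in hsv]
    return {
        "mean_brightness": mean(brightness),
        "brightness_stdev": pstdev(brightness),
        "mean_saturation": mean(saturation),
    }


# Two tiny hypothetical "images", two pixels each.
pages = {
    "dark_page": [(0, 0, 0), (51, 51, 51)],
    "bright_page": [(255, 255, 255), (255, 0, 0)],
}

# Extract features for every image, then order the collection by brightness.
features = {name: extract_features(px) for name, px in pages.items()}
ranked = sorted(features, key=lambda name: features[name]["mean_brightness"])
```

With real data, the pixel lists would come from an image library, and the resulting feature table would feed the visual exploration step, for example by placing each image on a plot according to two of its features.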

To date, we have successfully applied our techniques to films, animations, video games, comics, print publications, artworks, photos, and other media content. Examples include all pages of Science and Popular Science magazines (published in 1872-1922), hundreds of hours of videogame recordings, all paintings by van Gogh, and one million manga pages. For details, see the Cultural Analytics projects page; you can also find many more visualizations on Flickr and YouTube.

Our past and present collaborators include Getty Museum, Wikimedia, Austrian Film Museum, Magnum Photos, Netherlands Institute for Sound and Image and other institutions that are interested in using our methodology with their media collections. Since 2009 our work has been shown in 10 exhibitions (Graphic Design Museum, San Diego Museum of Contemporary Art, gallery@calit2, and other venues).

Cultural Analytics research is supported by the National Science Foundation (NSF), the National Endowment for the Humanities (NEH), the National Energy Research Scientific Computing Center (NERSC), the University of California Humanities Research Institute (UCHRI), the University of California, San Diego (UCSD), the California Institute for Telecommunications and Information Technology (Calit2), and the Singapore Ministry of Education.

In summer 2011, we are releasing the tools we developed as fully documented open-source software, to make it easier for others in digital humanities to work with image data.