"Data-driven humanities: analyzing 1,000,000 manga images" lecture, December 3, 5pm

what: lecture by Lev Manovich, Jeremy Douglass and William Huber
where: auditorium at Calit2 (first floor of Atkinson Hall, UCSD)
when: Friday December 3, 5 - 6pm.

Followed by a reception starting at 6pm.


Data-driven humanities: analyzing 1,000,000 manga images


Lev Manovich, Jeremy Douglass and William Huber show how they use computational tools to investigate patterns in a collection of 1,000,000 images. Each image is a single manga page. While the project is uncovering fascinating things about manga culture, its overall goal is to understand how digital image analysis and interactive supervisualization of "large cultural data" can be used in the humanities and social sciences, including art history, communication, film and media studies, game studies, and sociology. The lecture will discuss the project's key findings and demonstrate practical techniques and software tools developed by the lab.

"One million manga images" is one of several projects by
the Software Studies Initiative included in the exhibition Mapping Time: Visualizations of temporal patterns in media and art at the gallery@calit2 (UCSD campus, La Jolla, California), which runs until December 12.

More information:

Photographs from the lecture on Flickr.


Science and Popular Science magazines, 1872-2007

William Huber, Tara Zepel, Lev Manovich. 2010.

This project explores changing strategies in the use of images, layout, and content in two magazines: Science (1880-) and Popular Science (1872-). By arranging thousands of magazine pages into a single high-resolution image, we are able to reveal gradual temporal changes over long historical periods.

This visualization shows the evolution of Popular Science over 125 years (1882-2007):

Data: Popular Science magazine, one issue every five years from the beginning of the magazine's publication.
Timescale: 1882 to 2007.
Mapping: The issues are arranged in the order of publication (left to right, top to bottom).
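The "single high-resolution image" mapping used in these visualizations amounts to a grid montage filled in row-major order (left to right, then top to bottom). Below is a minimal Python sketch of the layout arithmetic only, assuming every page is scaled to the same thumbnail size; the function names and example sizes are hypothetical and not taken from the project's software.

```python
import math


def montage_positions(n_pages, columns, thumb_w, thumb_h):
    """Return the (x, y) pixel offset of each page's top-left corner.

    Pages are placed in publication order: fill a row left to right,
    then continue on the next row down (row-major order).
    """
    positions = []
    for i in range(n_pages):
        row, col = divmod(i, columns)
        positions.append((col * thumb_w, row * thumb_h))
    return positions


def montage_size(n_pages, columns, thumb_w, thumb_h):
    """Width and height in pixels of the final single image."""
    rows = math.ceil(n_pages / columns)
    return columns * thumb_w, rows * thumb_h


if __name__ == "__main__":
    # Illustrative numbers only: 7 pages, 3 per row, 100x140 px thumbnails.
    print(montage_positions(7, 3, 100, 140))
    print(montage_size(7, 3, 100, 140))
```

Choosing the number of columns is the main design decision: it sets the aspect ratio of the final image while keeping publication order readable.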

Now let's zoom in and compare the first few decades of Science and Popular Science magazines. At first, Science includes photographs and hand-crafted illustrations. These images are legitimate parts of the process of creating scientific knowledge. However, within about 10 years they disappear almost completely. The only images that are now generally permitted are graphs: illustration and photo documentation are increasingly treated as a way of communicating the work of science, rather than as belonging to the work of science itself. In the last decades of the 19th century, scientists make new discoveries that are translated into key technologies of modern society (electricity, wireless communication, etc.). These technologies and the models that inform them are less about understanding the visible and increasingly about the knowledge of, and the explanatory power of, the invisible. The visualization of 9801 pages of Science reflects this increasing importance of the invisible, and the relegation of the visual to explanation.

Science magazine 1880-1906
Data: All issues of Science magazine from the beginning of publication in 1880 to 1906. Our visualization uses every 3rd page of every issue (9801 pages total).
Timescale: 1880-1906.
Mapping: The pages are arranged in the order of publication (left to right, top to bottom).
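Taking "every 3rd page of every issue" can be sketched as one slice per issue. This assumes, hypothetically, that pages are held as per-issue lists in publication order and that the count restarts at the first page of each issue; the text does not say which convention was used.

```python
def sample_pages(issues, step=3):
    """Keep every `step`-th page of each issue, preserving publication order.

    `issues` is a list of issues in publication order; each issue is a
    list of page identifiers (hypothetical data layout).
    """
    sampled = []
    for pages in issues:
        # pages[::3] keeps the 1st, 4th, 7th, ... page of this issue
        sampled.extend(pages[::step])
    return sampled


if __name__ == "__main__":
    issues = [["a1", "a2", "a3", "a4"], ["b1", "b2", "b3", "b4", "b5"]]
    print(sample_pages(issues))
```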

In the first three decades of its publication, Popular Science used very few images. In fact, if we compare Science and Popular Science in the 1880s, we discover that the latter was at first more “scientific.” While photographs and illustrations accompanied Science articles, Popular Science used only occasional graphs. Over time the two magazines reverse their visual strategies. Science banishes photographs and illustrations as they come to be considered inappropriate for proper scientific discourse. Popular Science moves in the reverse direction, becoming highly visual. Mapping 9900 magazine pages into a single high-resolution image reveals this transition. The animation circles around a part of this image that contains issues published in the first two decades of the 20th century. The change in magazine ownership in 1912 manifests itself dramatically in the sudden jump in the number of images and ads and in new layout strategies. However, when we zoom out to see the whole visualization, we notice that this change was already anticipated by the gradual increase in the number of images during the preceding decade.

Popular Science magazine 1872-1922
Data: All issues of Popular Science magazine from the beginning of publication in 1872 to 1922. This visualization uses every 3rd page of every issue (9900 pages total).
Timescale: 1872-1922.
Mapping: The pages are arranged in the order of publication (left to right, top to bottom).

Related projects:
4535 covers of Time magazine, 1923-2009.


Visualizations of Science and Popular Science in the exhibition Here Not There: San Diego Art Now (Summer 2010) at Museum of Contemporary Art San Diego (MCASD).

The Evolution of Popular Science Magazine over the Past 125 Years, infosthetics.com, 11.22.2010.

A Century of Popular Science, gizmodo.com, 11.23.2010.

125 Glorious Years of Popular Science in One Giant Picture, popsci.com, 11.23.2010.

Popular Science As Seen Through the Ages, vizworld.com, 11.23.2010.

Popular Science Wrote History, neoteo.com (a leading online technology review in Spanish).

125 Years of Popular Science in a Single Image, thaizad.com (a popular online technology review in Thailand).


very good set of infographics by GOOD magazine:


New York Times: the next big idea in humanities is data

Patricia Cohen, Digital Keys for Unlocking the Humanities’ Riches, New York Times, November 17, 2010:

"A history of the humanities in the 20th century could be chronicled in “isms” — formalism, Freudianism, structuralism, postcolonialism — grand intellectual cathedrals from which assorted interpretations of literature, politics and culture spread.

The next big idea in language, history and the arts? Data.

Members of a new generation of digitally savvy humanists argue it is time to stop looking for inspiration in the next political or philosophical “ism” and start exploring how technology is changing our understanding of the liberal arts. This latest frontier is about method, they say, using powerful technologies and vast stores of digitized materials that previous humanities scholars did not have."

I did an informal survey on Facebook to see whether the term "digital humanities" is already used in various countries. The results are quite interesting...

One million manga pages


Jeremy Douglass, William Huber, Lev Manovich. "Understanding scanlation: how to read one million fan-translated manga pages." In Image and Narrative, winter 2011. [pdf 3 MB].

Lev Manovich. "How to Compare One Million Images?" In David Berry, ed., Understanding Digital Humanities (Palgrave, 2012).

Keywords: manga, comics, Naruto, visualization, cultural analytics, softwarestudies.com, software studies, digital humanities, Calit2, UCSD, NERSC, HIPerSpace, Cultural Analytics, Lev Manovich, Jeremy Douglass, William Huber

"The humanities with some heavy iron...compared to other scholarly attempts to analyze Japanese comics — well, *gasp, choke, Good Lord!* Lookah that thing! It’s like some terrifying splash panel from vintage EC comics." Bruce Sterling, a blog post about our Manga visualization (see below), November 14, 2010, wired.com.

Manga visualization on 287 megapixel HIPerSpace
Using a unique 287 megapixel HIPerSpace at Calit2 (San Diego) for Manga research.

As soon as new Manga books are published in Japan, fans buy them, scan the pages, translate the text into other languages, and distribute digital images of the translated pages via web sites. In the process, they also insert additional pages (group credits, commentaries, and original fan art). This process is referred to as “scanlation.” Until July 2010, the most popular online archive of “scanlations” was OneManga.com. (It was also among the most visited web sites in general: 300th in the U.S., and in the top 20 in Singapore and Malaysia.)

In fall 2009, we downloaded 883 Manga series containing 1,074,790 unique pages from this site. We then used our custom software system, running on a supercomputer at the U.S. Department of Energy's National Energy Research Scientific Computing Center (NERSC), to analyze visual features of these pages (funded by a Humanities High Performance Award from the NEH Office of Digital Humanities). To match the scale of the data, we are using the 287-megapixel Highly Interactive Parallelized Display Space (HIPerSpace) with custom software for interactive exploration of large image sets, developed jointly by our lab (softwarestudies.com) and the Gravity lab at Calit2.

The visualization below shows our complete data set: 1 million Manga pages organized in 2D space according to their visual characteristics. The pages in the bottom part of the visualization are the most graphic (they have the least amount of detail). The pages in the upper right have lots of detail and texture. The pages with the highest contrast are on the right, while pages with the least contrast are on the left. In between these four extremes, we find every possible stylistic variation.

(Note: some of the pages, such as all covers, are in color. In order to fit all pages into the largest image our software could render and save, we made this rendering in grayscale. Also note that because pages are rendered on top of each other, you don't actually see 1 million distinct pages; the visualization shows the distribution of all pages, with typical examples appearing on top.)
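Organizing pages in 2D space by two visual features, with pages drawn on top of each other, can be sketched as follows: scale each page's two feature values linearly onto the canvas, then let pages drawn later cover earlier ones at the same position. This is a hedged illustration, not the lab's rendering software; the linear scaling, the flipped y axis and the function names are assumptions.

```python
def place_pages(features, canvas_w, canvas_h, thumb):
    """Map each page's two feature values to a top-left canvas position.

    features: list of (page_id, x_value, y_value) in drawing order.
    Returns a list of (page_id, px, py); larger y_value lands higher up.
    """
    xs = [f[1] for f in features]
    ys = [f[2] for f in features]
    x_min, y_min = min(xs), min(ys)
    x_span = (max(xs) - x_min) or 1.0  # avoid dividing by zero if all equal
    y_span = (max(ys) - y_min) or 1.0
    placed = []
    for page_id, x, y in features:
        px = round((x - x_min) / x_span * (canvas_w - thumb))
        # Image rows grow downward, so flip y: high values go to the top.
        py = round((1 - (y - y_min) / y_span) * (canvas_h - thumb))
        placed.append((page_id, px, py))
    return placed


def topmost(placed):
    """Which page is visible at each position: later pages cover earlier ones."""
    top = {}
    for page_id, px, py in placed:
        top[(px, py)] = page_id  # overwrite whatever was drawn here before
    return top
```

Because overlapping pages hide each other, the finished image shows the shape of the distribution rather than every page, which is exactly the caveat in the note above.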

Manga Style Space
Data: 883 Manga series from the scanlation site OneManga.com. Total number of pages: 1,074,790.
Mapping: X axis: mean of the standard deviation of the pixels’ grayscale values in a page. Y axis: mean of the entropy measured over all pixels’ grayscale values in a page.
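The two features in the Mapping line can be approximated per page roughly as follows. This sketch computes whole-page values from a 2D list of 0-255 grayscale values and assumes that "entropy" means the Shannon entropy of the page's grayscale histogram. It does not reproduce the "mean of" step, which suggests values averaged over sub-regions, and all names are hypothetical.

```python
import math
from collections import Counter
from statistics import pstdev


def grayscale_std(pixels):
    """Population standard deviation of grayscale values (a contrast measure)."""
    return pstdev(pixels)


def grayscale_entropy(pixels):
    """Shannon entropy, in bits, of the grayscale histogram (detail/texture)."""
    n = len(pixels)
    counts = Counter(pixels)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def page_features(rows):
    """rows: 2D list of 0-255 values. Returns (x, y) for the style space."""
    pixels = [v for row in rows for v in row]
    return grayscale_std(pixels), grayscale_entropy(pixels)


if __name__ == "__main__":
    blank = [[255, 255], [255, 255]]  # flat page: no contrast, no detail
    split = [[0, 255], [0, 255]]      # half black, half white
    print(page_features(blank))
    print(page_features(split))
```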

What do we learn from this visualization? It suggests that the very concept of style, as it is normally employed, becomes problematic when we start analyzing large cultural data sets. The concept assumes that we can partition a set of works into a small number of discrete categories. However, if we find a very large number of variations with very small differences between them (as in this case of 1 million Manga pages), it is no longer possible to talk about the "style" of these works. Instead, it is better to use visualization and/or mathematical models to describe the space of possible and realized variations.
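One simple way to describe the space of realized variations without partitioning pages into a few categories is to count how many pages fall into each cell of a grid laid over the two feature axes. The sketch below uses a plain 2D histogram as an illustrative choice; it is not a model the project is known to have used, and the function name is hypothetical.

```python
from collections import Counter


def style_space_density(points, bins=10):
    """Count pages in each cell of a bins x bins grid over a 2D feature space.

    points: list of (x, y) feature pairs. Returns a Counter keyed by (col, row).
    """
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    x_min, y_min = min(xs), min(ys)
    x_span = (max(xs) - x_min) or 1.0
    y_span = (max(ys) - y_min) or 1.0
    grid = Counter()
    for x, y in points:
        # Clamp so the maximum value falls in the last bin, not one past it.
        col = min(int((x - x_min) / x_span * bins), bins - 1)
        row = min(int((y - y_min) / y_span * bins), bins - 1)
        grid[(col, row)] += 1
    return grid
```

A smooth spread of counts across many cells, rather than a few isolated peaks, is the kind of evidence that argues against discrete "styles."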

What about single manga titles? Is it meaningful to talk about a title's style? As we found out, even a short title often has such graphic variability that the concept of "style" cannot be used here either. Here is an example of such a title ("Anatolia Story"). Its 879 pages are organized by brightness mean (X) and entropy (Y):


In these examples, manga pages are organized according to particular visual features. Taking into account other features and also higher-order attributes (content, composition, manga's visual conventions for rendering characters, their faces, backgrounds, etc.) may reveal the presence of distinct styles in the sample of one million pages, and also show more stylistic coherence within individual manga titles. We hope to investigate these questions in the near future.


You can find a discussion of the initial results of our work with Manga data set in this research report:
Style Space: Analysis and Visualization of 1 million Manga pages (06/2010). [key 56.6 MB] [ppt 17.9 MB]

More manga and comics visualizations created by Software Studies Initiative (Flickr)

Daniel H. Pink. Japan, Ink: Inside the Manga Industrial Complex, Wired 10.22.07.

Anna Karenina

Lev Manovich. 2009.

While recent advances in computing open new possibilities for visualizing patterns in cultural artifacts, we can find many important precursors created much earlier without use of computers. For example, beginning in the early 1960s, many media and later new media artists have been restructuring TV programs and films in a variety of ways (slowing down, speeding up, sampling and repeating, etc.) to reveal ideological and formal patterns in this media content. Another relevant practice is the use of diagrams by artists, choreographers, architects, composers and others to plan and analyze their own works.

We can also find interesting precursors in print culture. For instance, a familiar book index can be understood as a visualization technique. Looking at the book index you can quickly see if particular concepts or names are important in this book – they will have more entries than the concepts that take up only a single line in the index.

This visualization of the complete text of Anna Karenina is inspired by the common reading practice of underlining important lines and passages in a text with magic markers. To create it, I wrote a program that reads the text from a file and renders it as a single image, in a series of columns running from top to bottom and from left to right. The program also checks whether text lines contain particular words (this version checks for the word "Anna") and highlights the matches.
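The program described above can be sketched in a few lines: give each line of text a (column, row) slot, filling columns top to bottom and then left to right, and flag lines that contain the target word. This is a hypothetical reconstruction, not the original program; it assumes a whole-word regular expression match (so "Annushka" does not count) and renders a tiny text grid in place of a real image.

```python
import re


def layout_columns(lines, lines_per_column, word="Anna"):
    """Give each line a (column, row) slot and flag lines containing `word`.

    Lines fill a column from top to bottom; when it is full, the next
    column starts to its right. Returns a list of (column, row, highlighted).
    """
    pattern = re.compile(r"\b" + re.escape(word) + r"\b")  # whole-word match
    slots = []
    for i, line in enumerate(lines):
        column, row = divmod(i, lines_per_column)
        slots.append((column, row, bool(pattern.search(line))))
    return slots


def render_ascii(slots, lines_per_column):
    """Tiny stand-in for the image: '#' marks a highlighted line, '.' the rest."""
    n_columns = max(column for column, _, _ in slots) + 1
    grid = [["."] * n_columns for _ in range(lines_per_column)]
    for column, row, highlighted in slots:
        if highlighted:
            grid[row][column] = "#"
    return "\n".join("".join(cells) for cells in grid)


if __name__ == "__main__":
    text = ["Happy families are all alike", "Anna arrived", "Vronsky",
            "Annushka", "Anna Arkadyevna"]
    print(render_ascii(layout_columns(text, 2), 2))
```

With the 43,200 lines of the novel, the same layout with a few hundred lines per column yields one tall, wide image in which clusters of highlighted lines show where Anna dominates the narrative.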

(Note: Because Flickr limits maximum resolution and file size of photos it can show, I had to break the visualization into two parts.)

Anna Karenina 1/2
Anna Karenina 2/2
Data: The complete text of Lev Tolstoy’s Anna Karenina (English translation by Constance Garnett). Number of words: 351,000. Number of lines: 43,200.
Timescale: The novel was published in serial installments from 1873 to 1877 in the periodical The Russian Messenger.
Mapping: The text of the novel is arranged on a single page. Reading order: top to bottom, left to right.

more visualizations of Anna Karenina and Hamlet (Flickr)

cultural analytics in 6 min (Mapping Time exhibition video)

Mapping Time exhibition opening panel

"Mapping Time" exhibition opens at gallery@calit2

Visualizations of temporal patterns in media and art


Exhibition by Lev Manovich, Jeremy Douglass, William Huber

With: Adelheid Heftberger, Agatha Man, Alex Avrorin, Bertrand Grandgeorge, Bob Li, Chanda L. Carey, Christa Lee, Christine Pham, Colin Wheelock, Daniel Rehn, Devon Merill, Jia Gu, Kedar Reddy, Laura Hoeger, Michael Briganti, Nichol Bernardo, Ong Kian Peng (aka Bin), Rachel Cody, Sergie Magdalin, So Yamaoka, Steven Mandiberg, Sunsern Cheamanunku, Tara Zepel, Victoria Azurin, Xiangfei Zeng, Xiaoda Wang.

October 4 - December 10, 2010
University of California, San Diego

The panel discussion and opening reception were held on October 22, 2010. (Photos on Flickr.)
Film screening and closing reception: December 3, 2010.

This fall, the gallery@calit2 presents "Mapping Time," an exhibition by the Software Studies Initiative. Since 2008, the Software Studies Initiative has focused on the development of new methods and techniques for the analysis and visualization of visual and interactive media. The exhibition coincides with the lab releasing a number of open-source tools which were used to create all works in the exhibition.

The lab is directed by Lev Manovich, UCSD Professor of Visual Arts and Calit2 researcher; its core participants are Jeremy Douglass (Calit2 post-doctoral researcher) and William Huber (PhD candidate in visual arts). In addition, undergraduate and graduate students and faculty from the departments of Electrical and Computer Engineering, Computer Science and Engineering, Communication, Visual Arts, the Center for Research in Computing and the Arts, and Calit2 participate in the lab's work.

The lab uses the term Cultural Analytics to refer to its techniques for the analysis and visualization of large cultural data sets. For the "Mapping Time" exhibition, the concept is to render the "shapes" of cultural time. According to Manovich, “our goal is to demonstrate how we can visualize gradual changes over time at a number of scales: from a single minute of video game play, to 11 years of the popular manga title Naruto, to 130 years of the journal Science (1880-2010).” The exhibition includes visualizations of novels, video game play, web comics, manga, motion graphics, feature films, and mass media publications such as Time magazine, presented via large-scale prints, animations and real-time generative projections.

Software Studies Initiative research is supported by Calit2, CRCA, UCSD, the NEH Office of Digital Humanities and NSF.

Software Studies Initiative contact: William Huber <whuber@ucsd.edu>.
Calit2 media contact: Doug Ramsey <dramsey@ucsd.edu>.


steps towards real-time global cultural analytics

When we started the Software Studies Initiative in 2007, one of our first steps was to imagine an interface for exploring real-time cultural trends on the giant 287-megapixel HIPerSpace display constructed at UCSD.

Cultural Analytics research environment: geo view

Our interface has remained only a design until now. Now, however, we are seeing a growing number of visualizations that measure some social media activity and create rudimentary maps, or simply list trending keywords (for instance, the Twitter web site itself): small steps towards a much richer future map that could actually be useful.

The A WORLD OF TWEETS project by frogdesign stands out from the rest: the dashboard around the visualization offers useful information, and you can actually start to see interesting trends. (It also shows that, now that everybody has access to the APIs and data that these services provide, it's good designers who will make better visualizations.)


Countries with the most tweets (Since 1st Nov 2010):


Patterns observed:

- parts of the US are completely silent
- Europe is fully on (good job!)
- all other clusters of tweet activity are on or relatively close to the ocean coasts
- compare this map to another world map made from the geolocations of 35 million Flickr photos, where the photos likewise cluster in coastal areas:

(David Crandall, Lars Backstrom, Dan Huttenlocher, Jon Kleinberg. Mapping the World's Photos. WWW 2009):

More data from A WORLD OF TWEETS:

Top 4 Asian countries are:

JAPAN 10.76%

Patterns observed:

While researching global interest in manga using Google Insights for Search (which analyzes the number of Google searches for particular keywords), I also noticed that Indonesia and Malaysia were the top countries in the world in terms of the number of searches for manga.

The following are screen grabs from Google Insights for Search:



PSP on Sourcemap


Art Objects as Data Points | 200,000 images in UCSD art library

Gu, Jia. Art Objects as Data Points.
Presentation at Calit2 Summer Researchers Poster Session. 2008 (pdf)