QTIP software: analyze image collections with the speed of the light

QTIP is a free digital image processing application. It was developed by the Multimodal Analysis Lab (National University of Singapore) and the Software Studies Initiative at the University of California, San Diego.

Download it from the software page of our lab blog.

Use it to process your image collection and then visualize the collection with our free ImagePlot tool.

Example: using QTIP with ImagePlot to compare 580 van Gogh paintings (left) and 580 Gauguin paintings (right). In each plot, X-axis = median brightness, Y-axis = median saturation.


Since 2008, Software Studies Initiative (softwarestudies.com) has been developing a computational approach to working with big visual data sets which we call cultural analytics. We automatically analyze image or video collections using digital image processing, and then use the results to render high resolution visualizations which show all images in a collection sorted according to their visual attributes and metadata. This allows us to efficiently "look inside" large visual collections, identifying patterns and seeing the whole "landscape" of the data.

After refining our software on internal projects, in September 2011 we released the ImagePlot visualization tool. The download also includes the ImageMeasure macro, which analyzes basic visual features of images in a collection (brightness, saturation, hue and number of shapes of every image).
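
As a rough illustration of what measuring "median brightness" and "median saturation" of an image can mean, here is a minimal Python sketch. It is not the ImageMeasure or QTIP code. The function name `median_brightness_saturation` and the tiny four-pixel "image" are hypothetical, and the sketch assumes that pixels are given as 8-bit RGB tuples and that brightness and saturation are taken from the HSV color model.

```python
import colorsys
from statistics import median


def median_brightness_saturation(pixels):
    """Return (median brightness, median saturation) on a 0..1 scale.

    `pixels` is a list of (r, g, b) tuples with values 0..255.
    Assumption: brightness is the V channel and saturation the S channel
    of the HSV model; real tools may define these features differently.
    """
    hsv = [colorsys.rgb_to_hsv(r / 255, g / 255, b / 255) for r, g, b in pixels]
    brightness = median(v for _, _, v in hsv)
    saturation = median(s for _, s, _ in hsv)
    return brightness, saturation


# A hypothetical 2 x 2 "image": red, mid gray, black, white.
pixels = [(255, 0, 0), (128, 128, 128), (0, 0, 0), (255, 255, 255)]
brightness, saturation = median_brightness_saturation(pixels)
print(brightness, saturation)
```

Computed for every image in a collection, two numbers like these could serve as the X and Y coordinates of that image in a plot.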

Today we are making available a more powerful free digital image processing application:

QTIP: analyze image collections with the speed of the light.

QTIP stands for QTImageProcessing. Written in Java, QTIP uses OpenCV and is very fast. Fire it up, select a directory containing images, and watch it go through them at an amazing speed. The output is a single file (.csv) containing image filenames and 49 extracted features per image.

For example, we downloaded 178,930 images from the Flickr group "Graphic Design." On our iMac (Fall 2011) with 16 GB of RAM, the application processed all these images in 15 minutes (192 images per second). On Mac Air (Fall 2011, 4 GB RAM), processing the same 178,300 images took 28 minutes (106 images per second).

Once you process your image collection with QTIP, use the results file with ImagePlot to explore the collection and make discoveries.


1. If you are processing a directory which contains lots of images, it may take a few seconds before you see QTIP beginning to read images - just wait.

2. QTIP can process multiple image directories located inside a user-specified input directory.

3. One of the most useful features output by QTIP is the number which indicates if a given image is color or black and white. For the description of this and all other features, read the program documentation.

4. The results file contains two columns called "titleID" and "bookID." Disregard these columns. (The program was originally written to analyze our one million manga pages data set which was divided into titles and books.)

5. If QTIP encounters a corrupt image, it will still create a row for this image containing the filename and an error flag. You need to remove these lines before using the results with ImagePlot. Sort the rows using the "bookID" column and then delete the lines which have the error flag. In our experience, image sets downloaded from social media sites (Flickr, deviantArt) often contain some corrupt images.
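
The cleanup described in notes 4 and 5 can also be scripted. The following Python sketch is only an illustration under stated assumptions: the column names `filename`, `brightness_median` and `error_flag`, the convention that a non-empty, non-zero flag marks a corrupt image, and the helper name `clean_qtip_rows` are all hypothetical. Check the program documentation for the actual layout of the results file.

```python
import csv
import io


def clean_qtip_rows(rows, error_column="error_flag", ignore=("titleID", "bookID")):
    """Drop rows marked as errors and remove the columns to disregard.

    Assumption: `error_column` holds "" or "0" for good images and any
    other value for corrupt ones. The real file may mark errors differently.
    """
    cleaned = []
    for row in rows:
        flag = (row.get(error_column) or "").strip()
        if flag not in ("", "0"):
            continue  # skip rows for corrupt images
        cleaned.append({key: value for key, value in row.items() if key not in ignore})
    return cleaned


# Hypothetical results file with one corrupt image (b.jpg).
sample = io.StringIO(
    "filename,titleID,bookID,brightness_median,error_flag\n"
    "a.jpg,0,0,112.5,0\n"
    "b.jpg,0,0,,1\n"
    "c.jpg,0,0,87.0,0\n"
)
rows = list(csv.DictReader(sample))
cleaned = clean_qtip_rows(rows)
print([row["filename"] for row in cleaned])
```

To use it on a real file, replace the `io.StringIO` sample with `open("results.csv", newline="")` and write the cleaned rows back out with `csv.DictWriter`.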

world diary: how many photos do we take now?

From the talks at Digital Media Analysis, Search and Management workshop at Calit2, 2/27-2/28, 2012

talk 1:
Facebook - 3 billion photo uploads a month
Flickr - 5 billion photos in total
Flickr - 85% of photos have no tags or descriptions
Picasa - only 5% of photos contain tags
60% of consumer photos contain faces

talk 2:
500 billion consumer photos taken per year
36 billion Facebook photos now
40K photos uploaded per minute

5-minute Guide: Graphic Design Principles for Information Visualization

Over the last few years, information visualization has become one of the key contemporary communication media and also a key research technique. But unless you went to design school, how can you create good-looking designs such as the ones on www.visualizing.org or infosthetics.com?

(Note that I am not talking about the visualization part itself - how to effectively translate data into visual representations, which visualization techniques to use, etc. - but about the graphic design part, i.e. how to “style” your visualizations.)

The purpose of this guide is to reduce modern design to seven essential "algorithms." Follow these principles and your design will look and communicate much better:

7 Principles for Designing Good-Looking Visualizations

Great visualization design: 2010 Annual Report by Nicholas Felton.

an experiment on 283 million Facebook users: computational social science in action

Bakshy, Marlow, Rosenn, Adamic. The Role of Social Networks in Information Diffusion.

"At the time of the experiment [8/4/2010 - 10/4/2010] there were approximately 500 million Facebook users logging in at least once a month. The experimental population consisted from approximately 283 million of these users."

Information diffusion in social networks: related research articles on Google Scholar.

Digital Media Analysis, Search and Management workshop at Calit2, Feb 27-28, 2012


latest schedule:

Location: Calit2 Auditorium, UCSD Campus, La Jolla, CA

WORKSHOP AGENDA (DRAFT February 1, 2012)


8:45 Workshop Welcome – Natalie Van Osdol, Pacific Interface Inc. (PII)
8:50 Welcoming Remarks – Junji Yamato, NTT Communications Science Laboratories (NTT CSL)

Metadata for Managing Digital Entertainment Media – Moderator: TBD
9:00 Production/Post Environment: Eric Dachs, PIX System
9:20 Digital Distribution for e-Commerce: Jim Helman, Motion Picture Laboratories
9:40 Digital Media Library & Archives: John Zubrzycki, BBC- TBC
10:00 Digital Media Security – Irdeto speaker TBD
10:20 Moderated Q&A

10:30 Break

Smart Mobile Media Platforms – Moderator: TBD
11:00 Technology Trends in Smart Mobile Media Platforms: Michelle Leyden Li Qualcomm - TBC
11:20 Kindle and Friends – David Foster, Amazon- TBC
11:40 Mobile Networks & Services for Smartphone Era – Masaaki Fukumoto, NTT DoCoMo
12:00 Smart Media Clouds – Amazon, Netflix or Microsoft - Speaker TBD
12:20 Moderated Q&A

12:30 Lunch

Smart Mobile Media Metadata – Moderator: TBD
2:00 Leaf Snap - Finding Botanical Similarity – Peter Belhumeur, Columbia Univ. VAP Lab - TBC
2:20 Leaf Snap demonstration
2:30 Document Identification – Prof. Koichi Kise, Osaka Pref. Univ.
2:50 Document Identification demonstration
3:00 Live TV Search: Motonobu Isoya, NTT Data
3:20 Live TV Search demo (need multiple live TV channels)
3:30 Moderated Q&A

3:45 Break

Demonstration Session in Vroom – Moderator: Laurin Herr, PII
4:15 New Techniques for Cinema/TV Post-Production Metadata Access - PIX, NTT, UCSD, UIC
5:00 Moderated Q&A

5:15 Adjourn


8:50 Workshop Welcome and Overview – Laurin Herr, Pacific Interface

Historical Review of Media Recognition Technologies – moderator Shin-ichi Sato, NII
9:00 Audio: Mark Plumbley, Queen Mary, University of London (remote VTC from UK)
9:20 Computer Vision: David A. Forsyth, UIUC - TBC
9:40 Moderated Q&A

9:50 Break

Unusuality Detection and Prediction – Moderator: TBD
10:20 Sifting Through Oceans of Sensor Data: Ron Johnson, University of Washington, OOI - TBC
10:40 Multimodal Sensor Processing: Bernt Schiele, Max Planck Institute - TBC
11:00 Video/Multimedia Event Detection: John Smith, IBM Thomas J. Watson Research Center - TBC
11:20 Video Tracking Multi-camera Networked Systems – Andrea Cavallaro, Queen Mary, University of London - TBC
11:40 Finding Unusuality with Memory-Based Particle Filters – Dan Mikami, NTT CSL
12:00 Moderated Q&A

12:10 Lunch

Philosophical and Technical Aspects of Audio Similarity – Moderator: Kunio Kashino, NTT CSL
1:40 Large Scale Audio Search for Music Similarity: Malcolm Slaney, Yahoo - TBC
2:00 AXES Project for Media Similarity Search: Erwin Verbruggen & Roeland Ordelman, Netherlands Institute for Sound and Vision - TBC
2:20 Audio Similarity Assessment: Daniel Ellis, Columbia University - TBC
2:40 EchoNest: Extracting Musical Attributes, Brian Whitman, The Echo Nest - TBC
3:00 Moderated Q&A

3:10 Break

Philosophical and Technical Aspects of Video Similarity – Moderator: TBD
3:40 ISR Metadata Issues – Stephen Long, Northrop Grumman
4:00 Finding Visual Synonyms using Image Net: Li Fei-Fei, Stanford University - TBC
4:20 Video-Concept Search using Media Mill: Cees Snoek, University of Amsterdam, ISLA
4:40 Mining People Attributes from Community-Contributed Photos for Activity Discovery and Recommendation: Winston H. Hsu, National Taiwan University
5:00 Moderated Q&A

Session Moderators’ Panel – Moderator: Laurin Herr, Pacific Interface
5:10 6 session moderators each give brief comments, then panel discussion, followed by Q&A
5:45 Adjourn

a list of Innovative visualizations of temporal processes

I put together this online resource containing particularly interesting visualizations of temporal processes and some tools for time visualization:


Three examples of innovative visualizations of temporal cultural processes
(Movie narrative chart, Listening Post, The Ebb and Flow of Movies):


Visualization as a Method in Art History - panel at College Art Association 2012 meeting, LA, 2/24/2012

Information Visualization as a Research Method in Art History
Friday, February 24, 2:30 PM–5:00 PM
West Hall Meeting Room 502A, Level 2, Los Angeles Convention Center

Christian Huemer, Getty Research Institute;
Lev Manovich, University of California, San Diego

Visualizing the Ecology of Complex Networks in Art History
Maximilian Schich, Northeastern University

Geoinformatics and Art History: Visualizing the Reception of American Art in Western Europe, 1948-1968
Catherine Dossin, Purdue University

Interactive Mapping of the Agents of the Art Market in Europe (1550-1800)
Sophie Raux, Université Lille Nord de France

Visualizing Art, Law, and Markets
Victoria Szabo, Duke University

Lithics Visualization Project for Analysis of Patterns and Aesthetic Presentation
Georgia Gene Berryhill, University of Maryland

Information Visualization and Museum Practice
Piotr Adamczyk, Google and University of Illinois, Urbana-Champaign

New York Times: The Age of Big Data

Yet another article on big data - this one in the Sunday edition of the New York Times:

Steve Lohr. The Age of Big Data. New York Times, February 12, 2012.

My favorite quotes from the article:

"“It’s a revolution,” says Gary King, director of Harvard’s Institute for Quantitative Social Science. “We’re really just getting under way. But the march of quantification, made possible by enormous new sources of data, will sweep through academia, business and government. There is no area that is going to be untouched.”"

"Data is in the driver’s seat. It’s there, it’s useful and it’s valuable, even hip."

Definition of a data scientist

Do you enjoy finding patterns in big data?
Do you want the opportunity to apply the latest AI algorithms to real data?
Do beautiful data visualizations get you excited?
Do you want to have millions of people use your code every day?
Do you consider yourself an innovator? Tinkerer? Dreamer?

(Quoted from the job description Data Scientist - Trulia Data Science Lab)

Wolfram|Alpha Pro: Launching a Democratization of Data Science

At Software Studies Initiative, we have been working to democratize the use of digital image processing for exploring large image and video collections.

In September 2011, we released ImagePlot - a set of free open-source software tools which we developed in the lab and used in all our own research projects. You can now take a set of images and videos, automatically extract basic visual features and then explore the patterns in your image or video collection using the extracted data.

So we are very excited to read the latest post from Stephen Wolfram, founder of Mathematica and Wolfram|Alpha, about this week's release of Wolfram|Alpha Pro and how it automates the analysis and visualization of data sets.


Last October I was fortunate to chat with Stephen over lunch at the Wolfram Data Summit 2011 conference. He was interested in our work on analyzing and visualizing the spaces of variations of cultural artifacts. We talked about how the models of biological evolution and variability may relate to cultural evolution and also to the principles described in Wolfram's famous book A New Kind of Science.

Here are a few examples of our visualizations of spaces of variation in different kinds of cultural artifacts:

Mondrian vs Rothko: footprints and evolution in style space

Google logo space

One million manga pages

2012: social analytics for the rest of us

Until recently, large-scale social media analytics and large-scale user testing were only done by large companies.

In 2012, expect many of these capabilities to become available to individual users - for relatively low rates (which means soon it will be free.)

Google pioneered this with Google Trends, Google Insights for Search, and Google Analytics (www.google.com/analytics), which since 2006 has offered free in-depth analytics on an individual web site or blog.

According to one analysis, Google Analytics is now used on 50% of the top 1,000,000 web sites. (Google Analytics Market Share. 2010-08-21.)

Google Trends and Insights were unique in allowing all of us to see certain kinds of trends as expressed by massive amounts of social data. In contrast, most other current offerings only allow you to analyze trends related to the performance of your own social product (web site, blog, Facebook page, YouTube video).

For an example, see Facebook insights which "provides Facebook Page owners and Facebook Platform developers with metrics around their content."

And here is an example of user testing analytics - again, for your design:

Solidify (solidifyapp.com) is offering a few applications for web designers. Verify is "the fastest way to collect and analyze user feedback on screens or mockups. See where people click, what they remember, or how they feel." Solidify is "the quickest way to prototype interface screens for user testing feedback. Learn where people get confused by page interactions."

This trend is not going to change overnight. However, more tools are coming and at least some of them promise free or almost free social analytics based on everybody's data:

Twitter "will unveil a series of new tools in the next few months, including sophisticated analytical tools, according to Erica Anderson, Twitter's manager for news and journalism." "Going forward, Anderson expects to see more people using Twitter to predict behaviors by analyzing wide swaths of tweets. She did not make it clear if the new analytical tools will include some features aimed at spotting trends. "The predictive nature of Twitter is still largely untapped," she said." (source: ReadWriteWeb, January 29, 2012.)

SocialFlow (socialflow.com) is "Optimized Publisher for Twitter and Facebook. Publish items at the precise moments when they will maximize clicks, Retweets, mentions and follower growth. Use the SocialFlow AttentionScore™ to derive more value from your content."

Also, check out this expert discussion of business-oriented trends in social analytics offerings.

Workshop on Social Media Visualization (June 4, 2012, Dublin)

Call for Papers

Workshop on Social Media Visualization (SocMedVis)
at the International AAAI Conference on Weblogs and Social Media
4th June 2012, Dublin, Ireland


Social media study and analysis brings researchers from many fields into
a single setting. Even though the tasks of these researchers are varied,
data visualization and analytics play an important role. For industry
and academics alike, visualization of social media data helps with
hypothesis formation and supports the explanation of phenomena.

The Workshop on Social Media Visualization (SocMedVis) will be held in
conjunction with the International AAAI Conference on Weblogs and Social
Media (ICWSM'12) in Dublin on 4th June 2012. The
workshop is a venue for presentation of research and applications of
visualization of social media data.

The goal of the workshop is to bring together researchers and industry
practitioners interested in visual and interactive techniques for social
media analysis to discuss their potential application to the social
sciences, humanities, and industry. We solicit contributions that
discuss novel techniques and applications of visualization and visual
analytics approaches to social media data sources.

Important Dates

* Paper Submission Deadline: 2nd March 2012
* Notification: 16th March 2012
* Paper Camera-Ready Deadline: 2nd April 2012
* Early registration for ICWSM'12: 6th April 2012
* Workshop: 4th June 2012


We invite research, application, and position papers on the topic of
visualization and social media. Topics of interest include but are not
limited to:

* Visual Analysis of Evolving Social Media Data
* Interactive Techniques for Sentiment Analysis and Brand Perception
* Visualization of Memes and Trends in Social Media
* Visualization of Social Media for Media Studies
* Social Media in Social Sciences and Humanities
* Data Mining and Machine Learning in Social Media
* Visual Analytics of Social Media in Industry
* Formal Evaluation Techniques in Social Media
* Systems and Languages for Social Media Analytics
* Methodologies and Processes for Social Media Analysis
* Collaborative Analysis of Text Corpora
* Real-Time Visualization of Social Media Data
* Visualization and Visual Text Analytics in Social Media
* Visualization and Visual Analytics of Social Media Networks
* Studies of Analytic Work on Social Media
* Representations of Uncertainty in Text Analytics


Papers are limited to 4 pages in length and should have a visualization
or visual analytics component to them. Papers must be formatted
according to AAAI guidelines. Papers in PDF format must be submitted
using EasyChair by 2nd March 2012. A subset of the accepted papers will
be invited for oral presentation at the workshop. All other accepted
papers will be presented as posters during the poster session or
interactive demo session.

The program will also include a madness session at the beginning of the
workshop to allow anyone attending the workshop to briefly introduce
themselves, their work, and state their positions and goals related to
the scope of the workshop. If you are planning to attend the workshop
you are welcome to send an email to conference organizers to ensure a
slot in the madness session by 2nd April 2012.

We also plan to host a panel of researchers in social sciences and
humanities and industrial researchers of varied perspectives on the
application of visualization to academic disciplines and industry where
visualization is needed to understand social media data.

For more information please visit the workshop web page at
http://socmedvis.ucd.ie or email socmedvis@ucd.ie.

Workshop Social Media Visualization

Workshop on Social Media Visualization (June 4, 2012, Dublin)

Call for Papers

Workshop on Social Media Visualization (SocMedVis) @ International AAAI Conference on Weblogs and Social Media (ICWSM'12).

4th June 2012, Dublin, Ireland


The study and analysis of social media has brought together researchers from a wide variety of fields. Even though the tasks of these researchers are diverse, data visualization and analysis have played an important role in their research. For industry and academia alike, visualization of social media data helps with hypothesis formation and with the explanation of particular phenomena.

The Workshop on Social Media Visualization (SocMedVis) will be held in conjunction with the International AAAI Conference on Weblogs and Social Media (ICWSM'12) in Dublin on 4th June 2012. The workshop is a venue for the presentation of research and applications of visualization of social media data.

The goal of the workshop is to bring together researchers and industry practitioners interested in visual and interactive techniques for social media analysis to discuss their potential application to the social sciences, humanities, and industry.

Important Dates

* Paper Submission Deadline: 2nd March 2012
* Notification: 16th March 2012
* Paper Camera-Ready Deadline: 2nd April 2012
* Early registration for ICWSM'12: 6th April 2012
* Workshop: 4th June 2012


Topics of interest include but are not limited to:

* Visual Analysis of Evolving Social Media Data
* Interactive Techniques for Sentiment Analysis and Brand Perception
* Visualization of Memes and Trends in Social Media
* Visualization of Social Media for Media Studies
* Social Media in Social Sciences and Humanities
* Data Mining and Machine Learning in Social Media
* Visual Analytics of Social Media in Industry
* Formal Evaluation Techniques in Social Media
* Systems and Languages for Social Media Analytics
* Methodologies and Processes for Social Media Analysis
* Collaborative Analysis of Text Corpora
* Real-Time Visualization of Social Media Data
* Visualization and Visual Text Analytics in Social Media
* Visualization and Visual Analytics of Social Media Networks
* Studies of Analytic Work on Social Media
* Representations of Uncertainty in Text Analytics


Papers are limited to 4 pages and should include a visualization or some visual analytics component. Papers must be formatted according to AAAI guidelines.
Papers in PDF format must be submitted using EasyChair by 2nd March 2012. A subset of the accepted papers will be invited for oral presentation. All other accepted papers will be presented as posters during the interactive demo session.

The workshop plans to host a panel of researchers in the social sciences and humanities, along with industry researchers, to discuss the application of visualization in academic disciplines and in industry itself, where visualization is needed to understand social media data.

For more information, visit the workshop page at http://socmedvis.ucd.ie or send an email to socmedvis@ucd.ie.

The Lived Logics of Database Machinery

A one-day workshop organised by Computational Culture (http://www.computationalculture.net/)

Date: Thursday 28th June
Location: Central London

With many of the most significant changes in the organisation and distribution of knowledge, practices of ordering, forms of communication and modes of governance taking shape around it, the database has remained surprisingly recalcitrant to anything other than technical forms of analysis. Its ostensibly neutral status as a technology has allowed it to play a significant - yet largely overlooked - role in modelling of populations and configuring practices, from organisational labour through knowledge production to art.

The importance of the database for gathering and analysing information has been a theme of many studies (especially those relating to surveillance) but the specific agency of the database as an active mediator in its own right, as an actor in constructing, organising and modifying social relations is less well understood.

A one-day workshop organized by Computational Culture seeks to rectify this state of affairs. We are looking for proposals for papers, interventions, poster presentations and critical accounts of practical projects that address the theme of the social, cultural and political logics of database technologies.

Proposals should aim to address the intersection of the technical qualities of databases and their management systems with social or cultural relations and the critical questions these raise. We are particularly interested in work that addresses the ways in which entity-relations models, or structures of data-atomisation, become active logics in the construction of the world. Historical contributions that tease out the connections between the database 'condition' and antecedent technical and theoretical objects (from indexes and archives to set theory), or which develop critical accounts of transparency are also particularly welcome.

The focus of the workshop on the lived social dynamics and political logics of database technologies is envisaged as a means of opening up paths of enquiry and addressing questions that typically get lost between the ‘social’ and the ‘technological’:

How do the ordering of views, permissions structures, the normalisation
of data, and other characteristic forms of databases contribute to the
generation of forms of subjectivity and of culture?

What impact does the need to manage terabytes of data have on knowledge
production, and how can the normative assumptions embedded in uses of data
and database technologies be challenged or counter-effectuated?

What conceptual frameworks do we need to get a hold on the operational
logics of the database and the immanence of social categorization to
relational algebras?

Is there a workable politics available for exploring strategies of data management, the commonalities and differences of practices in different settings - from genome sequence archiving through supply chain management to medical records and cultural history?

Abstracts of around 500 words should be sent to editorial@computationalculture.net by March 9th

Google logo space

Google Logo Space
(Click on the image to see full 9000 x 6750 version on Flickr.)

Jeremy Douglass, 2009.
Visualization shows 587 versions of the original Google logo, which appeared on google.com pages between 1998 and summer 2009. Some versions, which celebrate important events and people, appeared worldwide; others were only used on google.com home page in particular countries.

Each logo version was automatically analyzed using digital image processing software to extract a number of visual features. The visualization uses these features to situate all logos in 2D space in such a way that their positions indicate how different each logo is from the original.

“Difference” can be defined in many ways. Each definition would result in a different visualization. Our visualization shows only two of these possibilities. Horizontal placement (X axis) indicates how much a logo was modified from the original. The least modified logos are on the extreme left; the logos with the most modifications are on the extreme right. Vertical placement (Y axis) indicates which part of a logo was modified. Logos where most of the modifications are in the upper part appear in the upper part of the visualization; logos where most modifications are in the lower part appear in the lower part of the visualization.
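
One possible way to turn these two definitions of difference into numbers is sketched below in Python. This is an assumption-laden illustration, not the method used to produce the visualization: the function name `logo_position`, the use of grayscale pixel grids of equal size, the mean absolute pixel difference as the X value, and the upper-versus-lower share of the difference as the Y value are all hypothetical choices.

```python
def logo_position(original, variant):
    """Return (x, y) for a logo variant compared with the original.

    Both images are lists of rows of grayscale values (0..255) with the
    same dimensions.
    x: mean absolute pixel difference (0 means identical).
    y: share of the difference found in the upper half minus the share in
       the lower half, from -1 (all changes below) to +1 (all changes above).
       For an odd number of rows the middle row counts as lower half.
    """
    height = len(original)
    half = height // 2
    upper = lower = 0
    for row_index, (orig_row, var_row) in enumerate(zip(original, variant)):
        row_diff = sum(abs(a - b) for a, b in zip(orig_row, var_row))
        if row_index < half:
            upper += row_diff
        else:
            lower += row_diff
    total = upper + lower
    x = total / (height * len(original[0]))
    y = (upper - lower) / total if total else 0.0
    return x, y


# Hypothetical 4 x 2 logo in which only the top row was changed.
original = [[0, 0], [0, 0], [0, 0], [0, 0]]
variant = [[255, 255], [0, 0], [0, 0], [0, 0]]
position = logo_position(original, variant)
print(position)
```

With this definition an unmodified logo lands at (0.0, 0.0), and a logo changed only in its top rows gets a positive Y value, which places it in the upper part of the plot.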

Close-up: small variations (the left part of the visualization).

Close-up: medium variations (the center part of the visualization).

Close-up: extreme variations (the right part of the visualization).

Every day billions of people see a new logo appear on Google’s homepage. Since 1998 these logo variations have explored an ever-growing range of design possibilities while still retaining the “essence” of the original logo. Our visualization of 587 logos shows the space of these variations. Only a few logos have very small or very large modifications. These logos appear around the edges of the “cloud” in the center that consists of logos with moderate changes. The overall “shape” of this space of Google logo design variations is similar to the well-known normal (Gaussian) distribution which describes the variability of all kinds of data encountered in natural and social sciences. However, if we were to plot the logos over time (not shown in this visualization), we would find a different pattern - the amount of logos’ modifications from the original design has been increasing significantly over the last few years.

This version of the visualization uses transparency, which makes it easier to see the densities (number of logos) in different parts of the design space.

Certain types of subject matter result in similar design solutions, which further structures this space of the design variations. For example, specific national observances often feature top-heavy additions of flags, fireworks, or crowns that cluster towards the top of the cloud, along with a set of logos featuring athletes in the air.

A cluster of designs with national flags appearing in the top part of the visualization.

Designs with characters appearing in the bottom part of the visualization.


Lev Manovich lecture at USC, February 13, 2012

where: University of Southern California (USC)

date: February 13, 2012

time: 4pm

building: Davidson Conference Center

RSVP at www.usc.edu/esvp (code: Manovich)

speaker: Lev Manovich



Many commentators recently pointed out that the joint availability of massive amounts of data together with the computational tools for their analysis is having transformative effects on many fields. For example, a special report “Data, Data Everywhere” in The Economist (February 2010) noted that big data’s “effect is being felt everywhere, from business to science, from government to the arts.” An article in the New York Times (November 16, 2010) stated: “The next big idea in language, history and the arts? Data.” In 2009 and 2011, the National Endowment for the Humanities together with the National Science Foundation organized Digging Into Data competitions to “address how ‘big data’ changes the research landscape for the humanities and social sciences.” And on February 12, the New York Times featured yet another report, The Age of Big Data.

In 2007 I created Software Studies Initiative at UCSD (www.softwarestudies.com) both to explore the theoretical consequences of using computational methods for the study of culture, and to develop techniques and software tools which will enable humanists and social scientists to work with massive visual data sets. We also aimed to create a research agenda and research outputs that would be relevant to people in many disciplines – from arts and humanities to computer science and engineering. This attempt has been successful. We received funding from both NEH and NSF, published our work in humanities, social science and computer science venues, showed our visualizations in a number of important international art and design exhibitions, released free open source software tools to enable other researchers and students to use our techniques in their own research, and worked with Library of Congress, Getty Research Institute, and other institutions to analyze parts of their digital collections.

Using our experiences as a starting point, in my talk I will discuss how the “big data” paradigm creates new opportunities (as well as new challenges) for collaboration between disciplines; point out examples of what I see as fundamentally new research methodologies which computation brings to the study of society and culture; and show examples of our projects including visualizations of one million pages of manga (Japanese comics), all issues of Science and Popular Science from the 1872-1922 period, and other cultural data sets.

Kingdom Hearts game play visualizations

William Huber, 2010

This project represents nearly 100 hours of playing two videogames as high resolution visualizations. This representation allows us to study the interplay of various elements of gameplay, and the relationship between the travel through game spaces and the passage of time in game play.

Kingdom Hearts videogame traversal
Kingdom Hearts videogame traversal | 29 sessions over 20 days, 62.5 hours total | visualization uses 22,500 frames

Kingdom Hearts II videogame traversal
Kingdom Hearts II videogame traversal | 16 sessions over 18 days, 37 hours total | visualization uses 13,300 frames

Visualization details:
Data: The data are the game play sessions of the video games Kingdom Hearts (2002, Square Co., Ltd.) and Kingdom Hearts II (2005, Square-Enix, Inc.). Each game was played from the beginning to the end over a number of sessions. The video captured from all sessions of each game was assembled into a single sequence. The sequences were sampled at 1 frame per second. This resulted in 225,000 frames for Kingdom Hearts and 133,000 frames for Kingdom Hearts II. The visualizations use only every 10th frame from the complete frame sets: Kingdom Hearts: 22,500 frames. Kingdom Hearts II: 13,300 frames.
Timescales: Japanese role-playing games such as Kingdom Hearts can take from about 40 to over 100 hours to complete. Kingdom Hearts game play: 62.5 hours of game play, in 29 sessions over 20 days. Kingdom Hearts II game play: 37 hours of game play, in 16 sessions over 18 days.
Mapping: Frames are organized in a grid in order of game play (left to right, top to bottom).
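
The frame selection and grid mapping described above can be sketched in a few lines of Python. This is only an illustration, not the code used for the project: the function names are made up for this example, and the 150-column grid width is an assumption, since the text does not state the grid dimensions.

```python
import math

def sampled_indices(total_frames, step=10):
    """Indices of the frames kept when only every `step`-th frame is used."""
    return range(0, total_frames, step)

def grid_position(order, columns):
    """(row, column) of the frame at position `order` in play order,
    filling the grid left to right, then top to bottom."""
    return divmod(order, columns)

def rows_needed(frame_count, columns):
    """Number of grid rows required to hold `frame_count` frames."""
    return math.ceil(frame_count / columns)

# Frame totals quoted in the text.
kh_frames = len(sampled_indices(225_000))    # Kingdom Hearts
kh2_frames = len(sampled_indices(133_000))   # Kingdom Hearts II

COLUMNS = 150  # hypothetical grid width, not given in the text

print(kh_frames, kh2_frames)            # 22500 13300
print(grid_position(0, COLUMNS))        # (0, 0): top-left corner
print(grid_position(150, COLUMNS))      # (1, 0): first frame of the second row
print(rows_needed(kh_frames, COLUMNS))  # 150
```

With a different grid width only the row count changes; the play order of the frames stays the same.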

Kingdom Hearts is a franchise of video games and other media properties created in 2002 through a collaboration between Tokyo-based videogame publisher Square (now Square-Enix) and The Walt Disney Company, in which original characters created by Square travel through worlds representing Disney-owned media properties (Tarzan, Alice in Wonderland, The Nightmare Before Christmas, etc.). Each world has its own distinct characters derived from the respective Disney-produced films. Each world also features a distinct color palette and rendering style, related to the visual style of the corresponding Disney film.

Like other software-based artifacts, video games can have infinitely varied realizations, since each game play is unique. Compressing many hours of game play into a single high resolution image and placing a number of such visualizations next to each other allows us to see the patterns of similarity and difference between these realizations. These visualizations are also useful for comparing different releases of popular games – such as the two releases of Kingdom Hearts shown here.

Kingdom Hearts videogame worlds crawl

Visualization details:
Data: sampled frames from the video capture of the complete traversal (playing the game from the beginning to the end) of the videogame Kingdom Hearts (2002, Square Co., Ltd.)
Timescales: game play length: 62.5 hours, in 29 sessions over 20 days. Animated over 7.8 minutes. From the original video capture done at 1 frame per second, every 16th frame was sampled. In the animation, these sampled frames are played back at 30 frames per second.
Mapping: completing Kingdom Hearts took 29 separate sessions. The sampled frame sequences are positioned from left to right in the order of sessions, and from bottom to top in order of the worlds visited.
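
The animation length follows from the figures given in the timescales above. A small sketch of that arithmetic (the function and parameter names are made up for this illustration):

```python
def animation_minutes(play_hours, capture_fps=1, keep_every=16, playback_fps=30):
    """Length of the animation in minutes, given the hours of captured play,
    the capture rate, the sampling step, and the playback rate."""
    captured_frames = play_hours * 3600 * capture_fps
    sampled_frames = captured_frames / keep_every
    return sampled_frames / playback_fps / 60

minutes = animation_minutes(62.5)
print(round(minutes, 1))  # 7.8
```

Here 62.5 hours of play captured at 1 frame per second gives 225,000 frames; keeping every 16th frame leaves about 14,000 frames, which play back in roughly 7.8 minutes at 30 frames per second.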

This visualization relates the travel through game spaces to the passage of time in game play. In the course of playing Kingdom Hearts games, a player moves through a number of different worlds. The visualization below represents each world by a single frame. These representative frames are organized vertically in the same order as the player encountered them in the course of the game (the first visited world is on the bottom, the next visited world is above it, and so on). The game was played over a number of separate sessions over many days; each session occupies a single column in the visualization (left to right).
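
One way to picture this layout is as a small table-building step. The sketch below assumes the play log is available as a list of (session number, world) pairs in play order. That input format, the function name, and the placeholder world names are all hypothetical; the actual project data is not reproduced here.

```python
def world_session_layout(visits):
    """Assign each world a row (0 = bottom) in order of first visit, and
    return the set of (column, row) cells to fill, one column per session."""
    world_row = {}
    cells = set()
    for session, world in visits:
        if world not in world_row:
            # A newly encountered world is stacked above the earlier ones.
            world_row[world] = len(world_row)
        cells.add((session - 1, world_row[world]))
    return world_row, sorted(cells)

# Made-up play log: (session number, world), listed in play order.
visits = [
    (1, "world_a"), (1, "world_b"),
    (2, "world_b"), (2, "world_c"),
    (3, "world_c"), (3, "world_a"),
]

rows, cells = world_session_layout(visits)
print(rows)   # {'world_a': 0, 'world_b': 1, 'world_c': 2}
print(cells)  # [(0, 0), (0, 1), (1, 1), (1, 2), (2, 0), (2, 2)]
```

A session in which the player returns to earlier worlds fills several cells in its column, which is how shuttling between worlds becomes visible.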

This visualization highlights the telescoping mobility across the game worlds over time: at first, the player visits one world at a time. In the middle of the game traversal (sessions 18 and 19), the player freely shuttles between the worlds, flattening them and treating them as resources rather than as narrative spaces.

By animating this passage through spaces, we can see the experience of expanding mobility between them:

Related projects:

Graphing temporal patterns in gameplay

Ludological dynamics of Fatal Frame 2

Lev Manovich: From Reading to Pattern Recognition

My short recent text "From Reading to Pattern Recognition" has been published on the excellent The Creators Project site.

The original version of the text appeared in I read where I am: Exploring New Information Cultures, ed. Mieke Gerritzen, Geert Lovink, Minke Kampman (Graphic Museum, Breda and Valiz, Amsterdam, 2011).

Call for papers for the next issue of Computational Culture journal

Call For Papers
Computational Culture, a journal of software studies
Deadline: 30th March 2012

The new peer-reviewed open access journal Computational Culture has
been launched. The first issue entitled ‘A Million Gadget Minds’ is
available online at: http://computationalculture.net/

The journal’s primary aim is to examine the ways in which software
undergirds and formulates contemporary life. Computational processes
and systems not only enable contemporary forms of work and play and
the management of emotional life but also drive the unfolding of new
events that constitute political, social and ontological domains.

In order to understand digital objects - such as corporate software,
search engines, medical databases – and their constitutive role in
culture, or to enquire into the use of mobile phones, social networks,
dating, games, financial systems or political crises, a detailed
analysis of software cannot be avoided. A developing form of literacy
is required that matches an understanding of computational processes
with those traditionally bound within the arts, humanities, and social
sciences but also in more informal or practical modes of knowledge
such as hacking and art.

Computational Culture is now inviting contributions for the second
issue. We seek articles, book, project and software reviews, and also
indications of interest for future special issues focusing on
databases and social media.

Please submit a completed paper by the deadline of 30th of March or
contact the editors via the website to express an interest.

Editorial group,