Research on Remix and Cultural Analytics, Part 2



Image: detail of video montage grid of Radiohead's Lotus Flower. Larger images of this montage and others with proper explanation are included below.

Post-doctoral research by Eduardo Navas


Keywords: Cultural Analytics, Lotus Flower, Remix, memes, remix culture, Thom Yorke, YouTube, parody

As part of my post-doctoral research for the Department of Information Science and Media Studies at the University of Bergen, Norway, I am using cultural analytics techniques to analyze YouTube video remixes. My research is done in collaboration with the Software Studies Lab at the University of California, San Diego. A big thank you to CRCA at Calit2 for providing a space for daily work during my stays in San Diego.


This is part 2 of a series of posts in which I introduce three case studies of YouTube video remixes. My first case study, covered in part 1, was the Charleston Style remixes; this post introduces my second case study, the remixes of Radiohead's "Lotus Flower" video.





Radiohead uploaded their original official music video on February 16, 2011. The video consists of Thom Yorke, the band's lead singer, dancing and singing in an empty garage-like space. The footage includes close-ups, mid shots, and long shots of Yorke improvising his dance. When viewing the original video it is evident that Yorke's quirkiness is part of the reason why the footage was a readymade for a viral meme. The remixes began to appear just two days after the original was uploaded, on February 18. The range of songs that replaced Radiohead's original includes well-known musical classics from Zorba the Greek and pop songs from the Vengaboys, as well as top-ten hits by Lady Gaga, among others. Below are some of the videos analyzed.




This remix consists of footage taken from the original Radiohead video, re-edited to match the song "All the Single Ladies" by Beyoncé. It was uploaded on February 18, 2011.



This video is titled "Thom Yorke Goes Bananas." In this case, the video footage of Lotus Flower was selectively re-edited to match a samba composition. It was uploaded on February 18, 2011.



This video is titled "Thom Yorke Does the Macarena!" In this case, the video footage of Lotus Flower was selectively re-edited to match the Macarena song and video. It was uploaded on February 18, 2011.

Following the method of analysis from my first case study on the Charleston Style, I first looked at montages of the videos.


View 2000 px wide version.

This is the grid montage of the original video by Radiohead.


View 2000 px wide version

This is the video grid montage of "All the Single Ladies"


View 2000 px wide version

This is the video grid montage of "Thom Yorke Goes Bananas!"


View 2000 px wide version

This is the video grid montage of "Thom Yorke Does the Macarena."

When viewing these grids it becomes evident that the remixers, from the very beginning, took the liberty of editing the footage selectively to match particular songs. This approach contrasts with the Charleston remixes, which for the most part leave the video footage intact, apart from the occasional time adjustment to match the beat of a song.
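The grid montages themselves are simple to produce. The sketch below is an illustrative reconstruction, not the lab's actual tooling: it assumes frames have already been sampled from a video at a regular interval (for instance with ffmpeg) and tiles them left to right, top to bottom, so that reading order corresponds to time order.

```python
from PIL import Image

def grid_montage(frames, columns, thumb_size=(60, 34)):
    """Tile frames into a grid; reading order (left to right,
    top to bottom) corresponds to time order in the video."""
    w, h = thumb_size
    rows = -(-len(frames) // columns)  # ceiling division
    montage = Image.new("RGB", (columns * w, rows * h))
    for i, frame in enumerate(frames):
        montage.paste(frame.resize(thumb_size),
                      ((i % columns) * w, (i // columns) * h))
    return montage

# Synthetic stand-ins for frames sampled from a video every N seconds:
frames = [Image.new("RGB", (640, 360), (i, i, i)) for i in range(0, 250, 10)]
montage = grid_montage(frames, columns=5)
```

Because every remix gets the same thumbnail size and column count, differences in length and editing rhythm are directly comparable across the grids.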


View 2000 px wide version

When slicing the video frames, it becomes clear which video sections are remixed. Compare the slices of the original video (above) with the slices of the three other videos, which follow below.


View 2000 px wide version

These are slices of "All the Single Ladies"


View 2000 px wide version

These are slices of "Thom Yorke Goes Bananas!"


View 2000 px wide version

These are slices of "Thom Yorke Does the Macarena."

The slice visualizations have been adjusted to fit this blog's design. Many of the remixes are much shorter than the original video by Radiohead because the footage is re-edited to match the length of the songs selected. One of the shortest is the Macarena remix, which is just over a minute.
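The slice visualizations can be sketched the same way. Again this is an illustrative reconstruction (the function name and parameters are my own): take a thin vertical slice from the center of every sampled frame and concatenate the slices in time order. Cuts and re-edited sections show up as abrupt discontinuities in the resulting strip, which is why the remixed sections are easy to spot.

```python
from PIL import Image

def slice_strip(frames, slice_width=2, height=120):
    """Concatenate a thin vertical slice from the center of each frame,
    in time order. Edits appear as abrupt breaks in the strip."""
    strip = Image.new("RGB", (slice_width * len(frames), height))
    for i, frame in enumerate(frames):
        resized = frame.resize((frame.width, height))
        cx = resized.width // 2
        strip.paste(resized.crop((cx, 0, cx + slice_width, height)),
                    (i * slice_width, 0))
    return strip
```

Comparing the strip of a remix against the strip of the original makes it visible at a glance which stretches of the source footage were kept, repeated, or dropped.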

As mentioned before, this is my second case study. After the introduction of my third case study, I will compare the three memes in order to evaluate the patterns of the remixes.

Contra a busca (Against Search)

by Lev Manovich

Translation (Portuguese): Cicero Inacio da Silva

The English text of "Against Search" appears in full below.

"There is only software" text @ FILE 2011 Catalog (Brazil)

FILE 2011 Catalog

Against Search

Lev Manovich, July 21, 2011

keywords: search, Google, knowledge discovery, digital library, database, classification, folksonomy, information retrieval, HCI, interface, information visualization, digital humanities, cultural analytics, visual analytics, software studies, Manovich


Early 21st century humanities and media studies researchers have access to unprecedented amounts of media – more than they can possibly study, let alone simply watch or even search. (For examples of large media collections, see the list of repositories made available to the participants of Digging Into Data 2011 Competition, www.diggingintodata.org).

The basic method of humanities and media studies, which worked fine when the number of media objects was small – see all images or video, notice patterns, and interpret them – no longer works. For example, how do you study 167,000 images on the Art Now Flickr gallery, 236,000 professional design portfolios on coroflot.com (both numbers as of 7/2011), or 176,000 Farm Security Administration/Office of War Information photographs taken between 1935 and 1944 and digitized by the Library of Congress (http://www.loc.gov/pictures/)?

Given the size of typical contemporary digital media collections, simply seeing what’s inside them is impossible.

Although it may appear that the reasons for this are the limitations of human vision and human information processing, I think that it is actually the fault of current interface designs and web technology. Standard interfaces for massive digital media collections such as list, gallery, grid, and slide do not allow us to see the contents of a whole collection. These interfaces usually display only a few items at a time (regardless of whether you are in a browsing mode or a search mode). This access method does not allow us to understand the “shape” of the overall collection and notice interesting patterns.

The popular media access technologies of the 19th and 20th centuries, such as slide lanterns, film projectors, microfilm readers, the Moviola and the Steenbeck, record players, audio and video tape recorders, VCRs, and DVD players, were designed to access a single media item at a time, at a limited range of speeds. This went hand in hand with the media distribution mechanisms: record and video stores, libraries, television, and radio would all only make available a few items at a time. For instance, you could not watch more than a few TV channels at the same time, or borrow more than a few videotapes from a library.

At the same time, the hierarchical classification systems used in library catalogs made it difficult to browse a collection or navigate it in orders not supported by the catalogs. When you walked from shelf to shelf, you were typically following a classification based on subjects, with books organized by author names inside each category.

Together, these distribution and classification systems encouraged 20th century media researchers to decide beforehand what media items to see, hear, or read. A researcher usually started with some subject in mind – films by a particular author, works by a particular photographer, or categories such as “1950s experimental American films” and “early 20th century Paris postcards.” It was impossible to imagine navigating through all films ever made or all postcards ever printed. (One of the first media projects which organizes its narrative around navigation of a media archive is Jean-Luc Godard’s "Histoire(s) du cinéma," which draws samples from hundreds of films.) The popular social science method for working with larger media sets in an objective manner – content analysis, i.e., the tagging of semantics in a media collection by several people using a predefined vocabulary of terms – also requires that a researcher decide beforehand what information would be relevant to tag.

Unfortunately, the current standard in media access – computer search – does not take us out of this paradigm. The search interface is a blank frame waiting for you to type something. Before you click on the search button, you have to decide what keywords and phrases to search for. So while search brings a dramatic increase in the speed of access, it assumes that you know beforehand something about the collection worth exploring further.

We need techniques for efficient browsing of content and discovery of patterns in massive media collections. Consider this definition of “browse”: “To scan, to casually look through in order to find items of interest, especially without knowledge of what to look for beforehand” (“Browse,” Wiktionary). Consider also one of the meanings of the word “exploration”: “to travel somewhere in search of discovery” (“Exploration,” Wiktionary). How can we discover interesting things in massive media collections? That is, how can we browse through them efficiently and effectively, without knowing in advance what we want to find?
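A minimal sketch of the kind of whole-collection overview argued for here (the brightness feature and the function names are my own illustrative choices, not a specific tool from the post): compute one simple feature per image and lay the entire collection out along that axis, so the overall "shape" of the set becomes visible without searching for anything in particular.

```python
from PIL import Image, ImageStat

def brightness(img):
    """Mean luminance - one simple, computable feature per image."""
    return ImageStat.Stat(img.convert("L")).mean[0]

def overview_by_feature(images, thumb=(40, 40)):
    """Lay out every image in the collection on one strip, ordered by
    brightness, so the distribution of the whole set is visible at once."""
    ranked = sorted(images, key=brightness)
    strip = Image.new("RGB", (thumb[0] * len(ranked), thumb[1]), "white")
    for i, img in enumerate(ranked):
        strip.paste(img.resize(thumb), (i * thumb[0], 0))
    return strip
```

Unlike search, no query is needed: clusters, gaps, and outliers in the feature distribution are what invite further browsing.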



---------------------
Anja Wiesinger wrote an interesting response to this post:

http://neuneun.com/2011/07/in-search-of/

---------------------
Some notes on the history of search engines and media collection interfaces - for article

http://en.wikipedia.org/wiki/Microfilm "Using the daguerreotype process, John Benjamin Dancer was one of the first to produce micro-photographs, in 1839. He achieved a reduction ratio of 160:1".

"In 1896, Canadian engineer Reginald A. Fessenden suggested microforms were a compact solution to engineers' unwieldy but frequently consulted materials. He proposed that up to 150,000,000 words could be made to fit in a square inch, and that a one foot cube could contain 1.5 million volumes"

"The year 1938 also saw another major event in the history of microfilm when University Microfilms International (UMI) was established by Eugene Power."


(http://en.wikipedia.org/wiki/Emanuel_Goldberg):
Emanuel Goldberg "introduced his “Statistical Machine,” a document search engine that used photoelectric cells and pattern recognition to search the metadata on rolls of microfilmed documents (US patent 1,838,389, 29 December 1931). This technology was used in a variant form in 1938 by Vannevar Bush in his “microfilm rapid selector,” his “comparator” (for cryptanalysis), and was the technological basis for the imaginary Memex in Bush’s influential 1945 essay “As we may think.”"

Information retrieval
http://en.wikipedia.org/wiki/Information_retreival#Timeline

"1950: The term "information retrieval" appears to have been coined by Calvin Mooers."

PDF of Manovich's The Language of New Media 2001 book manuscript available on academia.edu

I found that academia.edu has many of my papers, and also a PDF of my 2001 book manuscript (before it was copy edited):

http://ucsd.academia.edu/LevManovich/Papers

how do you call a person who is interacting with digital media?



New media theory and software studies need basic terms.

For example, what do we call a person who is interacting with digital media?
User? No good... "Interactor"?

In the 20th century, cultural theory dealt with viewers, readers, listeners, and participants. In the 21st century, we don't know what to call the people we study. (Game studies has "players," so at least that field has this figured out.)


Thinking more about it, I realized that there is a reason we can't have a single good term to describe what we do with digital media.

In the 1960s-1970s, digital media pioneers like Alan Kay systematically simulated most existing mediums in a computer. Computers, and the various computing devices which followed (such as "smart" phones), came to support reading, viewing, participating, playing, remixing, collaborating... and also many new functions.

This is why 20th century terms - reader, viewer, participant, publisher, player, user - all apply.

This multiplicity of media experiences is one of the defining characteristics of digital media - or, as Alan Kay called it, "the computer metamedium."

human memory record: what, how, size?

Dates in the history of modern social "media memory":

1837: photo memory

1876: audio and moving images memory

1996: web sites memory (archive.org - currently contains 150 billion pages)

2004: web search memory (Google trends database)

We also now have memories of spatial locations and spatial trajectories (for both humans and cars), but they have not been exposed because of privacy concerns.

Interestingly, science is relatively behind in this history: dense sensor nets which capture data about climate, etc. are rather recent. Of course, less dense measurements have been kept longer - for example, "reasonably reliable instrumental records of near-surface temperature" with quasi-global coverage start in the 1850s, i.e. around the same time as "photographic measurements" of human society.

Of course, these technological social records are very uneven in relation to time, location, type of media, and what has been captured. How "much" of human life has been captured since the 1840s? What percentage of the existing recordings around the world has been digitized? Interesting questions...

P.S. Note that I did not include text records in the timeline - books, birth registers, diaries, sales catalogs, etc. - because humans used a variety of technologies in different periods and places, so it's hard to compress them into a few dates.

how to include interactive / very high res visualizations in publications?

This is an interesting problem for "digital humanities publishing."


for example, I need to include these images:

http://www.flickr.com/photos/culturevis/5109394222/in/set-72157624959121129
http://www.flickr.com/photos/culturevis/5083388410/in/set-72157624959121129

I have been struggling with this while formatting my new article.
Eventually I switched the page size to 18 inches by 9 inches - since this is for posting online, the page size can be anything.
This helps but still does not work perfectly.

Maybe I will try making a 300,000 x 300,000 pixel Photoshop image which has all the high-res visualizations and also the article text - but will you be able to open it?
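The "will you be able to open it?" worry is easy to quantify with back-of-envelope arithmetic. Assuming uncompressed 8-bit RGB with no layers (and noting that an image this size would need Photoshop's large-document PSB format, since standard PSD tops out at 30,000 pixels per side):

```python
side = 300_000            # pixels per side
bytes_per_pixel = 3       # uncompressed 8-bit RGB, single layer
size_bytes = side * side * bytes_per_pixel
gib = size_bytes / 1024**3
print(f"{gib:.0f} GiB")   # roughly 251 GiB of raw pixel data
```

That is far more than the RAM of a 2011-era workstation, so any practical viewer has to load the image piecewise rather than whole.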

Of course we are also including links to high-res versions of the images on Flickr - but then you have to go out to a different site and deal with its own interface.


Possible solutions:

1) We need to embed interactive media viewers into a publication - but I don't think you can do this with common platforms such as Word, Google Docs, or PDF. So you have to go to HTML and include some custom app in Flash (and you have to write such an app).

2) Does new publishing software for the iPad and similar mobile platforms include tools for zoom/pan of high-res images and movies?

3) Ideally, we would publish on the same medium we use for these visualizations (the 287-megapixel HIPerSpace):

http://www.flickr.com/photos/culturevis/5866777772/in/set-72157625407666408
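Zoom/pan viewers for images this large (Deep Zoom, Zoomify, map tiles) all rely on the same idea: a precomputed tile pyramid, so the client only ever loads the tiles covering the visible region at the current zoom level. A minimal sketch with Pillow follows; the function name and the in-memory layout are my own, for illustration - real tools write the tiles to disk under a fixed naming scheme.

```python
from PIL import Image

def build_pyramid(img, tile=256):
    """Cut an image into fixed-size tiles at successively halved
    resolutions: {level: {(col, row): tile_image}}. Level 0 is full
    resolution; a viewer fetches only the tiles currently on screen."""
    levels, level = {}, 0
    while True:
        cols = -(-img.width // tile)   # ceiling division
        rows = -(-img.height // tile)
        levels[level] = {
            (c, r): img.crop((c * tile, r * tile,
                              min((c + 1) * tile, img.width),
                              min((r + 1) * tile, img.height)))
            for c in range(cols) for r in range(rows)
        }
        if img.width <= tile and img.height <= tile:
            break
        img = img.resize((max(1, img.width // 2), max(1, img.height // 2)))
        level += 1
    return levels
```

Because each tile is small and addressable by (level, column, row), even a gigapixel visualization can be served over the web without anyone downloading it whole.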


better ideas?

arriving in 4-5 years: visual data analysis

I just came across a new term I had not encountered before - "visual data analysis." It describes perfectly what we are doing in our lab. In other words, cultural analytics can also be understood as the application of visual data analysis to the humanities.


"Visual data analysis blends highly advanced computational methods with sophisticated graphics engines to tap the extraordinary ability of humans to see patterns and structure in even the most complex visual presentations. Currently applied to massive, heterogeneous, and dynamic datasets, such as those generated in studies of astrophysical, fluidic, biological, and other complex processes, the techniques have become sophisticated enough to allow the interactive manipulation of variables in real time. Ultra high-resolution displays allow teams of researchers to zoom in to examine specific aspects of the renderings, or to navigate along interesting visual pathways, following their intuitions and even hunches to see where they may lead. New research is now beginning to apply these sorts of tools to the social sciences and humanities as well, and the techniques offer considerable promise in helping us understand complex social processes like learning, political and organizational change, and the diffusion of knowledge."

(source: http://wp.nmc.org/horizon2010/chapters/visual-data-analysis/)

Cultural Software (new article by Lev Manovich)

Cultural Software (PDF).

This text is a pretty substantial update of part of the introduction to the Software Takes Command manuscript (2008).

Many things happened in software culture between 2008 and 2011 - for instance, the rise of mobile platforms and the "long tail" of apps for these platforms. Dozens of books have been written about Google and other web giants.
The MIT Press published a few titles in our Software Studies series, established in 2008.
A new software studies journal is preparing its first issue.

I have updated the article to include these developments, and also reworked a number of theoretical points and terms to reflect the state of software culture today. The article also offers a possible taxonomy of types of "cultural software" - open for discussion, of course.






Here is a scan of a 1993 article (InfoWorld magazine) about the first version of the application later renamed After Effects - an example of the cultural software discussed in Software Takes Command:

Infoworld.1993.After_Effects.article.scaled.jpg

How many cultural artifacts have been digitized by 2011?

PRINT:

The following is an edited section from http://en.wikipedia.org/wiki/Google_books, accessed July 8, 2011:

12 million books: Google (July 2010).

2.8 million books: Internet Archive is a non-profit which digitizes over 1,000 books a day, and also mirrors books from Google Books and other sources. As of May 2011, it hosted over 2.8 million public domain books, more than the approximately 1 million public domain books at Google Books.

6 million digital objects: HathiTrust has maintained the HathiTrust Digital Library since 13 October 2008, which preserves and provides access to material scanned by Google, some of the Internet Archive books, and some scanned locally by partner institutions. As of May 2010, it included about 6 million volumes, over 1 million of which are public domain.

10 million digital objects: Europeana links to roughly 10 million digital objects as of 2010, including video, photos, paintings, audio, maps, manuscripts, printed books, and newspapers from the past 2,000 years of European history from over 1,000 archives in the European Union.

800,000 digital objects: Gallica from the French National Library links to about 800,000 digitized books, newspapers, manuscripts, maps and drawings, etc. Created in 1997, the digital library continues to expand at a rate of about 5000 new documents per month. Since the end of 2008, most of the new scanned documents are available in image and text formats. Most of these documents are written in French.


ART:

1+ million art images: Artstor "is a non-profit organization that builds and distributes the Digital Library, an online resource of more than one million images in the arts, architecture, humanities, and sciences. The ARTstor Digital Library also includes a set of software tools to view, present, and manage images for research and teaching purposes. There are currently more than 1,300 ARTstor institutional subscribers in over 42 countries" (http://en.wikipedia.org/wiki/ARTstor).

"Information Visualization as a Research Method in Art History" panel at CAA 2012 (Los Angeles)

Christian Huemer (Getty Research Institute) and I organized the panel "Visualization as Art Historical Method" for the College Art Association 2012 conference (Los Angeles, February 22-25).

List of speakers:

1. Maximilian Schich (Barabási Lab, Boston): Visualizing the Ecology of Complex Networks in Art History
2. Sophie Raux (Université Lille 3): Interactive Mapping of the Agents of the Art Market in Europe (1550-1800)
3. Victoria Szabo (Duke University, Durham), Visualizing Art, Law, and Markets
4. Catherine Dossin (Purdue University): Geoinformatics and Art History. Visualizing the Reception of American Art in Western Europe, 1948-1968.
5. Georgia Gene Berryhill (University of Maryland): Lithics Visualization Project (with Tom Levy and Lev Manovich)
6. Piotr Adamczyk (Google Art Project): Visualizing Museum Collections

UCSD claims world data sorting record (1 TB in 106 sec)

At the Software Studies Initiative, we imagine future cultural researchers able to collect, analyze, and visualize all social media content and communication in real time - to see the cultural patterns of all of humanity at all scales.

We are lucky to work with the highest-resolution display system designed at Calit2: the 287-megapixel HIPerSpace.

It is also good to know that we can get help in processing "big cultural data":


UCSD scientists claim the world data sorting record for the 2nd year - sorting a terabyte (one trillion bytes) of data in 106 seconds.
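For scale, the record works out to an aggregate throughput far beyond any single disk of the era, which is why such sorts are distributed across many nodes:

```python
one_terabyte = 10**12   # the record counts a terabyte as one trillion bytes
seconds = 106
gb_per_sec = one_terabyte / seconds / 10**9
print(f"{gb_per_sec:.1f} GB/s aggregate")  # about 9.4 GB/s
```

Since every byte must be both read and written at least once, the sustained I/O across the cluster is at least double that figure.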

Open access journals and publishers for new media, game studies, software studies, and related fields

Our friends at Computational Culture - a brand-new online open-access peer-reviewed journal of software studies - made this very useful list of other open-access journals and publishers for new media studies and related fields:


Follow this link to the list with links:

http://computationalculture.net/?p=165


And here are the names on the list:

C-Theory
Culture Machine
Digital Culture and Education
Digital Studies
Eludamos
First Monday
Fibreculture Journal
Game Studies
Hyperrhiz
Inflexions
International Journal of Communication
Journal of Digital Information
Noema
Open Humanities Alliance
Open Humanities Press
Public Library of Science
Transformations
Vague Terrain
