Random sample of Twitter images from 2013 labeled by the GoogLeNet deep learning model as web sites and texts. We refer to these images as "image-texts." The category includes screen shots of text chats, other types of texts, and other kinds of non-photographic images. We found that the frequencies of these images are correlated with well-being responses from Gallup surveys, and also with median housing prices, incomes, and education levels.
Johannes Vermeer. Woman in Blue Reading a Letter. 1663-1664.
Our new paper has been accepted for the Big Data and the Humanities workshop at the IEEE 2015 Big Data Conference:
Humanists use historical images as sources of information about social norms, behavior, fashion, and other details of particular cultures, places and periods. Dutch Golden Era paintings, works by French Impressionists, and 20th century street photography are just three examples of such images. Normally such visuals directly show objects of interest such as social scenes, city streets, or people's dress. But what if masses of images shared on social networks contain information about social trends even if these images do not directly represent objects of interest? This is the question we investigate in our study.
In the last few years researchers have shown that aggregated characteristics of large volumes of social media are correlated with many socio-economic characteristics and can also predict a range of social trends. The examples include flu trends, success of movies, and measures of social well-being of populations. Nearly all such studies focus on text content, such as posts on Twitter and Facebook.
In contrast, we focus on images. We investigate whether features extracted from tweeted images can predict a number of socio-economic characteristics. Our dataset consists of one million images shared on Twitter during one year in 20 different U.S. cities. We classify the content of these images using the state-of-the-art Convolutional Neural Network GoogLeNet and then select the largest category, which we call “image-texts”: non-photographic images that are typically screen shots of websites or text-message conversations. We construct two features describing patterns in image-texts: the aggregated sharing rate per year per city, and the sharing rate per hour over a 24-hour period, aggregated over one year in each city.
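To make the two features concrete, here is a minimal Python sketch of how they might be computed. It is an illustration, not the code used in the paper. It assumes that "sharing rate" means the fraction of a city's shared images that were classified as image-texts (the text above does not define the term precisely), and the function name `image_text_features` and the record layout `(city, hour, is_image_text)` are hypothetical.

```python
from collections import Counter, defaultdict


def image_text_features(records):
    """Compute two per-city features from classified image records.

    `records` is an iterable of (city, hour, is_image_text) tuples, where
    `hour` is the local hour of day (0-23) at which the image was shared and
    `is_image_text` is True if the classifier labeled it as an image-text.
    This record layout is an assumption made for this sketch.
    """
    totals = Counter()                                # all images per city
    text_totals = Counter()                           # image-texts per city
    hourly_totals = defaultdict(lambda: [0] * 24)     # all images per city per hour
    hourly_texts = defaultdict(lambda: [0] * 24)      # image-texts per city per hour

    for city, hour, is_image_text in records:
        totals[city] += 1
        hourly_totals[city][hour] += 1
        if is_image_text:
            text_totals[city] += 1
            hourly_texts[city][hour] += 1

    # Feature 1: share of image-texts among all images, per city, over the year.
    yearly_rate = {city: text_totals[city] / totals[city] for city in totals}

    # Feature 2: share of image-texts for each hour of the day, per city,
    # aggregated over the year. Hours with no images get a rate of 0.0.
    hourly_rate = {
        city: [
            texts / count if count else 0.0
            for texts, count in zip(hourly_texts[city], hourly_totals[city])
        ]
        for city in totals
    }
    return yearly_rate, hourly_rate
```

Each city ends up with one number (the yearly rate) and one 24-element vector (the hourly profile), which can then be compared against city-level socio-economic indicators.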
We find that these features are correlated with self-reported social well-being responses from Gallup surveys, and also median housing prices, incomes, and education levels. These results suggest that particular types of social media images can be used to predict social characteristics not readily detectable in images.
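As a rough illustration of what "correlated" means here, the sketch below computes a Pearson correlation coefficient between a per-city feature and a per-city indicator. The summary above does not say which correlation measure the paper uses, so Pearson is an assumption, and the function name `pearson_r` is hypothetical.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length sequences of numbers.

    In this setting, `xs` could hold one image-text sharing rate per city
    and `ys` the matching per-city indicator (for example, median income),
    listed in the same city order.
    """
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")

    mean_x = sum(xs) / n
    mean_y = sum(ys) / n

    # Sums of cross-products and squared deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)

    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")

    return cov / math.sqrt(var_x * var_y)
```

With only 20 cities, any such coefficient rests on 20 data points, so it is worth reporting alongside a significance test or confidence interval.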
The full paper will be available online after the IEEE 2015 Big Data conference.