Visualizing High Dimensional Image Clusters in 2D: The Growing Entourage Plot (Part I)

Damon Crockett

Activities Crop
A crop from a Growing Entourage plot using Instagram images machine-tagged under the heading 'activities'.

Activities Close
Closeup of above plot showing meeting of two clusters. Empty grid squares are centroid locations.


Since 2007, our lab has been visualizing large collections of cultural images. These visualizations have used either metadata variables such as date and location, or basic image features such as hue, saturation, brightness, number of lines, and texture. In particular, sorting by hue, saturation and brightness turned out to be very useful for quick exploration of large image collections. More recently, however, we've expanded the scope of our analysis to include presence and characteristics of faces (e.g., Selfie City) and now a wide range of object and scene contents. For example, we've just had a paper accepted for the IEEE 2015 Big Data Conference where we used deep learning image classification to analyze the contents of one million Twitter images.

Being able to use the latest computer vision techniques for the analysis of image content is very exciting, but it also brings new challenges. For example, how can we effectively visualize and explore the results of machine classification into many object categories? In this post, I'd like to discuss one particular method we've developed. This method visualizes high-dimensional image clusters using two dimensions. I call this method 'The Growing Entourage Plot'.

The plots in this post are drawn from our current collaboration with Miriam Redi on the clustering and visualization of large collections of Instagram images. Miriam extracted over 1000 image features covering image content (objects and scenes), photo composition, style, texture, color, and other characteristics. She then computed clusters of images using subsets of these features. Big thanks to the object detection team at Flickr for the object and scene tags (their work is described in this blog post).


In the field of information visualization, a great deal of energy is spent on the problem of how to present high-dimensional data on 2D canvases. There are at least three broad categories of solution: (1) preserve all features; (2) preserve some by selection; and (3) preserve some by redefinition. I'll discuss each of these in turn.

The first approach is simply to try to visualize everything. Such visualizations can be difficult both to design and to read. Media visualizations, those whose primary plot elements are visual media such as images, fall into this category by default, but choices about sorting can still make big differences in readability. The effect of sorting on media visualization is so important, in fact, that any feature not used for sorting is essentially invisible to the viewer.

The second approach is the one we've used most often: select a subset of features and sort by them. We might, for example, sort our (infinitely?) high-dimensional image data by brightness and hue alone. This is powerful and useful, but it renders invisible the more complex sorts of similarity relations between images.
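As a toy illustration of this selection-based sorting (the feature names and values here are made up for the example, not drawn from our actual pipeline), a few precomputed per-image features are enough to define an ordering:

```python
# Sketch: sorting images by a selected subset of features, here mean hue
# first and mean brightness second. The feature values are hypothetical.
images = [
    {"id": "a", "mean_hue": 0.6, "mean_brightness": 0.2},
    {"id": "b", "mean_hue": 0.1, "mean_brightness": 0.9},
    {"id": "c", "mean_hue": 0.1, "mean_brightness": 0.4},
]

# Any feature left out of the sort key plays no role in the layout.
ordered = sorted(images, key=lambda im: (im["mean_hue"], im["mean_brightness"]))
```

Sorting first by hue and then by brightness yields the familiar color-ordered mosaics; every feature omitted from the key is, as noted above, invisible to the viewer.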

The third approach defines new features, typically linear combinations of existing ones. Principal Components Analysis (PCA) is the standard here, although there are others (e.g., t-SNE). We can present images in, say, a 1000-feature hyperspace by projecting them onto two dimensions that hadn't previously existed (although, since they are typically combinations of existing dimensions, they contain no new data). This approach has the advantage that similarity relations between images in the 1000D feature space are preserved as well as possible during the projection to 2D, meaning that our visualizations can reveal very complex sorts of similarity between images.
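A minimal sketch of this kind of projection, using scikit-learn's PCA on a random stand-in feature matrix (the array shapes and variable names are illustrative assumptions, not our actual data):

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
features = rng.normal(size=(500, 1000))  # stand-in for ~1000 features per image

# Project every image from 1000D down to 2D plotting coordinates.
coords_2d = PCA(n_components=2).fit_transform(features)  # shape (500, 2)
```

Swapping in `sklearn.manifold.TSNE` gives a nonlinear projection with the same two-column output.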

This third approach is becoming quite popular and has been in use in our lab for at least two years now (see, e.g., this Flickr album). In an upcoming post, I'll talk about some methods of projection visualization.


But I'd like here to talk about a different approach to dealing with dimensionality, one that is quite common in data analysis but whose use in information visualization is less common: clustering. Dimensionality reduction algorithms like PCA and t-SNE are powerful and useful but suffer major data loss in most cases. You simply can't preserve all the complexity of 1000D relations after projecting to 2D. Clustering algorithms, however (e.g., k-means), preserve a greater share of the relational data, because they find groups of data points in your original feature space (or whichever subset you choose).
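A sketch of this clustering step, again using a random stand-in feature matrix (k = 50 matches the plots below; the other shapes and names are illustrative assumptions):

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
features = rng.normal(size=(500, 100))  # stand-in feature matrix

# Find 50 clusters in the original feature space; no projection happens here.
kmeans = KMeans(n_clusters=50, n_init=10, random_state=0).fit(features)
labels = kmeans.labels_              # cluster membership for each image
centroids = kmeans.cluster_centers_  # shape (50, 100): one centroid per cluster
```

Note that both the labels and the centroids live in the full feature space, which is why clustering retains more of the relational structure than projecting every point down to 2D.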

Now of course, there is still the problem of visualizing these clusters. This is particularly difficult for traditional sorts of statistical visualization, because their plot elements carry information only by their spatial positions, and the human visual system can parse a maximum of three spatial dimensions. Thus, if we want to see these clusters in their 'natural habitat', so to speak, we're probably out of luck. Additionally, seeing clusters of points, even in a 2D or 3D space, is not particularly illuminating, since clusters, unlike classifier outputs, are not 'classes' at all and have no conventional meaning or significance from the outset. We derive the meaning or significance of a cluster of points in a high-dimensional space from the feature values of those points, and in order to see those features in a plot, we'll again have to confront the problem of presenting high-dimensional data in 2D (or 3D). For these reasons, you don't see many cluster plots (what you'll sometimes see are PCA scatterplots with cluster membership coded by color).

Media visualizations are at an advantage here. Because media visualizations use images as plot elements, a simple presentation of each cluster is actually quite illuminating. Imagine simply presenting clusters of points, side by side. We learn nothing; these clusters are perfectly meaningless. But if those points are images, we can get a sense of what each cluster means. So, media visualizations are perfect for the presentation of clusters of images.


The question now is how exactly to arrange these clusters on a 2D canvas. As I've said before, a simple presentation of each cluster is helpful. We could, for example, make square montages of each cluster and just leaf through them. But we might want more than this - we might want also to see the relations among the clusters. And now we confront a familiar problem: we have a set of data points in n-D, where n > 2, and we want to see them in 2D. The 'points' are now clusters, but the shape of the problem is exactly the same as before. The Growing Entourage plot is my solution to this problem. It projects cluster centroids to 2D and builds clusters around them by turn-taking and semantic priority.

Food, Drinks, Meals
Growing Entourage with 50 clusters of Instagram photos machine-tagged under the heading 'food/drinks/meals'.

Food/Drinks/Meals Zoomed
Closeup of plot above. Empty grid squares are centroid locations.

We begin with high-dimensional image data and then use an algorithm like k-means to find k clusters in the original feature space. Each cluster has a centroid, given as a point location in the original feature space. We project the centroids to 2D using a dimensionality reduction algorithm, like PCA or t-SNE. We then bin these coordinates to a grid (making sure that no two centroids have the same grid location, which is typically not difficult). Now we have complex similarity relations among cluster centroids, and it remains only to build these clusters of images on the grid at the 2D centroid locations.
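The centroid-projection and binning steps just described can be sketched as follows (the grid size, feature dimensions, and variable names are illustrative assumptions; a real implementation would also resolve any grid collisions):

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
features = rng.normal(size=(500, 64))  # stand-in high-dimensional features

# 1. Cluster in the original feature space.
k = 50
kmeans = KMeans(n_clusters=k, n_init=10, random_state=0).fit(features)

# 2. Project only the k centroids (not all images) down to 2D.
centroids_2d = PCA(n_components=2).fit_transform(kmeans.cluster_centers_)

# 3. Bin the 2D centroid coordinates to an m x m grid of cells.
m = 32
mins = centroids_2d.min(axis=0)
spans = centroids_2d.max(axis=0) - mins
cells = np.floor((centroids_2d - mins) / (spans + 1e-9) * m).astype(int)

# Each row of `cells` is the grid square where one cluster will be grown.
occupied = {tuple(c) for c in cells}
```

With a fine enough grid, centroid collisions are rare; when one does occur, nudging a centroid to a neighboring empty cell is usually enough.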

Nature
Growing Entourage with 50 clusters of Instagram photos machine-tagged under the heading 'nature'.

Nature Zoomed
Closeup of plot above. Empty grid squares are centroid locations.

continue to Part II