Visualizing Big Image Collections

Lynn Cherny (em-lyon)

Topics Touched On

Issues with Big Data
Datasets "Out There"
A Few Projects on Big Image Collections
Metadata Approaches
Color Palettes
A Few Tools Available

"Bin, Summarize, Smooth: A Framework for Visualizing Large Data"

Challenges:

What to display once you find it, and

How to display it.

There are only so many pixels on the screen.

Hadley Wickham
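
The "bin" and "summarize" steps of that framework can be sketched in a few lines. This is only an illustration of the idea, not Wickham's implementation: the function name `bin_counts`, the grid size, and the sample points are all made up for the example.

```python
from collections import Counter

def bin_counts(points, width, height, x_range, y_range):
    """Bin (x, y) points into a width x height pixel grid and count per cell."""
    (x_min, x_max), (y_min, y_max) = x_range, y_range
    counts = Counter()
    for x, y in points:
        if not (x_min <= x <= x_max and y_min <= y <= y_max):
            continue  # skip points outside the plotted range
        # Map each value to a cell index; clamp so the maximum lands in the last cell.
        col = min(int((x - x_min) / (x_max - x_min) * width), width - 1)
        row = min(int((y - y_min) / (y_max - y_min) * height), height - 1)
        counts[(col, row)] += 1
    return counts

# Made-up sample: four points drawn onto a 2 x 2 "screen".
sample = [(0.1, 0.1), (0.2, 0.3), (0.9, 0.9), (1.0, 0.0)]
grid = bin_counts(sample, width=2, height=2, x_range=(0, 1), y_range=(0, 1))
print(grid)
```

However many points come in, the output never has more entries than there are pixels, which is the point of binning before drawing.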

Some Data Sets "Out There"...

Database and Research page

"Deep Learning for Logo Recognition"

Link

Link to Flickr

Also: British Library images

French Street Name Signs Dataset

Images from Google Maps Streetview (> 1 million)

Paper

10K annotated cats

Link

Aside: "Why do you care about annotation?"

A: Machine Learning.

img source

Source on Google Big Query

BigQuery SQL to find images tagged with "champagne":

SELECT * FROM [bigquery-public-data:open_images.dict] as dict
INNER JOIN [bigquery-public-data:open_images.labels] as lab
ON lab.label_name = dict.label_name
INNER JOIN [bigquery-public-data:open_images.images] as image
ON image.image_id = lab.image_id
WHERE dict.label_display_name = 'champagne'
AND lab.confidence >= .7
LIMIT 100
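
For running the same kind of lookup from a script, here is a hedged sketch using the `google-cloud-bigquery` Python client. It assumes the three `open_images` tables named in the query are still published, that default Google Cloud credentials are configured, and it rewrites the query in standard SQL with parameters. The function names `build_label_query` and `run_label_query` are hypothetical.

```python
def build_label_query(limit=100):
    """Return standard-SQL text with @label and @min_confidence placeholders."""
    return f"""
        SELECT image.*
        FROM `bigquery-public-data.open_images.dict` AS dict
        INNER JOIN `bigquery-public-data.open_images.labels` AS lab
          ON lab.label_name = dict.label_name
        INNER JOIN `bigquery-public-data.open_images.images` AS image
          ON image.image_id = lab.image_id
        WHERE dict.label_display_name = @label
          AND lab.confidence >= @min_confidence
        LIMIT {int(limit)}
    """

def run_label_query(label, min_confidence=0.7, limit=100):
    """Run the query and return the matching rows (needs credentials and billing)."""
    # Imported here so the query text can be built without the client library installed.
    from google.cloud import bigquery

    client = bigquery.Client()  # assumption: default project and credentials are set up
    job_config = bigquery.QueryJobConfig(
        query_parameters=[
            bigquery.ScalarQueryParameter("label", "STRING", label),
            bigquery.ScalarQueryParameter("min_confidence", "FLOAT64", min_confidence),
        ]
    )
    return list(client.query(build_label_query(limit), job_config=job_config).result())

sql = build_label_query(limit=5)
print(sql)
```

Calling `run_label_query("champagne")` would then return rows for images labeled "champagne" with confidence of at least 0.7, subject to the assumptions above.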

Video Datasets

YouTube8m dataset from Google

Surveillance Video, VIRAT: "The dataset is designed to be realistic, natural and challenging for video surveillance domains in terms of its resolution, background clutter, diversity in scenes, and human activity/event categories"

Unsecured webcams:

@FFD8FFDB

& The Creepy...

And the playful and artistic.

The Sheep Market is a collection of 10,000 sheep created by workers on Amazon’s Mechanical Turk. Each worker was paid $.02 (US) to “draw a sheep facing left.”

Aaron Koblin

"Fashion Conversation Data on Instagram"

"We collected Instagram posts and collected engagement logs per post such as the number of likes and comments. In addition to these features, we added visually meaningful tags such as facial emotion, brand logo, and the number of faces based on deep learning models."

Inspiring Projects

selfiecity.net

NYPL digital collection browser

http://on-broadway.nyc/

Politiken story

Small data, big impact: Context.

"What Makes Photo Cultures Different?"

ACM Multimedia 2016, Lev Manovich, Miriam Redi (Bell Labs), Damon Crockett (UCSD), and Simon Osindero (Flickr)

For example, rather than thinking of “photography” as a single phenomenon, it is more precise to consider it as a collection of many different “photo cultures”, each with its set of distinct aesthetic rules and defining mechanisms. ... Using deep learning, we detect 1000 types of content in the dataset of 100,000 images [from Instagram]

Instagram and Contemporary Image, by Lev Manovich

Don't Forget About Metadata!

Examples

Geotagging
Time/Date
Camera type and shot details (EXIF, IPTC, XMP...)
Tags or descriptive text a user supplied in an app
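
Geotags are a good example of metadata that needs a small conversion before it can be mapped: EXIF stores latitude and longitude as degrees, minutes, and seconds plus a hemisphere letter. A minimal sketch of the conversion follows, assuming the values have already been read with whatever EXIF reader you use. The function name and the sample coordinates are made up.

```python
def dms_to_decimal(dms, ref):
    """Convert (degrees, minutes, seconds) plus a hemisphere letter to decimal degrees."""
    degrees, minutes, seconds = (float(v) for v in dms)
    decimal = degrees + minutes / 60 + seconds / 3600
    # South and west are negative in decimal notation.
    return -decimal if ref in ("S", "W") else decimal

# Made-up sample values, roughly where Lyon is.
lat = dms_to_decimal((45, 45, 36.0), "N")
lon = dms_to_decimal((4, 50, 24.0), "E")
print(round(lat, 4), round(lon, 4))
```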

Also see FotoForensics, imageforensic.org

"EXIF viewer by Fluntro" from quora link

Social Media Data is Rich

Image
Poster
Text
Hashtags
Likes
Comments
Time/date

Twitter API is very structured, including media entities tweeted:
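
As a rough sketch of how that structure can be used, the function below pulls the fields listed above out of one tweet. It assumes the tweet is a dict in the v1.1-style JSON layout (`entities`, `extended_entities`, `favorite_count`, and so on); other API versions name things differently. The sample tweet is entirely made up.

```python
def summarize_tweet(tweet):
    """Collect poster, text, hashtags, likes, date, and media URLs from one tweet dict."""
    entities = tweet.get("entities", {})
    # Prefer extended_entities when present, since it lists every attached media item.
    media = tweet.get("extended_entities", entities).get("media", [])
    return {
        "poster": tweet.get("user", {}).get("screen_name"),
        "text": tweet.get("full_text", tweet.get("text", "")),
        "hashtags": [h["text"] for h in entities.get("hashtags", [])],
        "likes": tweet.get("favorite_count", 0),
        "created_at": tweet.get("created_at"),
        "media_urls": [m["media_url_https"] for m in media],
    }

# Made-up example tweet, not real data.
sample_tweet = {
    "created_at": "Sat Dec 31 23:59:00 +0000 2016",
    "text": "Cheers! #champagne",
    "user": {"screen_name": "example_user"},
    "favorite_count": 3,
    "entities": {"hashtags": [{"text": "champagne"}]},
    "extended_entities": {
        "media": [{"type": "photo", "media_url_https": "https://example.com/a.jpg"}]
    },
}
summary = summarize_tweet(sample_tweet)
print(summary)
```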

Movie/tv metadata: Closed Captioning, Subtitles...

Cornell Movie Dialogue Corpus

Open Subtitles Corpus

"Gender Distinguishing Features in Film Dialogue"

True or False? Every scene in Hollywood movies by how "accurate" they are

Information Is Beautiful project

#champagne text

Metadata: Mario Klingemann's t-SNE map of tags

Using TensorBoard with Yelp Reviews

Yelp review dataset; word2vec model trained by me in gensim, shown in TensorBoard
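
One common route into the embedding projector is a pair of tab-separated files: one row of numbers per vector, and one label per row in a matching metadata file. The sketch below shows that export step only, under the assumption that the word vectors are already available as a plain word-to-list mapping (with gensim, that mapping would be built from the trained model's `wv` attribute). The function name and the tiny vectors are made up.

```python
import csv
import io

def write_projector_files(vectors, vec_file, meta_file):
    """Write one tab-separated vector per row, and the matching word per row."""
    writer = csv.writer(vec_file, delimiter="\t", lineterminator="\n")
    for word, vec in vectors.items():
        writer.writerow([f"{x:.6f}" for x in vec])
        meta_file.write(word + "\n")  # row order must match the vector file

# Made-up two-dimensional vectors; real word2vec vectors are much longer.
toy_vectors = {"pizza": [0.1, 0.2], "sushi": [0.3, 0.4]}
vec_out, meta_out = io.StringIO(), io.StringIO()
write_projector_files(toy_vectors, vec_out, meta_out)
print(vec_out.getvalue())
print(meta_out.getvalue())
```

In practice the two `StringIO` objects would be replaced by files opened for writing.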

Special Approaches:

Color Palette Studies

50 Years of Avengers Colors

WSJ

Color Palettes of the New Yorker by Nicholas Rougeux

Brendan Dawes

NYT Fashion Week 2013

Martin Krzywinski's Image Color Summarizer tool

#champagne palettes
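
A palette can be approximated very cheaply by snapping every pixel to a coarse color grid and counting. This is a minimal sketch of that one approach, not the method used by any of the projects or tools above. It assumes the pixels are already available as (R, G, B) tuples in the 0-255 range, and the function name, step size, and sample pixels are made up.

```python
from collections import Counter

def coarse_palette(pixels, n_colors=5, step=32):
    """Return the most common colors after snapping each channel to a coarse grid."""
    def snap(value):
        # Move the value to the center of its bin, capped at 255.
        return min(value // step * step + step // 2, 255)

    counts = Counter((snap(r), snap(g), snap(b)) for r, g, b in pixels)
    total = sum(counts.values())
    # Each entry is (color, share of all pixels).
    return [(color, n / total) for color, n in counts.most_common(n_colors)]

# Made-up "image": mostly pale gold with a few near-black pixels.
sample_pixels = [(250, 240, 200)] * 6 + [(248, 236, 205)] * 2 + [(20, 20, 20)] * 2
palette = coarse_palette(sample_pixels, n_colors=2)
print(palette)
```

The two slightly different golds fall into the same bin, so they count as one palette color. A smaller `step` keeps them apart at the cost of a noisier palette.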

Some Tools Available

and demos

ImagePlot

APIs for use (with $)

Google Cloud Platform demo

Microsoft Azure API demo

IBM Watson demo

Microsoft Vision API also offers:

image analysis
celebrity recognition
video analysis
extract text from image
thumbnail creation
(and sentiment from another one)

Reminders

Big data for visual research is now "here" (if you want)
"Easy" technical problems are collection, storage, and sometimes retrieval.
- By which I mean: time and money, but doable.

5 days of data and a small neural net... you get image search:

Code from Gene Kogan
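
The search step itself is simple once every image has a feature vector. The sketch below is not Gene Kogan's code; it only illustrates the "here's one, get me more like it" lookup with cosine similarity. It assumes the feature vectors were already extracted by some neural net, and the tiny vectors and image names are made up.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def most_similar(query_id, features, k=3):
    """Return the ids of the k images whose features are closest to the query image."""
    query = features[query_id]
    scored = [
        (cosine(query, vec), image_id)
        for image_id, vec in features.items()
        if image_id != query_id
    ]
    return [image_id for _, image_id in sorted(scored, reverse=True)[:k]]

# Made-up three-dimensional features; real ones have hundreds or thousands of dimensions.
toy_features = {
    "a.jpg": [1.0, 0.0, 0.0],
    "b.jpg": [0.9, 0.1, 0.0],
    "c.jpg": [0.0, 1.0, 0.0],
    "d.jpg": [0.5, 0.5, 0.0],
}
matches = most_similar("a.jpg", toy_features, k=2)
print(matches)
```

A brute-force scan like this is fine for thousands of images. Larger collections usually need an approximate nearest-neighbor index instead.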

Slightly harder... clustering

Thanks to Gene Kogan
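
To make "clustering" concrete, here is a bare-bones k-means sketch. It is not the code behind the images credited above, just one standard algorithm under simple assumptions: the points are small numeric tuples (for images they would be feature vectors, colors, or metadata values), and the first k points serve as starting centroids. The function names and sample points are made up.

```python
def nearest(point, centroids):
    """Index of the centroid closest to the point (squared Euclidean distance)."""
    return min(
        range(len(centroids)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(point, centroids[i])),
    )

def kmeans(points, k, iterations=10):
    """Return (labels, centroids) after a fixed number of k-means iterations."""
    centroids = [list(p) for p in points[:k]]  # naive init: the first k points
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[nearest(p, centroids)].append(p)
        for i, members in enumerate(clusters):
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[i] = [sum(dim) / len(members) for dim in zip(*members)]
    labels = [nearest(p, centroids) for p in points]
    return labels, centroids

# Made-up points forming two obvious groups.
toy_points = [(0, 0), (0, 1), (10, 10), (10, 11)]
labels, centroids = kmeans(toy_points, k=2)
print(labels, centroids)
```

The harder questions are which features to cluster on and how many clusters to ask for. The algorithm does not answer either.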

the #champagne selfies...

#champagne with text...

Some takeaway observations

Increasingly theoretical-technical problems:

Identifying content of images en masse (deep learning)
Search (here's one, get me more like it; but alike in terms of what?)
Clustering (by what? color, content, metadata...)
"Big picture" analysis of the set and trends: What picture do you want to convey, theoretically and descriptively?
Contribution of metadata - combining text and image in useful, usable, interesting ways.
Characterization of the whole and parts, the individuals and outliers.

The Research Blend of Technical and Theory

After Anissa's talk:

Can the "eye of the machine" see things the human observer might overlook or not see?

In a useful way, not just in a "mistake" way.

cherny@em-lyon.com

Slides at https://ghostweather.slides.com/lynncherny