Visualizing Big Image Collections
Lynn Cherny (em-lyon)
Topics Touched On
- Issues with Big Data
- Datasets "Out There"
- A Few Projects on Big Image Collections
- Metadata Approaches
- Color Palettes
- A Few Tools Available
"Bin, Summarize, Smooth: A Framework for Visualizing Large Data"
Challenges:
What to display once you find it, and
How to display it.
There are only so many pixels on the screen.
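A minimal sketch of the "bin" and "summarize" steps named in the framework title above, assuming the data are (x, y, value) points that all fall inside a known range. The function name `bin_and_summarize` and its parameters are hypothetical, chosen for illustration; the "smooth" step is left out.

```python
from collections import defaultdict


def bin_and_summarize(points, x_range, y_range, nx, ny):
    """Bin (x, y, value) points into an nx-by-ny grid.

    Each occupied cell is summarized by (count, mean value), so the
    number of marks to draw is bounded by nx * ny, not by len(points).
    Assumes every point lies inside x_range and y_range.
    """
    (x_min, x_max), (y_min, y_max) = x_range, y_range
    counts = defaultdict(int)
    sums = defaultdict(float)
    for x, y, value in points:
        # Map each coordinate to a cell index; clamp so that a point
        # sitting exactly on the upper edge lands in the last cell.
        i = min(int((x - x_min) / (x_max - x_min) * nx), nx - 1)
        j = min(int((y - y_min) / (y_max - y_min) * ny), ny - 1)
        counts[(i, j)] += 1
        sums[(i, j)] += value
    return {cell: (counts[cell], sums[cell] / counts[cell]) for cell in counts}


points = [(0.1, 0.1, 1.0), (0.2, 0.2, 3.0), (0.9, 0.9, 5.0), (1.0, 1.0, 7.0)]
grid = bin_and_summarize(points, (0, 1), (0, 1), nx=2, ny=2)
```

With a grid sized to the display, a million points and a thousand points cost the same number of pixels; only the per-cell summaries change.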
Some Data Sets "Out There"...
Also: British Library images
French Street Name Signs Dataset
Images from Google Maps Streetview (> 1 million)
10K annotated cats
Aside: "Why do you care about annotation?"
A: Machine Learning.
Source on Google BigQuery
BigQuery SQL (legacy syntax) to find images tagged with "champagne":
SELECT * FROM [bigquery-public-data:open_images.dict] AS dict
INNER JOIN [bigquery-public-data:open_images.labels] AS lab
ON lab.label_name = dict.label_name
INNER JOIN [bigquery-public-data:open_images.images] AS image
ON image.image_id = lab.image_id
WHERE dict.label_display_name = 'champagne'
AND lab.confidence >= 0.7
LIMIT 100
Video Datasets
YouTube8m dataset from Google
Surveillance Video, VIRAT: "The dataset is designed to be realistic, natural and challenging for video surveillance domains in terms of its resolution, background clutter, diversity in scenes, and human activity/event categories"
Unsecured webcams:
& The Creepy...
And the playful and artistic.
The Sheep Market is a collection of 10,000 sheep created by workers on Amazon’s Mechanical Turk. Each worker was paid $.02 (US) to “draw a sheep facing left.”
"We collected Instagram posts and collected engagement logs per post such as the number of likes and comments. In addition to these features, we added visually meaningful tags such as facial emotion, brand logo, and the number of faces based on deep learning models."
Inspiring Projects
Small data, big impact: Context.
"What Makes Photo Cultures Different?"
"For example, rather than thinking of “photography” as a single phenomenon, it is more precise to consider it as a collection of many different “photo cultures”, each with its set of distinct aesthetic rules and defining mechanisms. ... Using deep learning, we detect 1000 types of content in the dataset of 100,000 images [from Instagram]"
Instagram and Contemporary Image, by Lev Manovich
Don't Forget About Metadata!
Examples
- Geotagging
- Time/Date
- Camera type and shot details (EXIF, IPTC, XMP...)
- Tags or descriptive text a user supplied in an app
Also see FotoForensics, imageforensic.org
Social Media Data is Rich
- Image
- Poster
- Text
- Hashtags
- Likes
- Comments
- Time/date
The Twitter API returns highly structured data, including the media entities attached to each tweet:
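A rough sketch of pulling images and hashtags out of one tweet. The field names (`entities`, `extended_entities`, `media`, `media_url_https`, `hashtags`) follow the v1.1 tweet JSON layout as I understand it and should be checked against the current API documentation; `sample_tweet` is a hand-made stand-in, not real data, and the helper names are hypothetical.

```python
def media_urls(tweet):
    """Return (type, url) pairs for the media attached to a tweet-shaped dict."""
    # Assumption: extended_entities, when present, lists all attached media;
    # otherwise fall back to the plain entities block.
    entities = tweet.get("extended_entities") or tweet.get("entities", {})
    return [(m.get("type"), m.get("media_url_https"))
            for m in entities.get("media", [])]


def hashtags(tweet):
    """Return the tweet's hashtags, lowercased, without the leading '#'."""
    return [h["text"].lower()
            for h in tweet.get("entities", {}).get("hashtags", [])]


# Hand-made example in the assumed shape (not real data).
sample_tweet = {
    "text": "Cheers! #Champagne",
    "entities": {
        "hashtags": [{"text": "Champagne"}],
        "media": [{"type": "photo",
                   "media_url_https": "https://example.com/a.jpg"}],
    },
}

urls = media_urls(sample_tweet)
tags = hashtags(sample_tweet)
```

Keeping the image URL next to the hashtags and text of the same post is what makes the later text-plus-image analyses possible.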
Movie/tv metadata: Closed Captioning, Subtitles...
True or False? Every scene in Hollywood movies by how "accurate" they are
#champagne text
Metadata: Mario Klingemann's t-SNE map of tags
Using TensorBoard with Yelp Reviews
Yelp review dataset: a word2vec (w2v) model I trained in gensim, shown in TensorBoard
Special Approaches:
Color Palette Studies
50 Years of Avengers Colors
#champagne palettes
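One simple way to get a palette from an image, sketched under the assumption that the pixels are already available as (r, g, b) tuples with values 0-255 (loaded with whatever image library you prefer). This is coarse color quantization plus counting, not necessarily the method behind the palettes shown here; `palette` and its parameters are hypothetical names.

```python
from collections import Counter


def palette(pixels, n_colors=5, levels=4):
    """Return the n_colors most common colors after coarse quantization.

    Each channel is cut into `levels` equal buckets (levels should divide
    256, e.g. 2, 4, 8) and every pixel is snapped to its bucket's center.
    """
    step = 256 // levels

    def quantize(channel):
        # Snap a 0-255 channel value to the center of its bucket.
        return (channel // step) * step + step // 2

    counts = Counter((quantize(r), quantize(g), quantize(b))
                     for r, g, b in pixels)
    return [color for color, _ in counts.most_common(n_colors)]


# Tiny made-up "image": three pale pixels, two dark ones, one red one.
pixels = [(250, 240, 200)] * 3 + [(10, 10, 10)] * 2 + [(130, 20, 20)]
top_two = palette(pixels, n_colors=2)
```

Fewer levels give broader, more stable color families; more levels keep finer distinctions but split near-identical shades into separate palette entries.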
Some Tools Available
and demos
APIs for use (with $)
The Microsoft Vision API also offers:
- image analysis
- celebrity recognition
- video analysis
- extract text from image
- thumbnail creation
- (and sentiment analysis, from a separate API)
Reminders
- Big data for visual research is now "here" (if you want)
- "Easy" technical problems are collection, storage, and sometimes retrieval.
- By which I mean: time and money, but doable.
5 days of data and a small neural net... you get image search:
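The slide does not say how this search was built. A common approach, sketched here as an assumption, is to take one feature vector per image from a neural net and rank the other images by cosine similarity to the query. The vectors below are made up, and `most_similar` is a hypothetical helper.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def most_similar(query_id, features, k=3):
    """Return the ids of the k images whose vectors are closest to the query's."""
    query = features[query_id]
    scored = [(cosine(query, vec), image_id)
              for image_id, vec in features.items()
              if image_id != query_id]
    scored.sort(reverse=True)  # highest similarity first
    return [image_id for _, image_id in scored[:k]]


# Made-up 3-dimensional "features"; real ones would have hundreds of dimensions.
features = {
    "a": [1.0, 0.0, 0.0],
    "b": [0.9, 0.1, 0.0],
    "c": [0.0, 1.0, 0.0],
    "d": [0.5, 0.5, 0.0],
}
neighbors = most_similar("a", features, k=2)
```

What counts as "similar" is decided entirely by what the feature vector encodes, which is why the choice of network and layer matters more than the search code.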
Slightly harder... clustering
the #champagne selfies...
#champagne with text...
Some takeaway observations
Increasingly theoretical-technical problems:
- Identifying the content of images en masse (deep learning)
- Search (here's one, get me more like it; but alike in terms of what?)
- Clustering (by what? color, content, metadata...)
- "Big picture" analysis of the set and trends: What picture do you want to convey, theoretically and descriptively?
- Contribution of metadata - combining text and image in useful, usable, interesting ways.
- Characterization of the whole and parts, the individuals and outliers.
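To make the "by what?" question in the clustering bullet concrete, here is a bare-bones k-means sketch in plain Python. It assumes each image has already been reduced to a numeric vector (color, content features, or metadata); the initialization from the first k vectors is a simplification to keep the example deterministic, not a recommended practice.

```python
def kmeans(vectors, k, iterations=20):
    """Assign each vector to one of k clusters; returns a list of labels."""
    # Simplified, deterministic start: the first k vectors are the centers.
    centers = [list(v) for v in vectors[:k]]
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assignment step: each vector goes to its nearest center
        # (squared Euclidean distance).
        labels = [
            min(range(k),
                key=lambda c: sum((x - y) ** 2 for x, y in zip(v, centers[c])))
            for v in vectors
        ]
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [v for v, label in zip(vectors, labels) if label == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Made-up 2-D vectors forming two obvious groups.
vectors = [[0, 0], [10, 10], [0, 1], [10, 11], [1, 0], [11, 10]]
labels = kmeans(vectors, k=2)
```

Feeding the same images in as color histograms, content features, or metadata vectors will generally produce different groupings, so the choice of representation is a research decision, not just a technical one.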
The Research Blend of Technical and Theory
Post-Anissa's talk:
Can the "eye of the machine" see things the human observer might overlook or not see?
In a useful way, not just in a "mistake" way.
cherny@em-lyon.com
Slides at https://ghostweather.slides.com/lynncherny
Exploring Big Visual Collections
By Lynn Cherny
A short talk on approaches to exploring big visual collections and some tools available