Visiting Knight Chair
School of Comm, U of Miami
@arnicas / email@example.com
(Also: Focus here is on free or very cheap tools)
But beware... a lot of tools only take text, so you have to do the work first.
WSJ (Oct 14)
> grep CLINTON dem_debate_2015_10_13_wapo.txt
CLINTON: Well, thank you, and thanks to everyone for hosting this first of the Democratic debates.
CLINTON: Well, actually, I have been very consistent. Over the course of my entire life, I have always fought for the same values and principles, but, like most human beings -- including those of us who run for office -- I do absorb new information. I do look at what's happening in the world.
CLINTON: No. I think that, like most people that I know, I have a range of views, but they are rooted in my values and my experience. And I don't take a back seat to anyone when it comes to progressive experience and progressive commitment.
CLINTON: I'm a progressive. But I'm a progressive who likes to get things done. And I know...
.... many more lines....
> grep CLINTON dem_debate_2015_10_13_wapo.txt | wc -l
74
> grep SANDERS dem_debate_2015_10_13_wapo.txt | wc -l
70
> grep WEBB dem_debate_2015_10_13_wapo.txt | wc -l
35
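For anyone who would rather script it, here is a minimal Python sketch of the same idea. It assumes each speaker turn starts a line with an all-caps label followed by a colon, as in the transcript excerpt above. The function name `count_speaker_turns` and the sample text are made up for illustration. Note that `grep CLINTON` counts every line that mentions the name anywhere, so the numbers can differ slightly.

```python
import re
from collections import Counter

def count_speaker_turns(transcript: str) -> Counter:
    """Count lines that open with an all-caps speaker label such as 'CLINTON:'."""
    # Assumption: labels are uppercase letters (plus ' or -) followed by a colon.
    label = re.compile(r"^([A-Z][A-Z'\-]+):", re.MULTILINE)
    return Counter(label.findall(transcript))

# Tiny invented sample; in practice, read the transcript file into a string.
sample = (
    "CLINTON: Well, thank you.\n"
    "SANDERS: Thank you.\n"
    "CLINTON: I'm a progressive.\n"
    "(APPLAUSE)\n"
)
turns = count_speaker_turns(sample)
print(turns.most_common())
```

This gives every speaker's count in one pass instead of one grep per name.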
> wc -w dem_debate_2015_10_13_wapo.txt
22953 dem_debate_2015_10_13_wapo.txt
> wc -w gop_debate_2015_9_16_wapo.txt
35127 gop_debate_2015_9_16_wapo.txt
wc is the Unix "word count" command. wc -w counts only words; wc -l counts lines.
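If you are not on a Unix machine, roughly the same counts can be had in a few lines of Python. This is a sketch, not a reimplementation of wc. The function names are invented here, and the definitions (whitespace-separated words, newline characters for lines) are my approximation of what `wc -w` and `wc -l` report.

```python
def word_count(text: str) -> int:
    """Approximate `wc -w`: number of whitespace-separated chunks."""
    return len(text.split())

def line_count(text: str) -> int:
    """Approximate `wc -l`: number of newline characters."""
    return text.count("\n")

# Invented two-line sample; for a real file, use open(path).read().
sample = "Well, thank you.\nI'm a progressive.\n"
print(word_count(sample), line_count(sample))
```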
Also see Word Count in Word 2013 How-To
"Last Night's Debate Was Longer than The Book of Genesis" (C Ingraham, WaPo)
> wc -w * | sort
.... lots of them go by...
1800 C05781926.txt
1897 C05782687.txt
2202 C05785187.txt
2562 C05782645.txt
2705 C05782303.txt
3879 C05782890.txt
4266 C05782607.txt
4322 C05782571.txt
5697 C05781825.txt
211567 total
I'm not going to go through this with you; sometimes programming is easier.
Using AntConc, at what points in the transcripts of the last GOP candidate debate and Democratic candidate debate did "(APPLAUSE)" occur?
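AntConc answers this with a point-and-click concordance plot. As a rough code equivalent, the sketch below finds where a marker string falls in a text, as a fraction of the way through (0.0 is the start, 1.0 is the end). The function name `marker_positions` and the sample text are hypothetical. This is not how AntConc itself computes its plot.

```python
def marker_positions(text: str, marker: str = "(APPLAUSE)") -> list:
    """Return each occurrence of marker as a fraction of the text length."""
    positions = []
    start = text.find(marker)
    while start != -1:
        positions.append(start / len(text))
        # Resume searching just past the current hit.
        start = text.find(marker, start + len(marker))
    return positions

# Invented sample: two markers, each preceded by ten filler characters.
sample = "x" * 10 + "(APPLAUSE)" + "y" * 10 + "(APPLAUSE)"
print(marker_positions(sample))
```

Plotting those fractions on a line for each debate gives a crude version of the applause "barcode".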
Transcript scraped from the Washington Post, analyzed in AntConc
AntConc on debate transcripts
Overview Project on sample of Clinton emails
Formerly in the free Many Eyes, now requires code in Google Charts
Wordle site -- uses Java applet, only runs in Firefox/Safari for me
But geez do people love them.
Without trimming words
Without trimming words
There are more terms used in the GOP debate, but the average term frequency is much higher; more repetition of refrains?
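One way to read "term frequency avg" is total words divided by distinct words. That is an assumption on my part; the tool behind the slide may define it differently. Under that reading, a minimal sketch looks like this. The tokenizer is deliberately crude (lowercase runs of letters and apostrophes), and `term_stats` is an invented name.

```python
import re
from collections import Counter

def term_stats(text: str) -> tuple:
    """Return (number of distinct terms, average occurrences per term)."""
    # Crude tokenizer: lowercase runs of letters and apostrophes.
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(tokens)
    n_terms = len(counts)
    average = len(tokens) / n_terms if n_terms else 0.0
    return n_terms, average

# Invented sample: 8 words, 5 distinct terms.
sample = "the cat and the hat and the bat"
print(term_stats(sample))
```

Running it on each transcript gives two pairs of numbers to compare. Keep in mind that a longer text tends to have a higher average just because it is longer.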
See also Open Calais API demo page
Simple example in Overview Project
Who is IRENE??
(of course the Hillary emails are in Document Cloud too)
Overview Project, on sample of Clinton emails
NZZ site, also done with code
A small sample of Hillary's emails in Lexos....
Historical Document Corpora
WaPo wonkblog, Chris Ingraham
WaPo wonkblog, Emily Badger
Site (thanks to Heather Froehlich)
Culturomics, Ben Schmidt
Coding talk coming up...