Data Sources and Tools "Out There"
Lynn Cherny, Assoc Prof of Data Science
emlyon business school
AIMS (June 2017)
The Plan
- Data Sources
- "On the shelf"
- APIs
- APIs in Google Sheets
- APIs Via R
- Scraping
- Services for analysis
- Tools to Help You (Without coding)
Data Lying Around
Out there
Data sets downloadable
Working with APIs
- APIs for Non-Programmers
- Examples of using APIs in R (from Hadley Wickham)
- Many open APIs here
API use and search
NYT example (requires an API key you get for free)
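For readers curious what such a call looks like under the hood, here is a minimal sketch in Python that only builds the request URL for the NYT Article Search API; it does not send anything. The helper name `build_nyt_search_url` and the placeholder key `YOUR_API_KEY` are illustrative assumptions, not part of the original slides.

```python
from urllib.parse import urlencode

# Article Search endpoint of the NYT developer API.
NYT_SEARCH_URL = "https://api.nytimes.com/svc/search/v2/articlesearch.json"


def build_nyt_search_url(query, api_key, page=0):
    """Return the full request URL for an article search (hypothetical helper)."""
    params = {"q": query, "page": page, "api-key": api_key}
    # urlencode escapes spaces and special characters in the query text.
    return NYT_SEARCH_URL + "?" + urlencode(params)


# Replace the placeholder with the free key from the NYT developer site.
url = build_nyt_search_url("open data", "YOUR_API_KEY")
print(url)
```

Pasting the printed URL into a browser (with a real key) should return JSON, which is the same thing the Sheets add-ons and R packages fetch for you.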
API calls from Google Sheets
Knoema Data Finder As an Add-on in Google Sheets
Sidebar on Sheets - search for data
Example Analysis in Sheets
After you have data in your sheet, click on the "Explore" button on the lower right and click around in your data. It will recommend charts and analyses.
Blockspring data sources (free 14 day trial)
R Datasets and API access
Calling APIs from R (article)
R (Studio) Example:
NYT article search in R using API key and call
"Scraping" from websites without Code
Select what you want to "scrape"...
Get results in a spreadsheet...
Services (API) to help you analyze data
MS Azure APIs: many
Demos
TextRazor - entity analysis in Text
Giant Datasets in Google BigQuery
Datasets:
GDELT (news article/event data)
NYC Cabs
Weather History
Chicago Crime
Bikes
Baseball, etc.
Intro: What is BigQuery?
Athena on Amazon AWS (similar concept, newer)
Querying OpenStreetMap data for the whole world
Also uses SQL. This is the single most useful data tool to learn, IMO.
Tutorial for my classes: SQLBolt
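To give a feel for the kind of SQL that BigQuery and Athena accept, here is a small sketch that runs locally with Python's built-in `sqlite3` module. The `trips` table, its columns, and the three rows are invented for illustration; the real public datasets have their own schemas.

```python
import sqlite3

# An in-memory database stands in for a (much larger) cloud dataset.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE trips (pickup_zone TEXT, fare REAL)")
conn.executemany(
    "INSERT INTO trips VALUES (?, ?)",
    [("Midtown", 12.5), ("Midtown", 8.0), ("Harlem", 15.0)],  # made-up rows
)

# Count trips and average the fare per pickup zone, busiest zone first.
rows = conn.execute(
    """
    SELECT pickup_zone, COUNT(*) AS n_trips, AVG(fare) AS avg_fare
    FROM trips
    GROUP BY pickup_zone
    ORDER BY n_trips DESC
    """
).fetchall()

for zone, n_trips, avg_fare in rows:
    print(zone, n_trips, avg_fare)
```

The SELECT / GROUP BY / ORDER BY pattern is the part that carries over to the hosted services; only the table names and the way you connect change.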
Tools to Help You
(without programming)
Text Analysis
(without programming)
PDF to Plain Text
- Command line utils for Linux: Poppler utils
- Online and Windows software (free): Bunch of Links/Reviews
- Mac software: One line, anyway
- Batch convert on Mac/Windows with scripting (counts as a little bit of code, and definitely requires knowledge of the command line)
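As one possible shape for the batch conversion, the sketch below uses Python to build one `pdftotext` command (from Poppler utils) per PDF in a folder. It assumes `pdftotext` is installed and on the PATH; the function names and the folder name in the usage note are hypothetical.

```python
import subprocess
from pathlib import Path


def pdftotext_commands(pdf_dir):
    """Build one pdftotext command per PDF in pdf_dir (nothing is run here)."""
    commands = []
    for pdf in sorted(Path(pdf_dir).glob("*.pdf")):
        txt = pdf.with_suffix(".txt")  # report.pdf -> report.txt
        # -layout asks pdftotext to keep the original physical layout.
        commands.append(["pdftotext", "-layout", str(pdf), str(txt)])
    return commands


def convert_all(pdf_dir):
    """Run the commands; raises an error if pdftotext fails on a file."""
    for cmd in pdftotext_commands(pdf_dir):
        subprocess.run(cmd, check=True)
```

Calling `convert_all("my_pdfs")` would write a `.txt` file next to each PDF in a folder named `my_pdfs`.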
Online example here
Concordance Software
- word counts (how many times each word appears)
- keyword in context (KWIC)
- collocations (words occurring with a term)
- n-grams (sequences of N words)
- stop words (words that are common and may be filtered out from analysis)
- sometimes, parts of speech (noun, verb, etc.)
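The terms in the list above can be made concrete with a few lines of Python. This is a toy sketch, not how any particular concordance tool works internally: the stop word list is deliberately tiny, the tokenizer is crude, and all function names are made up for illustration. Collocations and parts of speech are left out.

```python
import re
from collections import Counter

# A tiny illustrative stop word list; real tools ship much longer ones.
STOP_WORDS = {"the", "a", "of", "and", "to", "in", "on"}


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def word_counts(tokens, stop_words=STOP_WORDS):
    """Count how often each word appears, skipping stop words."""
    return Counter(t for t in tokens if t not in stop_words)


def ngrams(tokens, n):
    """Return every sequence of n consecutive tokens."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def kwic(tokens, keyword, window=2):
    """Keyword in context: each hit with `window` words on either side."""
    hits = []
    for i, tok in enumerate(tokens):
        if tok == keyword:
            left = " ".join(tokens[max(0, i - window):i])
            right = " ".join(tokens[i + 1:i + 1 + window])
            hits.append((left, tok, right))
    return hits


tokens = tokenize("The cat sat on the mat. The cat saw the dog.")
print(word_counts(tokens).most_common(3))
print(Counter(ngrams(tokens, 2)).most_common(1))
for left, word, right in kwic(tokens, "cat"):
    print(f"{left:>10} [{word}] {right}")
```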
AntConc
Also works with a directory of files... Clinton emails converted to text from PDF:
Data Editing and Visualization
Trifacta Data "Wrangler"
Tableau
for Data Visualization and Exploration
OSM Data in Tableau
Research OSM tags, Query OSM on Athena, Reformat in Data Wrangler to extract Name and Site Type, Map in Tableau.
Tableau Event, if you are local...
June 22, 14h-16h at emlyon in Ecully
1. Robert Kosara, Tableau Research (on storytelling with data)
2. Me on Demo/Tutorial of Tableau
My Slides
Lynn Cherny
@arnicas
Cherny@em-lyon.com
https://ghostweather.slides.com/lynncherny/data-sources-and-tools-out-there
other decks you might like, including "Text Analysis Without Programming"
Bernard
forgues@em-lyon.com
@bernardforgues
Clément
levallois@em-lyon.com
@seinecle
Thanks to Jenny Bryan for help with Google Sheets links.
Aimed at a non-programming audience primarily - data set access and tools.