Data Sources and Tools "Out There"

Lynn Cherny, Assoc Prof of Data Science

emlyon business school

AIMS (June 2017)

The Plan

  • Data Sources
    • "On the shelf"
    • APIs 
    • APIs in Google Sheets
    • APIs via R
    • Scraping
  • Services for analysis
  • Tools to Help You (without coding)

Data Lying Around

Out there

Downloadable data sets

There are too many to list.  Some places to look for data...

 

My Pinboard tags for "datasets"

World Bank open data

OECD Data

Data.gouv.fr

UN data

Working with APIs

API use and search

NYT example (requires an API key you get for free)
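
For readers who want to see what such a call looks like outside a slide demo, here is a minimal Python sketch. It assumes the Article Search endpoint lives at api.nytimes.com/svc/search/v2/articlesearch.json and that headlines sit under response.docs[].headline.main in the returned JSON; check both against the NYT developer documentation. The helper names are hypothetical, and you supply your own key.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed endpoint for the NYT Article Search API (verify in the NYT developer docs).
BASE_URL = "https://api.nytimes.com/svc/search/v2/articlesearch.json"


def build_search_url(query, api_key, page=0):
    """Build the request URL; the API key travels as a query parameter."""
    params = {"q": query, "page": page, "api-key": api_key}
    return BASE_URL + "?" + urlencode(params)


def extract_headlines(payload):
    """Pull the main headline out of each article in a parsed JSON response."""
    docs = payload.get("response", {}).get("docs", [])
    return [doc["headline"]["main"] for doc in docs]


def search_headlines(query, api_key):
    """Call the API over the network and return the headlines it finds."""
    with urlopen(build_search_url(query, api_key)) as resp:
        return extract_headlines(json.load(resp))
```

Calling search_headlines("data science", "YOUR_KEY") would perform the real request; the two helpers above it can be tried offline with a saved response.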

API calls from Google Sheets

Knoema Data Finder As an Add-on in Google Sheets

Sidebar on Sheets - search for data

Example Analysis in Sheets

After you have data in your sheet, click on the "Explore" button on the lower right and click around in your data. It will recommend charts/analysis.

Blockspring data sources (free 14 day trial)

R Datasets and API access

R (Studio) example:

NYT article search in R using API key and call

"Scraping" from websites without Code

Select what you want to "scrape"...

Get results in a spreadsheet...

Services (API) to help you analyze data

MS Azure APIs: many

Demos

TextRazor: entity analysis in text

Giant Datasets in Google BigQuery

Datasets:

GDELT (news article/event data)

NYC Cabs

Weather History

Chicago Crime

Bikes

Baseball, etc.

Athena on Amazon AWS (similar concept, newer)

Querying OpenStreetMap data for the whole world

Also uses SQL. SQL is the single most useful data tool to learn, IMO.

Tutorial for my classes: SQLBolt
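
To give a feel for the kind of SQL these services accept, here is a small sketch that runs a typical "group and aggregate" query locally with Python's built-in sqlite3 module. The table, column names, and rows are invented for illustration and are not taken from any of the public datasets above; BigQuery and Athena each have their own SQL dialect, so treat this as the general shape only.

```python
import sqlite3

# Hypothetical sample rows standing in for a large public table of taxi trips.
TRIPS = [
    ("Manhattan", 12.5),
    ("Manhattan", 8.0),
    ("Brooklyn", 20.0),
    ("Queens", 18.0),
    ("Brooklyn", 10.0),
]


def average_fare_by_borough(rows):
    """Load rows into an in-memory SQLite table and aggregate them with SQL."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE trips (borough TEXT, fare REAL)")
        conn.executemany("INSERT INTO trips VALUES (?, ?)", rows)
        query = """
            SELECT borough, COUNT(*) AS n_trips, AVG(fare) AS avg_fare
            FROM trips
            GROUP BY borough
            ORDER BY avg_fare DESC
        """
        return conn.execute(query).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for borough, n_trips, avg_fare in average_fare_by_borough(TRIPS):
        print(borough, n_trips, avg_fare)
```

The same SELECT / GROUP BY / ORDER BY pattern covers a large share of everyday questions asked of tabular data.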

Tools to Help You

(without programming)

Text Analysis

(without programming)

PDF to Plain Text

Online example here

Concordance Software

  • word counts (how many times each word appears)
  • keyword in context (KWIC)
  • collocations (words occurring with a term)
  • n-grams (sequences of N words)
  • stop words (words that are common and may be filtered out from analysis)
  • sometimes, parts of speech (noun, verb, etc)
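
For the curious, most of the operations in this list can be sketched in a few lines of Python using only the standard library. This is an illustration of the concepts, not how any particular concordance tool is implemented; the stop-word list is a tiny made-up sample and the function names are hypothetical.

```python
import re
from collections import Counter

# A tiny illustrative stop-word list; real tools ship much longer ones.
STOP_WORDS = {"the", "a", "of", "and", "to", "in", "on"}


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def word_counts(tokens, stop_words=STOP_WORDS):
    """Count how often each token appears, skipping stop words."""
    return Counter(t for t in tokens if t not in stop_words)


def ngrams(tokens, n):
    """Return every sequence of n consecutive tokens."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def kwic(tokens, keyword, window=2):
    """Keyword in context: each hit with `window` tokens on either side."""
    lines = []
    for i, tok in enumerate(tokens):
        if tok == keyword:
            left = " ".join(tokens[max(0, i - window):i])
            right = " ".join(tokens[i + 1:i + 1 + window])
            lines.append((left, tok, right))
    return lines


def collocates(tokens, keyword, window=2, stop_words=STOP_WORDS):
    """Count non-stop words appearing within `window` tokens of the keyword."""
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok == keyword:
            neighbors = tokens[max(0, i - window):i] + tokens[i + 1:i + 1 + window]
            counts.update(w for w in neighbors if w not in stop_words)
    return counts
```

For example, tokenizing "The cat sat on the mat and the cat slept" and calling kwic(tokens, "cat") returns one row per occurrence of "cat", each with up to two words of context on either side.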

AntConc

Also works with a directory of files, e.g. Clinton emails converted from PDF to text:

Data Editing and Visualization

Trifacta Data "Wrangler"

Tableau

for Data Visualization and Exploration

OSM Data in Tableau

Research OSM tags, query OSM on Athena, reformat in Data Wrangler to extract Name and Site Type, map in Tableau.

Tableau event, if you are local...

June 22, 14h-16h at emlyon in Ecully

 

1. Robert Kosara, Tableau Research (on storytelling with data)

2. Me, with a demo/tutorial of Tableau

My Slides

Lynn Cherny

@arnicas

Cherny@em-lyon.com

 

https://ghostweather.slides.com/lynncherny/data-sources-and-tools-out-there

Other decks you might like, including "Text Analysis Without Programming"

Bernard

forgues@em-lyon.com

@bernardforgues

Clément

levallois@em-lyon.com

@seinecle

 

Thanks to Jenny Bryan for help with Google Sheets links.