Bodong Chen
March 5, 2015
Gathering, scraping, organizing, and mapping disparate datasets; converting, transforming, formatting, and visualizing data; managing and storing data; and more…
(Data Wrangler – a sexy job :))
The iterative process of wrangling and analysis (paper)
(Jeff Leek, The Elements of Data Analytic Style)
by Ethan Brown
by Bodong Chen
“Data Science Studio (DSS) is a software platform that aggregates all the steps and big data tools necessary to get from raw data to production ready applications.”
Key concepts in DSS
The Toronto District School Board (TDSB) uses a Learning Opportunity Index (LOI) to “rank each school based on measures of external challenges affecting student success; the school with the greatest level of external challenges is ranked number one and is described as highest on the index.” TDSB recalculates the LOI every two years.
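The ranking rule in the quote (greatest external challenge = rank number one) can be sketched in a few lines of Python. This is only an illustration: the school names and scores are made up, and `rank_by_challenge` is a hypothetical helper, not TDSB's actual calculation, which combines several measures not shown here.

```python
def rank_by_challenge(scores):
    """Map each school to its rank, where rank 1 = highest challenge score.

    `scores` is a dict of {school name: challenge score}. Tied scores keep
    their input order here, which is an assumption; the source does not say
    how ties are handled.
    """
    # Sort from highest score to lowest, then number the positions from 1.
    ordered = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    return {school: position
            for position, (school, _) in enumerate(ordered, start=1)}


# Hypothetical schools and scores, for illustration only.
example_scores = {"School A": 0.42, "School B": 0.91, "School C": 0.67}
example_ranks = rank_by_challenge(example_scores)
print(example_ranks)
```

Here "School B" has the highest score, so it is ranked number one and sits "highest on the index".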
Some questions to start with…
The data is in PDF!
Using Adobe Acrobat!
Cleansing
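Text exported from a PDF often mixes data rows with repeated headers, page footers, and uneven spacing. Below is a minimal cleansing sketch in Python. It assumes each data row looks like "rank, school name, decimal score"; the layout of the real TDSB file is not shown here, so the pattern, the sample text, and the `clean_rows` helper are all hypothetical.

```python
import re

# Assumed row layout: an integer rank, a school name, a decimal score.
ROW_PATTERN = re.compile(r"^(\d+)\s+(.+?)\s+(\d+\.\d+)$")


def clean_rows(raw_text):
    """Keep only lines that look like data rows and tidy their fields."""
    rows = []
    for line in raw_text.splitlines():
        match = ROW_PATTERN.match(line.strip())
        if match is None:
            continue  # skip blank lines, headers, and page footers
        rank, name, score = match.groups()
        rows.append({
            "rank": int(rank),
            "school": " ".join(name.split()),  # collapse runs of spaces
            "score": float(score),
        })
    return rows


# Made-up sample of exported text, for illustration only.
sample = """
Rank  School Name   LOI Score
1   Example   Public School   0.912
Page 1 of 2
2   Sample Junior  School  0.874
"""
cleaned = clean_rows(sample)
for row in cleaned:
    print(row)
```

Matching whole rows with one pattern keeps the rule for "what counts as data" in a single place, which makes it easier to adjust once the real export has been inspected.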