Week 5: Educational Data Mining

Bodong Chen
Feb 23, 2015

SIGs and WGs

Ready to roll? Yeah!

  • Plan in advance
  • Meet with Bodong one week in advance to finalize plans

(Updates from the CSCL + LA workshop)

Readings

  1. Scheuer, O. and McLaren, B. M. (2012). Educational data mining. In Encyclopedia of the Sciences of Learning, pages 1075–1079. Springer.
  2. Baker, R. S. J. d. and Yacef, K. (2009). The state of educational data mining in 2009: A review and future visions. Journal of Educational Data Mining, 1(1), 3–17.

Issues in discussion

  • Statistics vs. Data Mining
  • EDM vs. LA

EDM (paper 1 & 2)

Educational Data Mining is concerned with developing, researching, and applying computerized methods to detect patterns in large collections of educational data – patterns that would otherwise be hard or impossible to analyze due to the enormous volume of data they exist within.

Educational Data Mining is an emerging discipline, concerned with developing methods for exploring the unique types of data that come from educational settings, and using those methods to better understand students, and the settings which they learn in.

(vs. LA?)

Typical steps in an EDM project (paper 1)

  • data acquisition
  • data preprocessing
  • data mining
  • validation of results
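The four steps above can be sketched end to end in a few lines of plain Python. This is a toy under stated assumptions: the log rows, the `minutes`/`passed` fields, and the one-feature threshold rule are hypothetical stand-ins, not the procedure described in paper 1. What it shows is the shape of the workflow, in particular that validation uses rows the rule never saw during mining.

```python
# Toy EDM pipeline: acquisition -> preprocessing -> mining -> validation.
# All data and field names below are hypothetical illustrations.

def acquire():
    """Step 1: data acquisition (a hard-coded stand-in for an LMS export)."""
    return [
        {"student": "s1", "minutes": "35", "passed": "yes"},
        {"student": "s2", "minutes": "", "passed": "no"},  # missing value
        {"student": "s3", "minutes": "50", "passed": "yes"},
        {"student": "s4", "minutes": "10", "passed": "no"},
        {"student": "s5", "minutes": "42", "passed": "yes"},
        {"student": "s6", "minutes": "15", "passed": "no"},
        {"student": "s7", "minutes": "28", "passed": "yes"},
        {"student": "s8", "minutes": "20", "passed": "no"},
    ]

def preprocess(rows):
    """Step 2: drop incomplete rows and convert strings to typed values."""
    return [
        (float(r["minutes"]), r["passed"] == "yes")
        for r in rows
        if r["minutes"]
    ]

def accuracy(threshold, data):
    """Share of rows where 'minutes >= threshold' matches the pass label."""
    return sum((m >= threshold) == y for m, y in data) / len(data)

def mine(train):
    """Step 3: pick the threshold that best separates passes in the training rows."""
    return max(sorted(m for m, _ in train), key=lambda t: accuracy(t, train))

def validate(threshold, test):
    """Step 4: evaluate on rows that were not used for mining."""
    return accuracy(threshold, test)

rows = preprocess(acquire())
train, test = rows[:4], rows[4:]  # simple hold-out split
threshold = mine(train)
print(threshold, accuracy(threshold, train), validate(threshold, test))
# → 35.0 1.0 0.6666666666666666
```

The rule fits its training rows perfectly but drops to 2/3 accuracy on the held-out rows; catching that kind of gap is the job of the validation step.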

(vs. LA?)

Methods in EDM (paper 1)

  • Prediction
  • Clustering
  • Relationship mining
  • Distillation of data for human judgment
  • Discovery with models
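Clustering, the second method listed, groups students without any pre-assigned labels. Below is a minimal sketch using k-means, one common clustering algorithm; paper 1 does not prescribe it, and the two features (sessions per week, mean quiz score) and their values are hypothetical. Starting the centroids at the first k points is a simplification of the random or k-means++ initialization used in practice.

```python
import math

def kmeans(points, k, iters=100):
    """Plain k-means (Lloyd's algorithm); centroids start at the first k points."""
    centroids = [tuple(p) for p in points[:k]]
    labels = None
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid (Euclidean).
        new_labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:
            break  # no point changed cluster: converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return labels, centroids

# Hypothetical students: (sessions per week, mean quiz score)
students = [(1, 40), (5, 85), (2, 45), (6, 90), (1, 50), (7, 80)]
labels, centroids = kmeans(students, k=2)
print(labels)  # → [0, 1, 0, 1, 0, 1]
print(centroids)
```

Because the features are not rescaled here, the quiz score (tens of points) dominates the distance over sessions per week (single digits); a real analysis would usually standardize features first, which is one place where the feature choices discussed later in these slides matter.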

Application Areas

Bienkowski, 2012 (Week 2)

  • User knowledge modeling
  • User behavior modeling
  • User experience modeling
  • User profiling
  • Domain modeling
  • Trend analysis
  • Adaptation and personalization

Application Areas (paper 1)

  • Scientific inquiry and system evaluation
  • Determining student model parameters
  • Informing domain models
  • Creating diagnostic models
  • Creating reports and alerts for instructors, students and other stakeholders
  • Recommending resources and activities

Application Areas (paper 2)

  • Statistics and visualization
  • Web mining
    • Clustering, classification, and outlier detection
    • Association rule mining and sequential pattern mining
    • Text mining
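Association rule mining (a form of the relationship mining listed in paper 1) looks for rules such as "students who use X also tend to use Y", scored by support (how often X and Y occur together) and confidence (how often Y occurs given X). The sketch below brute-forces two-item rules only; it is a simplification of algorithms such as Apriori, and the session data and thresholds are hypothetical.

```python
from itertools import combinations

def pair_rules(transactions, min_support=0.4, min_confidence=0.7):
    """Brute-force rules of the form {a} -> {b} that clear both thresholds."""
    n = len(transactions)

    def support(itemset):
        # Fraction of transactions that contain every item in itemset.
        return sum(itemset <= t for t in transactions) / n

    items = sorted(set().union(*transactions))
    rules = []
    for a, b in combinations(items, 2):
        pair_support = support({a, b})
        if pair_support < min_support:
            continue
        for lhs, rhs in ((a, b), (b, a)):
            # confidence(lhs -> rhs) = support(lhs and rhs) / support(lhs)
            confidence = pair_support / support({lhs})
            if confidence >= min_confidence:
                rules.append((lhs, rhs, pair_support, confidence))
    return rules

# Hypothetical: the set of resources each student used in one session
sessions = [
    {"video", "quiz", "forum"},
    {"video", "quiz"},
    {"video", "quiz", "hint"},
    {"forum"},
    {"video", "hint"},
]
for lhs, rhs, sup, conf in pair_rules(sessions):
    print(f"{lhs} -> {rhs}  support={sup:.2f}  confidence={conf:.2f}")
```

On this toy data the rules are hint -> video, quiz -> video and video -> quiz; note that a rule and its reverse can have different confidence (1.00 vs. 0.75 for the quiz/video pair), so direction matters when interpreting them.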

Terms

  • machine learning
  • text mining
  • psychometrics
  • web log analysis
  • student model
  • supervised vs. unsupervised learning
  • over-fitting vs. under-fitting
  • cross-validation
  • relationship mining
  • feature engineering
  • A/B testing
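Several of these terms (supervised learning, over-fitting vs. under-fitting, cross-validation) fit together in one small example. The sketch below scores two hypothetical classifiers on hypothetical (minutes on task, passed) data: a majority-class baseline that under-fits, and a 1-nearest-neighbour model that memorizes its training data. Folds are contiguous and unshuffled purely so the result is reproducible; real studies shuffle, and in EDM grouping folds by student is a common precaution.

```python
def k_fold_splits(n, k):
    """Yield (train_idx, test_idx) for k contiguous folds (no shuffling)."""
    start = 0
    for i in range(k):
        size = n // k + (1 if i < n % k else 0)
        test_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n))
        yield train_idx, test_idx
        start += size

def accuracy(predict, rows):
    """Fraction of (x, y) rows where predict(x) equals the label y."""
    return sum(predict(x) == y for x, y in rows) / len(rows)

def cross_val_accuracy(fit, data, k=5):
    """Mean held-out accuracy; fit(train_rows) must return a predict(x) function."""
    scores = []
    for train_idx, test_idx in k_fold_splits(len(data), k):
        predict = fit([data[i] for i in train_idx])
        scores.append(accuracy(predict, [data[i] for i in test_idx]))
    return sum(scores) / k

def fit_majority(train):
    """Under-fitting baseline: always predict the most common training label."""
    labels = [y for _, y in train]
    top = max(sorted(set(labels)), key=labels.count)  # ties go to the first in sorted order
    return lambda x: top

def fit_1nn(train):
    """Over-fitting-prone model: copy the label of the closest training point."""
    return lambda x: min(train, key=lambda row: abs(row[0] - x))[1]

# Hypothetical (minutes on task, passed) pairs
data = [(5, False), (40, True), (12, False), (34, True), (18, False),
        (30, True), (8, False), (45, True), (25, False), (22, True)]

for name, fit in [("majority", fit_majority), ("1-NN", fit_1nn)]:
    train_acc = accuracy(fit(data), data)
    cv_acc = cross_val_accuracy(fit, data, k=5)
    print(f"{name}: training accuracy {train_acc:.2f}, 5-fold CV accuracy {cv_acc:.2f}")
```

Here 1-NN scores 1.00 on its own training data but 0.70 under 5-fold cross-validation, the typical signature of over-fitting; the majority baseline sits at 0.50 on both, which is what under-fitting looks like.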

Feature engineering
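Feature engineering means turning raw logs into per-student variables that a model can use. The sketch below is a minimal illustration, assuming a hypothetical clickstream of (student, ISO timestamp, action) tuples; the four features (`n_events`, `active_days`, `hint_ratio`, `span_hours`) are example choices, not ones taken from the readings.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical raw clickstream: (student_id, ISO timestamp, action)
events = [
    ("s1", "2015-02-16T10:00", "view"),
    ("s1", "2015-02-16T10:05", "hint"),
    ("s1", "2015-02-18T09:30", "submit"),
    ("s2", "2015-02-17T20:00", "view"),
    ("s2", "2015-02-17T20:01", "hint"),
    ("s2", "2015-02-17T20:02", "hint"),
    ("s2", "2015-02-17T20:03", "submit"),
]

def engineer_features(events):
    """Aggregate raw events into one feature dictionary per student."""
    by_student = defaultdict(list)
    for student, ts, action in events:
        by_student[student].append((datetime.fromisoformat(ts), action))
    features = {}
    for student, rows in by_student.items():
        rows.sort()  # chronological order
        times = [t for t, _ in rows]
        actions = [a for _, a in rows]
        features[student] = {
            "n_events": len(rows),                             # overall activity
            "active_days": len({t.date() for t in times}),     # spacing of work
            "hint_ratio": actions.count("hint") / len(actions),  # help-seeking
            "span_hours": (times[-1] - times[0]).total_seconds() / 3600,
        }
    return features

for student, feats in engineer_features(events).items():
    print(student, feats)
```

Both students submit once, yet s1 spreads work over two days while s2 works in a single burst with heavy hint use; choosing features that capture such differences is the "criteria of picking features" concern in the comparison with statistics.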

Data Science?

Statistics vs. Data Mining

| Statistics | Data Mining |
|---|---|
| Math is important | Yep, math is important |
| Model-based (theory-driven) | Ad hoc (for a particular purpose) |
| A method/model that should work prior to its use | Experimental, ongoing refinement |
| Rigor is key | 'Adventurous' to some degree |
| Model is 'king' | Criteria for picking features are key |
| Smaller data – sample-to-population inference | Large data – could cover the population |
| Cleaner data | Messy data – data wrangling and cleansing |
| Numeric | Various forms of data |
| Confirmatory (mostly) | Exploratory (esp. with big datasets) |
| Generalization ability is important | Model 'fit' is important |
| Algorithms are less central | Algorithms are central |
| Data–analyst interaction | Data–(super-)computer–analyst interaction |
| Less likely to be real-time | Real-time in many cases |

Week 7: Cases and Examples

Find cases, share & discuss them in Knowledge Forum (KF)

  • highlight 'awesomeness' :)
  • connect with readings

Guest speaker: Vitomir Kovanovic, University of Edinburgh

  • When: 4:40PM (as usual)
  • Where: Google Hangouts