Analyzing MOOC data, learning analytics, and online education metrics
A skill for analyzing Massive Open Online Course data, implementing learning analytics pipelines, and extracting actionable insights from online education platforms. Covers clickstream processing, engagement modeling, dropout prediction, and A/B testing for course design.
MOOC platforms export several standard data types:
| Data Type | Description | Typical Format |
|-----------|-------------|----------------|
| Clickstream logs | Page views, video plays, pauses, seeks | JSON event logs |
| Forum posts | Discussion text, timestamps, thread structure | CSV/JSON |
| Grade records | Assignment scores, quiz attempts, certificates | CSV |
| Course structure | Module hierarchy, release dates, prerequisites | XML/JSON |
| Survey responses | Pre/post course surveys, demographics | CSV |
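As a rough illustration of the clickstream row in the table, the sketch below parses JSON event logs stored one event per line. The field names (`user_id`, `event_type`, `time`) and the sample events are assumptions made for the example; real platform exports use their own schemas.

```python
import io
import json
import pandas as pd

# Hypothetical sample: three events in JSON Lines form (field names are assumptions).
SAMPLE_LOG = io.StringIO(
    '{"user_id": "u1", "event_type": "play_video", "time": "2024-01-08T10:00:00Z"}\n'
    '{"user_id": "u1", "event_type": "pause_video", "time": "2024-01-08T10:03:30Z"}\n'
    '{"user_id": "u2", "event_type": "page_view", "time": "2024-01-09T18:20:00Z"}\n'
)

def load_clickstream(lines) -> pd.DataFrame:
    """Parse JSON Lines events into a DataFrame, skipping malformed lines."""
    records = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            records.append(json.loads(line))
        except json.JSONDecodeError:
            continue  # tolerate truncated or corrupt log lines
    df = pd.DataFrame.from_records(records)
    if df.empty:
        return df
    df["time"] = pd.to_datetime(df["time"], utc=True)
    return df

events = load_clickstream(SAMPLE_LOG)
events_per_user = events.groupby("user_id")["event_type"].count()
print(events_per_user)
```

Skipping malformed lines rather than raising is a design choice that suits large raw logs; a stricter pipeline might count and report the rejected lines instead.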
Several open datasets are available for research:
```python
import pandas as pd

# Load OULAD dataset (publicly available)
students = pd.read_csv("studentInfo.csv")
assessments = pd.read_csv("assessments.csv")
interactions = pd.read_csv("studentVle.csv")

# Basic engagement metric: total clicks per student per course
engagement = (
    interactions
    .groupby(["id_student", "code_module", "code_presentation"])
    .agg(total_clicks=("sum_click", "sum"),
         active_days=("date", "nunique"))
    .reset_index()
)
print(engagement.describe())
```
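One possible next step is to relate these engagement metrics to course outcomes. The sketch below joins engagement to the `final_result` column of `studentInfo.csv`; it uses tiny synthetic frames with the same column names so that it runs without the dataset, and the values are made up for illustration.

```python
import pandas as pd

# Tiny synthetic stand-ins that mimic the OULAD column names used above.
students = pd.DataFrame({
    "id_student": [1, 2, 3],
    "code_module": ["AAA"] * 3,
    "code_presentation": ["2013J"] * 3,
    "final_result": ["Pass", "Withdrawn", "Pass"],
})
engagement = pd.DataFrame({
    "id_student": [1, 2, 3],
    "code_module": ["AAA"] * 3,
    "code_presentation": ["2013J"] * 3,
    "total_clicks": [400, 20, 600],
    "active_days": [30, 3, 45],
})

keys = ["id_student", "code_module", "code_presentation"]
# Left join keeps students with no recorded activity; fill their metrics with 0.
merged = students.merge(engagement, on=keys, how="left").fillna(
    {"total_clicks": 0, "active_days": 0}
)
by_outcome = merged.groupby("final_result")[["total_clicks", "active_days"]].median()
print(by_outcome)
```

The left join matters here: an inner join would silently drop students who registered but never clicked, which are often the learners of most interest in dropout studies.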
Key metrics used in learning analytics research:
```python
import numpy as np

def regularity_index(daily_counts: np.ndarray) -> float:
    """
    Compute a regularity index based on normalized Shannon entropy.
    Values near 1 mean activity is spread evenly across days;
    values near 0 mean activity is concentrated in a few days.
    daily_counts: array of click counts per day over the course.
    """
    total = daily_counts.sum()
    # Entropy is undefined with no activity, and the normalizer
    # log2(1) is zero for a single-day course.
    if total == 0 or len(daily_counts) < 2:
        return float("nan")
    probs = daily_counts / total
    probs = probs[probs > 0]
    entropy = -np.sum(probs * np.log2(probs))
    max_entropy = np.log2(len(daily_counts))
    return round(float(entropy / max_entropy), 4)  # normalized [0, 1]
```
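The function above expects one count per course day, including zeros for days without activity. A minimal sketch of building that array from OULAD-style `date` and `sum_click` columns follows; the two learners are synthetic, and `normalized_entropy` restates the same entropy calculation only so that the example runs on its own.

```python
import numpy as np
import pandas as pd

def daily_counts_for(student_log: pd.DataFrame, course_days: int) -> np.ndarray:
    """Clicks per day for days 0..course_days-1, with zeros for inactive days.

    Activity dated before day 0 (pre-course access) is dropped by the reindex.
    """
    per_day = student_log.groupby("date")["sum_click"].sum()
    return per_day.reindex(range(course_days), fill_value=0).to_numpy()

def normalized_entropy(counts: np.ndarray) -> float:
    """Shannon entropy of the daily activity distribution, scaled to [0, 1]."""
    total = counts.sum()
    if total == 0 or len(counts) < 2:
        return float("nan")
    p = counts[counts > 0] / total
    return abs(float(-(p * np.log2(p)).sum() / np.log2(len(counts))))

# Synthetic learners: one studies a little every day, one crams on the last day.
steady = pd.DataFrame({"date": range(10), "sum_click": [5] * 10})
crammer = pd.DataFrame({"date": [9], "sum_click": [50]})

steady_score = normalized_entropy(daily_counts_for(steady, 10))
crammer_score = normalized_entropy(daily_counts_for(crammer, 10))
print(steady_score, crammer_score)
```

Evenly spread activity scores close to 1 and single-day activity scores 0, so the zero-filling step is what lets the index separate the two patterns.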
Predicting which learners will drop out is a central MOOC analytics task:
```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import TimeSeriesSplit
from sklearn.metrics import roc_auc_score

# Feature engineering: weekly aggregates.
# weekly_features is assumed to be a DataFrame with one row per learner-week,
# sorted by week_number so that each split trains on earlier weeks only.
features = [
    "clicks_week", "video_time_week", "forum_posts_week",
    "assignments_submitted", "avg_score", "days_since_last_login",
    "regularity_index", "week_number"
]
X = weekly_features[features]
y = weekly_features["dropped_next_week"]

# Time-aware cross-validation (no future leakage)
tscv = TimeSeriesSplit(n_splits=5)
aucs = []
for train_idx, test_idx in tscv.split(X):
    model = GradientBoostingClassifier(
        n_estimators=200, max_depth=4, learning_rate=0.1
    )
    model.fit(X.iloc[train_idx], y.iloc[train_idx])
    pred = model.predict_proba(X.iloc[test_idx])[:, 1]
    aucs.append(roc_auc_score(y.iloc[test_idx], pred))

print(f"Mean AUC: {np.mean(aucs):.3f} +/- {np.std(aucs):.3f}")
```
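The `dropped_next_week` label has to be derived from activity data before a model like this can be trained. The sketch below shows one possible definition, which is an assumption rather than a standard: a learner-week is labeled 1 when it is the learner's last active week and the course has not yet ended. The activity log and `course_weeks` value are hypothetical.

```python
import pandas as pd

# Hypothetical activity log: one row per learner per active week.
activity = pd.DataFrame({
    "user_id":     ["a", "a", "a", "b", "b", "c"],
    "week_number": [1,   2,   3,   1,   2,   1],
    "clicks_week": [40,  35,  50,  20,  5,   8],
})
course_weeks = 3  # assumed course length

last_active = activity.groupby("user_id")["week_number"].transform("max")
weekly = activity.assign(
    # 1 when this is the learner's final active week and the course continues.
    dropped_next_week=(
        (activity["week_number"] == last_active)
        & (activity["week_number"] < course_weeks)
    ).astype(int)
)
# Time-ordered splits assume rows are in time order, so sort by week first.
weekly = weekly.sort_values(["week_number", "user_id"]).reset_index(drop=True)
print(weekly)
```

Learner `a` stays active through the final week and is never labeled as dropped, while `b` and `c` receive the label in weeks 2 and 1 respectively. Other definitions, such as a fixed inactivity threshold, would produce different labels and should be stated alongside any reported AUC.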
Video interaction is the primary learning activity in MOOCs. Analyzing play, pause, seek, and speed-change events reveals learning patterns:
```python
import pandas as pd

def compute_video_metrics(events: pd.DataFrame) -> dict:
    """
    Process video clickstream events into engagement metrics.
    events: DataFrame with columns [user_id, video_id, event_type,
            timestamp, position_seconds, video_duration],
            filtered to a single user and video.
    """
    plays = events[events.event_type == "play"]
    pauses = events[events.event_type == "pause"]
    seeks = events[events.event_type == "seek"]

    # Duration is read from the first row, so all rows must share one video
    total_duration = events.video_duration.iloc[0]
    watched_positions = set()
    for _, row in plays.iterrows():
        start = int(row.position_seconds)
        # Estimate 10-second watch window per play event
        for sec in range(start, min(start + 10, int(total_duration))):
            watched_positions.add(sec)

    return {
        "play_count": len(plays),
        "pause_count": len(pauses),
        "seek_count": len(seeks),
        "coverage_ratio": len(watched_positions) / max(total_duration, 1),
        "replay_indicator": len(plays) > 1,
    }
```
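The fixed 10-second window above is a coarse estimate. When pause events carry a position, a possible alternative is to pair each play with the following pause and measure the union of the watched intervals. The sketch below assumes events are `(event_type, position_seconds)` tuples in time order and ignores seeks that occur during playback; the session data is invented for the example.

```python
def watched_intervals(events):
    """Pair each play with the next pause to get (start, end) watched spans."""
    intervals, start = [], None
    for event_type, pos in events:
        if event_type == "play":
            start = pos
        elif event_type == "pause" and start is not None:
            if pos > start:
                intervals.append((start, pos))
            start = None
    return intervals

def merged_length(intervals):
    """Total length of the union of intervals, so rewatched spans count once."""
    total, cur_start, cur_end = 0.0, None, None
    for s, e in sorted(intervals):
        if cur_end is None or s > cur_end:
            # Close the previous merged span before starting a new one.
            if cur_end is not None:
                total += cur_end - cur_start
            cur_start, cur_end = s, e
        else:
            cur_end = max(cur_end, e)
    if cur_end is not None:
        total += cur_end - cur_start
    return total

# Hypothetical session: watch 0-60 s, jump back and watch 30-90 s, then 200-230 s.
session = [("play", 0), ("pause", 60), ("play", 30), ("pause", 90),
           ("play", 200), ("pause", 230)]
video_duration = 300
coverage = merged_length(watched_intervals(session)) / video_duration
print(round(coverage, 3))
```

Merging overlapping intervals keeps the rewatched 30-60 s span from being counted twice, which keeps the coverage ratio at or below 1 as long as positions stay within the video.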
Research findings on video engagement (Guo et al., 2014):
MOOCs provide large sample sizes ideal for randomized experiments:
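```python
from scipy.stats import norm

def mooc_power_analysis(effect_size: float, n_per_group: int,
                        alpha: float = 0.05) -> float:
    """
    Approximate power of a two-sided, two-sample test in a MOOC A/B test.
    Uses the normal approximation; effect_size is Cohen's d.
    """
    z_alpha = norm.ppf(1 - alpha / 2)
    # The standard error of the difference in means is sqrt(2 / n_per_group)
    # in units of the pooled standard deviation.
    z_beta = effect_size * (n_per_group / 2) ** 0.5 - z_alpha
    power = norm.cdf(z_beta)
    return round(float(power), 4)

# Example: 5000 per group, small effect (d = 0.1)
print(mooc_power_analysis(0.1, 5000))  # ~0.999
```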
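Power calculations are often run in the other direction: given a target power, how many learners per arm are needed? A minimal sketch follows, assuming a two-sided, two-sample comparison under the normal approximation, where power is roughly Φ(d·sqrt(n/2) − z) and solving for n gives n = 2((z_alpha + z_power) / d)². The function name is hypothetical, and the standard library's `statistics.NormalDist` is used here in place of SciPy.

```python
import math
from statistics import NormalDist

def required_n_per_group(effect_size: float, power: float = 0.8,
                         alpha: float = 0.05) -> int:
    """Smallest n per group for a two-sided, two-sample z-test (approximate)."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # critical value for the test
    z_power = z.inv_cdf(power)          # quantile for the target power
    return math.ceil(2 * ((z_alpha + z_power) / effect_size) ** 2)

# Learners needed per arm to detect d = 0.1 with 80% power at alpha = 0.05
print(required_n_per_group(0.1))
```

Because the required sample grows with 1/d², halving the detectable effect size roughly quadruples the number of learners needed in each arm.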