How it works

Rubrica reads scanned exams, anonymizes each one with a random ID, and grades it against a rubric decomposed into structured per-question partial-credit tiers, annotated with common error patterns and expected methods. Claude Sonnet 4.6's vision API scores each criterion independently. The scores are then mapped back to the original student locally, so identifying data never leaves the instructor's machine.
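The anonymize, grade, and re-identify flow can be sketched in Python using only the standard library. This is a minimal illustration, not Rubrica's code. The function names, the `id_map` table, and the `score` callback are hypothetical. The callback stands in for the Claude Sonnet 4.6 vision call, which is not shown. The sketch shows that the scorer only ever sees random IDs, while the ID-to-student mapping stays in a local SQLite database. Redacting names from the scans themselves is out of scope here.

```python
import secrets
import sqlite3
from typing import Callable

# Hypothetical scorer signature: (pages, criterion) -> points awarded.
# In Rubrica this role is played by a vision-model call; here it is a plain callback.
Scorer = Callable[[list, str], float]


def anonymize(conn: sqlite3.Connection, exams: dict) -> dict:
    """Replace each student key with a random exam ID.

    The ID -> student mapping is written only to the local database;
    the returned dict (exam ID -> scanned pages) is what gets sent out.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS id_map (exam_id TEXT PRIMARY KEY, student TEXT NOT NULL)"
    )
    anonymized = {}
    for student, pages in exams.items():
        exam_id = secrets.token_hex(8)  # 64 random bits, unrelated to the student
        conn.execute("INSERT INTO id_map VALUES (?, ?)", (exam_id, student))
        anonymized[exam_id] = pages
    conn.commit()
    return anonymized


def grade(anonymized: dict, criteria: list, score: Scorer) -> dict:
    """Score every criterion with its own independent call, keyed by exam ID only."""
    return {
        exam_id: {c: score(pages, c) for c in criteria}
        for exam_id, pages in anonymized.items()
    }


def reidentify(conn: sqlite3.Connection, results: dict) -> dict:
    """Map per-ID results back to students using the local table."""
    out = {}
    for exam_id, per_criterion in results.items():
        (student,) = conn.execute(
            "SELECT student FROM id_map WHERE exam_id = ?", (exam_id,)
        ).fetchone()
        out[student] = per_criterion
    return out


# Usage with a stub scorer that awards 1 point per criterion.
conn = sqlite3.connect(":memory:")
anon = anonymize(conn, {"Alice": ["alice_p1.png"], "Bob": ["bob_p1.png"]})
results = grade(anon, ["Q1a", "Q1b"], score=lambda pages, c: 1.0)
print(reidentify(conn, results))
# {'Alice': {'Q1a': 1.0, 'Q1b': 1.0}, 'Bob': {'Q1a': 1.0, 'Q1b': 1.0}}
```

Scoring each criterion in a separate call means one criterion's judgment cannot bleed into another's. The cost is more API requests per exam.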

After grading, the system generates class performance diagnostics showing which concepts the cohort missed, where partial credit clustered, and which rubric items showed the largest cross-model disagreement. These diagnostics are useful signal for calibrating the next iteration.
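The three diagnostics above can be computed as follows. This sketch rests on several assumptions:

- Each graded item is a record with hypothetical `item`, `concept`, `max_pts`, `primary`, and `reference` fields.
- A "missed" concept is one with the lowest average share of points earned.
- Cross-model disagreement is the mean absolute score gap per item.

Rubrica's actual metrics may differ.

```python
from collections import defaultdict
from statistics import mean


def class_diagnostics(records):
    """Summarize a graded cohort.

    Each record is one student's score on one rubric item, as a dict with
    hypothetical keys:
      item      - rubric item ID, e.g. "Q3b"
      concept   - concept tag for that item
      max_pts   - points available
      primary   - score from the production grader
      reference - score from a second model (e.g. an audit scorer)
    """
    by_concept = defaultdict(list)
    by_item = defaultdict(list)
    for r in records:
        by_concept[r["concept"]].append(r["primary"] / r["max_pts"])
        by_item[r["item"]].append(r)

    # Share of available points earned per concept (low = cohort missed it).
    mastery = {c: mean(fracs) for c, fracs in by_concept.items()}
    # Fraction of students with partial credit (neither zero nor full) per item.
    partial = {
        item: mean(1.0 if 0 < r["primary"] < r["max_pts"] else 0.0 for r in rs)
        for item, rs in by_item.items()
    }
    # Mean absolute gap between the two models' scores per item.
    gap = {
        item: mean(abs(r["primary"] - r["reference"]) for r in rs)
        for item, rs in by_item.items()
    }
    return {
        "weakest_concepts": sorted(mastery, key=mastery.get),
        "partial_credit_rate": partial,
        "most_disputed_items": sorted(gap, key=gap.get, reverse=True),
    }
```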

Validation

Rubrica has been validated across microeconomics, finance, and data & decisions courses at UC Berkeley Haas. The most recent independent inter-rater audit covered 30 exams (840 scored items) from an undergraduate economics course. Each exam was re-graded by o4-mini as a cross-family reference scorer. This reduces the risk that shared training data or shared architectural biases inflate agreement.

The production model achieved an intraclass correlation coefficient (ICC) of 0.964 and a quadratic weighted kappa (QWK) of 0.883. Both are well above established psychometric standards: Koo & Li (2016) classify ICC above 0.90 as excellent reliability, and Williamson et al. (2012) recommend a QWK of at least 0.70 for automated scoring. Mean absolute error was 0.15 points per question.

The audit also found a small, consistent generosity bias of +0.64 points per exam relative to the reference scorer. A bias of this size matters mainly when it could push an exam across a letter-grade line. It is therefore caught automatically by a boundary re-grading safeguard, which triggers a second, independent grading pass on any exam scoring within ±1.5% of the 90/80/70/60 letter-grade cutoffs.
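The agreement statistics and the boundary trigger can be sketched as follows. This is an illustrative sketch, not the audit's code, and it makes these assumptions:

- QWK uses the standard quadratic-weighted form of Cohen's kappa on integer ratings. Half-point scores would need rescaling first.
- ICC is omitted because it needs an ANOVA-based estimator.
- "±1.5%" means ±1.5 percentage points, with the edges included.

```python
from typing import Sequence


def quadratic_weighted_kappa(a: Sequence[int], b: Sequence[int], lo: int, hi: int) -> float:
    """Cohen's kappa with quadratic disagreement weights for integer ratings in [lo, hi]."""
    n = hi - lo + 1
    observed = [[0] * n for _ in range(n)]
    for x, y in zip(a, b):
        observed[x - lo][y - lo] += 1
    total = len(a)
    rows = [sum(r) for r in observed]                                 # rater A marginals
    cols = [sum(observed[i][j] for i in range(n)) for j in range(n)]  # rater B marginals
    num = den = 0.0
    for i in range(n):
        for j in range(n):
            w = (i - j) ** 2 / (n - 1) ** 2         # quadratic penalty for distance i-j
            num += w * observed[i][j]               # observed weighted disagreement
            den += w * rows[i] * cols[j] / total    # disagreement expected by chance
    return 1.0 - num / den


def mean_absolute_error(a, b):
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def mean_signed_bias(primary, reference):
    """Positive result => the primary scorer is more generous than the reference."""
    return sum(p - r for p, r in zip(primary, reference)) / len(primary)


LETTER_CUTOFFS = (90, 80, 70, 60)


def needs_boundary_regrade(percent, margin=1.5, cutoffs=LETTER_CUTOFFS):
    """True if an exam percentage lies within +/- margin points of any cutoff."""
    return any(abs(percent - c) <= margin for c in cutoffs)


print(needs_boundary_regrade(88.9))  # True: within 1.5 points of 90
print(needs_boundary_regrade(85.0))  # False
```

Computing the bias on per-exam totals rather than per item matches how the +0.64 figure is reported.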

Production safeguards

Beyond boundary re-grading, the pipeline runs several other safeguards:

- double-read verification of multiple-choice (MC) items that scored zero
- detection of contradictions between feedback and scores
- feedback specificity enforcement for vague comments
- confidence flagging for pages with ambiguous handwriting
- a review-flag gate that blocks finalization until an instructor has dismissed every unresolved flag

A snapshot of the raw scores from before any safeguard runs is preserved, so cross-family audits compare like with like.
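The gating logic can be sketched with a few dataclasses. Everything here is hypothetical:

- the `Item` and `Exam` classes
- the `reread_mc` callback, which stands in for a second independent read
- the 0.7 handwriting-confidence floor
- raising a flag, rather than auto-correcting, when the two MC reads disagree

Contradiction detection and specificity enforcement are omitted because they depend on model calls the source does not describe.

```python
import copy
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Item:
    qid: str
    kind: str                      # hypothetical labels: "mc" or "free"
    score: float
    max_pts: float
    ocr_confidence: float = 1.0    # hypothetical 0..1 handwriting-read confidence


@dataclass
class Exam:
    exam_id: str
    items: list[Item]
    flags: list[str] = field(default_factory=list)        # unresolved flags
    raw_snapshot: list[Item] = field(default_factory=list)


def apply_safeguards(exam: Exam, reread_mc: Callable[[str, str], float],
                     confidence_floor: float = 0.7) -> Exam:
    # Preserve pre-safeguard scores so later audits compare like with like.
    exam.raw_snapshot = copy.deepcopy(exam.items)
    for item in exam.items:
        # Zero-scored MC items get an independent second read.
        if item.kind == "mc" and item.score == 0:
            second = reread_mc(exam.exam_id, item.qid)
            if second != item.score:
                exam.flags.append(f"{item.qid}: MC reads disagree ({item.score} vs {second})")
        # Low-confidence handwriting is flagged for human review.
        if item.ocr_confidence < confidence_floor:
            exam.flags.append(f"{item.qid}: low handwriting confidence")
    return exam


def dismiss(exam: Exam, flag: str) -> None:
    """Instructor dismisses one flag after review."""
    exam.flags.remove(flag)


def can_finalize(exam: Exam) -> bool:
    """The gate: finalization is blocked while any flag is unresolved."""
    return not exam.flags


exam = Exam("3f9c1a", [Item("Q1", "mc", 0, 1), Item("Q2", "free", 3, 5, ocr_confidence=0.4)])
apply_safeguards(exam, reread_mc=lambda exam_id, qid: 1)  # stub second read disagrees
print(can_finalize(exam))  # False: two unresolved flags
for flag in list(exam.flags):
    dismiss(exam, flag)
print(can_finalize(exam))  # True
```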

Python / Flask · Claude Sonnet 4.6 · o4-mini (audit) · Ollama (local PII) · SQLite · FERPA