Statistics for The stability of test design: measuring differences in performance across several administrations of an academic literacy test