A researcher wants to measure content-sampling error and has two versions of an achievement test available. Which method of estimating reliability would be best in this situation?