ETS Standards
Calibration — (1) In item response theory, calibration is the process of estimating numbers
(parameters) that measure the difficulty and discrimination of each item. (2) In the scoring of constructed
responses by human raters, calibration is the process of establishing consistency between raters in
their standards for awarding each possible rating. (3) In the automated scoring of constructed
responses, calibration is the process of adjusting the scoring engine (the computer program) to reproduce, as
closely as possible, the ratings given to a set of previously scored responses (the calibration sample).
See Automated Scoring of Constructed Responses, Holistic Scoring, Item Response Theory.
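As an illustration of sense (1), the sketch below shows how a difficulty and a discrimination parameter could enter an item response function. It assumes the two-parameter logistic form, which this glossary does not specify; the function name probability_correct and the example values are hypothetical. Calibration in this sense would mean estimating the two item parameters from response data, which the sketch does not attempt.

```python
import math

def probability_correct(ability, difficulty, discrimination):
    """Probability of a correct response under an assumed
    two-parameter logistic model (an illustration only)."""
    # The gap between ability and difficulty, scaled by discrimination,
    # is mapped to a probability by the logistic function.
    return 1.0 / (1.0 + math.exp(-discrimination * (ability - difficulty)))

# Hypothetical item: difficulty 0.0, discrimination 1.0.
# A test taker whose ability equals the item difficulty has an even chance.
p_equal = probability_correct(0.0, 0.0, 1.0)
# A more able test taker has a higher chance on the same item.
p_higher = probability_correct(1.0, 0.0, 1.0)
```

Under this assumed form, a larger discrimination value makes the probability change more sharply as ability moves past the item's difficulty.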
Canceled Score — A canceled score is a score that is not reported on a test that has been taken, or a
score that is removed from a test taker’s record. Such a score is not reportable. Scores may be canceled
voluntarily by the test taker or by ETS for testing irregularities, invalidity, misconduct, or other reasons.
See Irregularity.
Certification — Official recognition by a professional organization that a person has demonstrated
a high level of skill or proficiency. Sometimes, certification is used as a synonym for licensing.
Compare Licensing.
Client — An agency, association, foundation, organization, institution, or individual that commissions
ETS to provide a product or service.
Coaching — Short-term instruction aimed directly at improving performance on a test. Coaching can
focus on test-taking strategies, on knowledge of the subject tested, or on both.
Common Items — A set of test questions included in two or more forms of a test for purposes of
equating. The common items may be dispersed among the items in the forms to be equated, or kept
together as an anchor test. Compare Anchor Test. See Equating.
Comparable Scores — Scores that allow meaningful comparisons in some group of test takers.
The term usually refers to scores on different tests or on different forms of the same test.
Compare Equating.
Composite Score — A score that is the combination of two or more scores by some specified
formula, usually a weighted sum.
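A weighted-sum composite can be sketched in a few lines. The function name composite_score, the component scores, and the weights below are hypothetical and chosen only for illustration; an actual testing program would specify its own formula.

```python
def composite_score(scores, weights):
    """Combine component scores into one score as a weighted sum."""
    if len(scores) != len(weights):
        raise ValueError("scores and weights must have the same length")
    # Multiply each component score by its weight and add the products.
    return sum(weight * score for score, weight in zip(scores, weights))

# Hypothetical example: two component scores, the first weighted twice
# as heavily as the second.
example = composite_score([10, 20], [2, 1])
```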
Computer-Based Test — Any test administered on a computer.
Construct — The set of knowledge, skills, abilities, or traits a test is intended to measure, such as
knowledge of American history, reading comprehension, study skills, writing ability, logical reasoning,
honesty, intelligence, and so forth.
Construct Label — The name used to characterize the construct measured by a test. The construct
label is generally not sufficient by itself to describe fully the set of knowledge, skills, abilities, or traits a
test is intended to measure.
Construct-Irrelevant Variance — Differences between test takers’ scores that are caused by factors
other than differences in the knowledge, skills, abilities, or traits included in the construct the test is
intended to measure. See Construct, Variance.
Construct Validity — All the theoretical and empirical evidence bearing on what a test is actually
measuring, and on the qualities of the inferences made on the basis of the scores. Construct validity
was previously associated primarily with tests of abstract attributes, such as honesty, anxiety, or need
for achievement, rather than tests of more concrete attributes, such as knowledge of American history,
ability to fly an airplane, or writing ability. Now construct validity is seen as the sum of all types of
evidence bearing on the validity of test scores. The concept of various types of validity has been replaced