Regional Educational Laboratory Central
At Marzano Research
A Publication of the National Center for Education Evaluation and Regional Assistance at IES
REL 2022-112
U.S. DEPARTMENT OF EDUCATION
Program Evaluation Toolkit:
Quick Start Guide
     
Joshua Stewart, Jeanee Joyce, Mckenzie Haines, David Yanoski,
Douglas Gagnon, Kyle Luke, Christopher Rhoads, and Carrie Germeroth 
Figures
1. Guiding questions for the Program Evaluation Toolkit
2. Tracker for the Program Evaluation Toolkit
3. Opening page of the Program Evaluation Toolkit website
4. Opening page of Module 1 on the Program Evaluation Toolkit website

Tables
1. Module selection checklist
The Program Evaluation Toolkit presents a step-by-step process for conducting a program evaluation. Program evaluation is important for assessing the implementation and outcomes of local, state, and federal programs. Designed to be used in a variety of education settings, the toolkit focuses on the practical application of program evaluation. The toolkit can also build your understanding of program evaluation so that you can be better equipped to understand the evaluation process and use evaluation practices.

The toolkit consists of this Quick Start Guide and a website with eight modules that begin at the planning stages of an evaluation and progress to the presentation of findings to stakeholders. Each module covers a critical step in the evaluation process.

The toolkit is available at https://ies.ed.gov/ncee/edlabs/regions/central/resources/pemtoolkit/index.asp.

The toolkit includes a screencast that provides an overview of each stage of the evaluation process. It also includes tools, handouts, worksheets, and a glossary of terms (see the appendix of this guide) to help you conduct your own evaluation. The toolkit resources will help you create a logic model, develop evaluation questions, identify data sources, develop data collection instruments, conduct basic analyses, and disseminate findings.

What is program evaluation?

Program evaluation is the systematic process for planning, documenting, and assessing the implementation and outcomes of a program. Evaluations often address the following questions:

• Is the program effective?
• Can the program be improved?

A well-thought-out evaluation can identify barriers to program effectiveness, as well as catalysts for program successes. Program evaluation begins with outlining the framework for the program, determining questions about program milestones and goals, identifying what data address the questions, and choosing the appropriate analytical method to address the questions. By the end, an evaluation should provide easy-to-understand findings, as well as recommendations or possible actions.


Who is the toolkit for?

The primary audience for the toolkit is individuals who evaluate local, state, or federal programs. Other individuals engaged in program evaluation might also benefit from the toolkit. The toolkit will be particularly helpful to individuals responsible for:

• Designing evaluations of program implementation and outcomes.
• Collecting and analyzing data about program implementation and outcomes.
• Writing reports or disseminating information about program implementation and outcomes.

What do you need to know to use the toolkit?

This toolkit covers the main components of program evaluation, from foundational practices to quantitative and qualitative methods, to dissemination of findings. The toolkit content is broad and might challenge you to think in new ways. However, you do not need prior experience or advanced training in program evaluation to benefit from using the toolkit. In addition to the main content for general users, optional resources in the toolkit can help more advanced users refine their knowledge, skills, and abilities in program evaluation.

The following questions can help you determine your readiness to use the toolkit without support from colleagues:

• Are you, or have you been, engaged in program evaluation?
• Do you have basic data literacy, gained from some experience in gathering, reviewing, and using data?

How should you use the toolkit?

You can progress through the toolkit modules either sequentially or selectively, reviewing only modules that pertain directly to your current evaluation needs (figure 1). In each module the first chapter provides a basic introduction to the module topic, and the subsequent chapters increase in complexity and build on the basic introduction. For each module you can decide what level of complexity best meets your program evaluation needs. Modules 3, 4, and 7 require statistical knowledge. If you lack statistical expertise, you might consider working through them with a colleague who has statistical expertise. You can use the toolkit tracker to document your progress (figure 2). In the tracker you can record when you start a module and which modules you have completed.

It is best to start with Module 1: Logic models, which focuses on developing a logic model for your program. A logic model is a graphical representation of the relationship between program components and desired outcomes. A well-crafted logic model will serve as the foundation for the other modules in the toolkit. You will draw on your logic model when developing measurable evaluation questions, identifying quality data sources, and selecting appropriate analyses and other key components of your evaluation. If you choose to progress through the toolkit selectively, the module selection checklist can help you identify which modules to prioritize (table 1).


Figure 1. Guiding questions for the Program Evaluation Toolkit

Module 1 — Logic models: What is the purpose of a logic model? How do I describe my program using a logic model?
Module 2 — Evaluation questions: How do evaluation questions relate to the logic model? How do I write high-quality evaluation questions for my program?
Module 3 — Evaluation design: Which design will best meet my evaluation needs? What is the relationship between my evaluation design and Every Student Succeeds Act levels of evidence and What Works Clearinghouse design standards?
Module 4 — Evaluation samples: How do I determine whom to include in my data collection sample? How do I determine the best sample size for my evaluation?
Module 5 — Data quality: What available data can I identify that can be used to answer my evaluation questions? How do I assess the quality of my data?
Module 6 — Data collection: What data collection instruments will best help me answer my evaluation questions? How do I develop a simple but effective data collection instrument?
Module 7 — Data analysis: How do I move from analysis to recommendations? Which analysis method best meets my evaluation needs?
Module 8 — Dissemination approaches: How do I use findings to address the evaluation questions? How do I communicate results to target audiences using appropriate graphics?

Source: Authors' creation.


Figure 2. Tracker for the Program Evaluation Toolkit

Module 1 — Logic models: Started / Completed
Module 2 — Evaluation questions: Started / Completed
Module 3 — Evaluation design: Started / Completed
Module 4 — Evaluation samples: Started / Completed
Module 5 — Data quality: Started / Completed
Module 6 — Data collection: Started / Completed
Module 7 — Data analysis: Started / Completed
Module 8 — Dissemination approaches: Started / Completed

Source: Authors' creation.

a This module includes technical informaon and might require more advanced stascal knowledge
Source: Authors’ compilaon

What does the toolkit not cover?

This toolkit provides tools and resources for general program evaluation. Although the toolkit can help you establish a common language around program evaluation and use resources for basic evaluation purposes, it does not include detailed information on topics such as the advanced statistical methods of regression discontinuity designs, difference-in-differences designs, propensity score matching, crossover designs, and multilevel modeling. Instead, the toolkit will support you in executing simpler designs and analyses using widely available software and materials. The toolkit is designed for individuals with a basic understanding of data, statistics, and evaluation. If your evaluation requires more complex methodologies or analyses, consider consulting an evaluation expert at a university or college, reaching out to the Regional Educational Laboratory in your region, or checking out additional resources, such as the free software RCT-YES.

When you rst open the Program Evaluaon Toolkit website, you will nd an introducon to
the toolkit and links to each of the eight modules (gure 3)
Table 1. Module selection checklist

Module — What are my evaluation needs?
1. Logic models — I need to clearly define my program and my expected outcomes.
2. Evaluation questions — I need to develop or refine a set of relevant and measurable evaluation questions.
3. Evaluation design (a) — I need to identify an evaluation design that will ensure that claims made from my evaluation are justifiable and align to tiers of evidence under the Every Student Succeeds Act and What Works Clearinghouse design standards.
4. Evaluation samples (a) — I need to determine which participants (for example, students, parents) and how many to include in my evaluation.
5. Data quality — I need to identify available data (for example, state assessments, student attendance) to address my evaluation questions and assess the quality of the available data.
6. Data collection — I need to develop or identify quality instruments (for example, focus group protocols or surveys) to collect additional data.
7. Data analysis (a) — I need to analyze my data and make recommendations for next steps to decision makers.
8. Dissemination approaches — I need to share the findings of my evaluation with different audiences (for example, teachers, community members).

a. This module includes technical information and might require more advanced statistical knowledge.
Source: Authors' compilation.

Clicking on any of the eight module links will bring you to a webpage with information about the module content, organized into chapters (figure 4). You can use the chapters to engage with the module content in smaller sections. Each chapter includes a short video that explains the content and a link to the PowerPoint slides used in the video. In addition, each module webpage includes links to the tools, handouts, and worksheets used in the module. You can download and print these materials to use while watching the video, or you can use them while conducting your own evaluation.



  

Figure 3. Opening page of the Program Evaluation Toolkit website

[Screenshot: the IES REL Central website opening page for the Program Evaluation Toolkit, with links to the eight modules]

Figure 4. Opening page of Module 1 on the Program Evaluation Toolkit website

[Screenshot: the Module 1: Logic Models webpage]

Source: Authors' creation.


Module overviews

The following sections provide short overviews of the eight modules in the toolkit. For clarification, key terms are linked to their glossary definitions in the appendix of this guide.

Module 1: Logic models (36 minutes)

Module 1 guides you through developing a logic model for a program. The module contains four chapters that will help you do the following:

• Chapter 1: Understand the purpose and components of logic models.
• Chapter 2: Write a problem statement to better understand the problem that the program is designed to address.
• Chapter 3: Use the logic model to describe the program's resources, activities, and outputs.
• Chapter 4: Use the logic model to describe the short-term, mid-term, and long-term outcomes of the program.

Chapter 1 reviews the purpose of logic models and introduces the logic model components. Chapter 2 explains how to write a problem statement that describes the reason and context for implementing the program. Chapters 3 and 4 present the central logic model components: resources, activities, outputs, and short-term, mid-term, and long-term outcomes. These two chapters also explain how the components relate to and inform the overall logic model. In addition, the module highlights available resources on logic model development.

Module 2: Evaluation questions (37 minutes)

Module 2 guides you through writing measurable evaluation questions that are aligned to your logic model. The module contains three chapters that will help you do the following:

• Chapter 1: Learn the difference between process and outcome evaluation questions and understand how they relate to your logic model.
• Chapter 2: Use a systematic framework to write, review, and modify evaluation questions.
• Chapter 3: Prioritize questions to address in the evaluation.

Chapter 1 introduces the two main types of evaluation questions (process and outcome) and explains how each type aligns to the logic model. Chapter 2 presents a systematic framework for developing and revising evaluation questions and then applies that framework to sample evaluation questions. Chapter 3 describes and models a process for prioritizing evaluation questions. The module includes worksheets to help you write, review, and prioritize evaluation questions for your own program.
 
 
 


Module 3: Evaluation design (37 minutes)

Module 3 reviews major considerations for designing an evaluation. The module contains three chapters that will help you understand the following:

• Chapter 1: The major categories of evaluation design, including when to use each design.
• Chapter 2: Threats to validity, including how to consider these threats when designing an evaluation.
• Chapter 3: The relationship between evaluation design and Every Student Succeeds Act (ESSA) tiers of evidence and What Works Clearinghouse (WWC) design standards.

Chapter 1 introduces four major categories of evaluation design: descriptive designs, correlational designs, quasi-experimental designs, and randomized controlled trials. The chapter explains considerations for when to use each category, including which is suited to the two types of evaluation questions (see module 2). Chapter 2 presents threats to internal and external validity and provides examples of common challenges in designing evaluations. Chapter 3 discusses the four tiers of evidence in ESSA and the three ratings of WWC design standards. The chapter explains how each tier or rating connects to evaluation design choices. The module includes activities to help you identify appropriate evaluation designs and links to resources from which you can learn more about the ESSA tiers of evidence and WWC design standards.

Module 4: Evaluation samples (57 minutes)

Module 4 provides an overview of sampling considerations in evaluation design and data collection. The module contains three chapters that will help you understand the following:

• Chapter 1: The purpose and importance of sampling.
• Chapter 2: Sampling techniques that you can use to obtain a desirable sample.
• Chapter 3: Methods for determining sample size and for creating a sampling plan for your evaluation.

Chapter 1 reviews the purpose of sampling and defines key terms, including representativeness, generalizability, and weighting. The chapter also details the process for selecting a representative sample. Chapter 2 covers the different types of random and nonrandom sampling techniques. Chapter 3 introduces a tool for determining the optimal sample size, as well as a process for drafting a sampling plan.
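Chapter 3's sample size tool lives on the toolkit website, but the arithmetic behind one common case, choosing a sample size to estimate a proportion (for example, the share of parents satisfied with a program) within a desired margin of error, can be sketched in a few lines. This sketch is illustrative only: the function names are invented here, and the toolkit's own tool may use different methods.

```python
import math

def sample_size_for_proportion(margin_of_error, z=1.96, p=0.5):
    """Minimum sample size to estimate a proportion within the given
    margin of error at roughly 95 percent confidence (z = 1.96).
    p = 0.5 is the most conservative (largest-sample) assumption."""
    n = (z ** 2) * p * (1 - p) / margin_of_error ** 2
    return math.ceil(n)

def adjust_for_population(n, population_size):
    """Finite population correction: fewer responses are needed when
    the entire population (for example, one district) is small."""
    return math.ceil(n / (1 + (n - 1) / population_size))

n = sample_size_for_proportion(0.05)     # 385 respondents for +/-5 points
n_small = adjust_for_population(n, 600)  # 235 when only 600 students exist
```

Note how quickly the required sample shrinks for a small, finite population; running this kind of check before drafting a sampling plan helps keep data collection realistic.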

 
 
 


Module 5: Data quality (30 minutes)

Module 5 provides an overview of data quality considerations. The module also covers aligning data to evaluation questions. The module contains three chapters that will help you do the following:

• Chapter 1: Identify the two major types of data and describe how to use them in an evaluation.
• Chapter 2: Evaluate the quality of your data, using six key criteria.
• Chapter 3: Connect data to your evaluation questions.

Chapter 1 discusses the two main types of data (quantitative and qualitative) and explains how to use both types of data to form a more complete picture of the implementation and outcomes of your program. Chapter 2 discusses the key elements of data quality: validity, reliability, timeliness, comprehensiveness, trustworthiness, and completeness. In addition, the chapter includes a checklist for assessing the quality of data. Chapter 3 covers the alignment of data to evaluation questions. The chapter introduces the evaluation matrix, a useful tool for planning your evaluation and the data you need to collect.

Module 6: Data collection (42 minutes)

Module 6 presents best practices in developing data collection instruments and describes how to create quality instruments to meet data collection needs. The module contains three chapters that will help you do the following:

• Chapter 1: Plan and conduct interviews and focus groups.
• Chapter 2: Plan and conduct observations.
• Chapter 3: Design surveys.

Chapter 1 describes how to prepare for and conduct interviews and focus groups to collect data to answer evaluation questions. Chapter 2 covers developing and using observation protocols that include, for example, recording checklists and open field notes, to collect data. Chapter 3 focuses on survey development and implementation. Each chapter includes guiding documents, examples of data collection instruments, and a step-by-step process for choosing and developing an instrument that best meets your evaluation needs.



Module 7: Data analysis (53 minutes)

Module 7 reviews major considerations for analyzing data and making recommendations based on findings from the analysis. The module contains three chapters that will help you understand the following:

• Chapter 1: Common approaches to data preparation and analysis.
• Chapter 2: Basic analyses to build analytic capacity.
• Chapter 3: Implications of findings and how to make justifiable recommendations.

Chapter 1 reviews common techniques for data preparation, such as identifying data errors and cleaning data. It then introduces quantitative methods, including basic descriptive methods and linear regression. The chapter also reviews basic qualitative methods. Chapter 2 focuses on cleaning and analyzing quantitative and qualitative datasets, applying the methods from chapter 1. Chapter 3 presents a framework and guiding questions for moving from analysis to interpretation of the findings and then to making defensible recommendations based on the findings.
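The chapter 1 sequence (clean the data, describe it, then fit a simple model) can be made concrete with a short example. Everything below is a minimal sketch using invented data and only the Python standard library; the toolkit demonstrates these steps with its own worksheets and materials.

```python
import statistics

# Hypothetical evaluation data: tutoring hours and test score gains.
# One record has a missing outcome to illustrate basic cleaning.
records = [
    {"hours": 10, "gain": 4.0},
    {"hours": 15, "gain": 6.0},
    {"hours": 20, "gain": 7.5},
    {"hours": 25, "gain": None},   # data error: dropped during cleaning
    {"hours": 30, "gain": 11.0},
]

# Step 1: data preparation -- keep only complete records.
clean = [r for r in records if r["gain"] is not None]

# Step 2: basic descriptive methods.
gains = [r["gain"] for r in clean]
hours = [r["hours"] for r in clean]
mean_gain = statistics.mean(gains)    # 7.125
sd_gain = statistics.stdev(gains)

# Step 3: ordinary least squares slope and intercept for gain on hours
# (the same fit that statistics.linear_regression gives on Python 3.10+).
mean_hours = statistics.mean(hours)
slope = (sum((x - mean_hours) * (y - mean_gain) for x, y in zip(hours, gains))
         / sum((x - mean_hours) ** 2 for x in hours))
intercept = mean_gain - slope * mean_hours
# A positive slope suggests gains rise with tutoring hours, but (see
# module 3) correlation alone does not demonstrate causality.
```

The closing caveat in the comments mirrors chapter 3's point: moving from an estimated slope to a recommendation requires interpreting the finding in light of the evaluation design.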

Module 8: Dissemination approaches (47 minutes)

Module 8 presents best practices in disseminating and sharing the evaluation findings. The module contains two chapters that will help you do the following:

• Chapter 1: Learn how to develop a dissemination plan.
• Chapter 2: Explore best practices in data visualization.

Chapter 1 describes a dissemination plan and explains why a plan is helpful for sharing evaluation findings. It then outlines key considerations for developing a dissemination plan, such as the audience, the message, the best approach for communicating the message, and the best time to share the information with the audience. The chapter also includes important considerations for ensuring that dissemination products are accessible to all members of the audience. Chapter 2 reviews key considerations for visualizing data, including the audience, message, and approach. The chapter also presents examples of data visualizations, including graphs, charts, and tables, that can help make the data more easily understandable.

 
 



How was the toolkit developed?

The development of this toolkit arose in response to the Colorado Department of Education's need for tools and procedures to help districts systematically plan and conduct program evaluations related to locally implemented initiatives. The Regional Educational Laboratory Central partnered with the Colorado Department of Education to develop an evaluation framework and a set of curated resources that cover program evaluation from the planning stages to presentation of findings. The Program Evaluation Toolkit is an expansion of this collaborative work.

 

Appendix. Glossary of key terms

This appendix provides definitions of key terms used in the Program Evaluation Toolkit. Terms are organized by module and listed in the order in which they are introduced in each module.

Module 1: Logic models

Logic model: A graphical representation of the relationship between the parts of a program and its expected outcomes.

Problem statement: A description of the problem that the program is designed to address.

Resources: All the available means to address the problem, including investments, materials, and personnel. Resources can include human resources, monetary resources, facilities, expertise, curricula and materials, time, and any other contributions to implementing the program.

Activities: Actions taken to implement the program or address the problem. Activities can include professional development sessions, after-school programs, policy or procedure changes, use of a curriculum or teaching practice, mentoring or coaching, and development of new materials.

Outputs: Evidence of program implementation. Outputs can include required deliverables, the number of activities, newly developed materials, new policies or procedures, observations of the program in use, numbers of students or teachers involved, and other data that provide evidence of the implementation of activities in the program.

Outcomes: The anticipated results once you implement the program. Outcomes are divided into three types:

Short-term outcomes: The most immediate results for participants that can be attributed to program activities. Short-term outcomes are typically changes in knowledge or skills. Short-term outcomes are expected immediately following exposure to the program (or shortly thereafter).

Mid-term outcomes: The more distant, though anticipated, results of participation in program activities that require more time to achieve. Mid-term outcomes are typically changes in attitudes, behaviors, and practices. Mid-term results are expected after the participants in the program have had sufficient time to implement the knowledge and skills that they have learned.

Long-term outcomes: The ultimately desired outcomes from implementing program activities. Long-term results are expected after the changes in attitudes, behaviors, and practices have been in place for a sufficient period of time. They are typically systemic changes or changes in student outcomes. They might not be the sole result of the program, but they are associated with it and might manifest themselves after the program concludes.

Additional considerations: Important details or ideas that do not fit into the other components of the logic model. Additional considerations can include assumptions about the program, external factors not covered in the problem statement, and factors that might influence program implementation but are beyond the evaluation team's control.

 
 
 


Module 2: Evaluation questions

Evaluation questions: The questions that the evaluation is designed to answer. Evaluation questions typically focus on promoting program improvement or determining the impact of a program. There are two main types of evaluation questions:

Process questions: Questions about the quality of program implementation and improvement. They are also called formative questions.

Outcome questions: Questions about the impact of a program over time. They are also called summative questions.

PARSEC: A framework for creating quality evaluation questions. PARSEC is an acronym for pertinent, answerable, reasonable, specific, evaluative, and complete.

Pertinent: A question is strongly related to the information that program stakeholders and participants want to obtain from an evaluation. Pertinent questions are derived from the logic model.

Answerable: The data needed to answer a question are available or attainable.

Reasonable: A question is linked to what a program can practically and realistically achieve or influence.

Specific: A question directly addresses a single component of the logic model. Specific questions are clearly worded and avoid broad generalizations.

Evaluative: The answer to a question is actionable. Evaluative questions can inform changes to a program, policy, or initiative.

Complete: The entire set of questions addresses all the logic model components that are of critical interest.

A strategy for prioritizing evaluation questions is based on their importance and urgency:

Important: An important question is necessary to improve or assess a program.

Urgent: An urgent question needs an answer as soon as possible, either to satisfy reporting requirements or to obtain necessary information before moving forward.

 
 
 
 
 
 


Module 3: Evaluation design

Evaluation design: The data collection processes and analytic methods used to answer the evaluation questions. An evaluation design should be informed by the program goals, logic model, evaluation questions, available resources, and funding requirements. There are four broad categories of evaluation design:

Descriptive designs: Used to describe a program by addressing "who," "what," "where," "when," and "to what extent" questions as they relate to the program.

Correlational designs: Used to identify a relationship between two variables and determine whether that relationship is statistically meaningful. Correlational analyses do not demonstrate causality. They can find that X is related to Y, but they cannot find that X caused Y.

Quasi-experimental designs (QEDs): Used to determine whether an intervention caused the intended outcomes. In QEDs individuals are not randomly assigned to groups because of ethical or practical constraints. Instead, equivalent groups are created through matching or other statistical adjustments.

Randomized controlled trials (RCTs): Used to determine whether an intervention caused the intended outcomes. RCTs involve randomization, a process like a coin toss, to assign individuals to the treatment or comparison group.

Treatment group: The group that receives the intervention.

Comparison group: The group that does not receive the intervention and is used as the counterfactual to the intervention.

Validity: The extent to which the results of an evaluation are supportable, given the evaluation design and the methods used. Validity applies to the evaluation design, analytic methods, and data collection. Ultimately, valid claims are sound ones. There are two main types of validity:

Internal validity: The extent to which a study or instrument measures a construct accurately and is free of alternative explanations. There are two common threats to internal validity:

Attrition: When participants (individuals, schools, and so on) leave an evaluation before it concludes.

Selection bias: When the treatment group differs from the comparison group in a meaningful way that is related to the outcomes of interest.

 
 
 

 The extent to which an instrument or evaluaon ndings can be
generalized to dierent contexts, such as other populaons or sengs There are three
common threats to external validity:
 When contextual factors, such as me and
place, dier between the sample in the evaluaon and a populaon to which one
wants to generalize
 When external factors, such as an addional program,
might cause the evaluaon to detect a dierent eect than it would if the exter-
nal factors were not present
  When individuals act dierently because they are aware that
they are taking part in an evaluaon
  Programs that have evidence of their eecveness in producing
results and improving outcomes when implemented
 A law that encourages state and local educaon
agencies to use evidence-based programs There are four ESSA ers of evidence (US
Department of Educaon, 2016) These ers fall under the Educaon Department General
Administrave Regulaons Levels of Evidence for research and evaluaon design standards:
  A program is supported by at least one well-implemented random-
ized controlled trial with low arion Arion refers to the number of parcipants
who leave a study before it is completed
  A program is supported by at least one well-implemented ran-
domized controlled trial with high arion or at least one well-implemented quasi-
experimental design
  A program is supported by at least one well-implemented correla-
onal design with stascal control for selecon bias
  A program has a well-specied logic model with one
intended outcome of interest that aligns with a stakeholder need The program is sup-
ported by exisng or ongoing research demonstrang how it is likely to improve the
outcomes idened in the logic model
 The WWC is part of the US Depart-
ment of Educaon’s Instute of Educaon Sciences To provide educators with the informa-
on they need to make evidence-based decisions, the WWC reviews research on educaon
programs, summarizes the ndings of that research, and assigns evidence rangs to individ-
ual studies (What Works Clearinghouse, 2020)


There are three WWC design standards ratings that correspond to the ESSA tiers of evidence. A study can be found to:
Meet WWC standards without reservations: This tier corresponds to strong evidence under ESSA.
Meet WWC standards with reservations: This tier corresponds to moderate evidence under ESSA.
Not meet WWC standards: This tier still provides promising evidence under ESSA.
 All possible parcipants in a program
 Used to collect data from everyone in a populaon
 A subset of an enre populaon that is idened for data collecon
 How well a sample represents the enre populaon
 The extent to which the results of an evaluaon apply to dierent types of
individuals and contexts
 Stascal adjustments to ensure a sample is representave of the enre popula-
on with respect to parcular characteriscs
 The number of parcipants needed in a sample to collect enough data to
answer the evaluaon quesons
 A list of all possible units (such as students enrolled in schools in a parcu-
lar district) that can be sampled
 A sampling technique in which every individual within a populaon has a
chance of being selected for the sample There are three main types of random sampling:
 Individuals in a populaon are selected with equal probabili-
es and without regard to any other characteriscs
   Individuals are rst divided into groups based on known
characteriscs (such as gender or race/ethnicity) Then, separate random samples are
taken from each group
 Individuals are placed into specic groups, and these
groups are randomly selected to be in the sample Individuals cannot be in the sample
if their groups are not selected
Nonrandom sampling: A sampling technique in which only some individuals have a chance of being selected for the sample. There are four main types of nonrandom sampling:
Criterion sampling: Individuals meeting a criterion for eligibility (such as being math teachers) are recruited until the desired sample size is reached.
Convenience sampling: Individuals are selected who are readily available and from whom data can be easily collected.
Snowball sampling: Individuals are recruited through referrals from other participants.
Purposive sampling: Individuals are selected to ensure that certain characteristics are represented in the sample to meet the objectives of the evaluation.
Saturation: The point at which the data collected begin to yield no new information and data collection can be stopped.
Unit of analysis: The level at which data are collected (for example, student, classroom, school).
Confidence interval: A range of values for which there is a certain level of confidence that the true value for the population lies within it. The range of values will be wider or narrower depending on the desired level of confidence. Standard practice is to use a 95 percent confidence level, which means there is a 95 percent chance that the range of values contains the true value for the population.
Null hypothesis: A statement that suggests there will be no difference between the treatment group and the comparison group involved in an evaluation.
Statistical power: The probability of rejecting the null hypothesis when a particular alternative hypothesis is true.
Continuous data: Data that can take on a full range of possible values, such as student test scores, years of teaching experience, and schoolwide percentage of students eligible for the National School Lunch Program.
Binary data: Data that can take on only two values (yes or no), such as pass or fail scores on an exam, course completion, graduation, or college acceptance.
Variance: A measure that indicates how spread out data are within a given sample.

 
 
 
 
 


  Numerically measurable informaon, including survey responses, assess-
ment results, and sample characteriscs such as age, years of experience, and qualicaons
  Informaon that cannot be measured numerically, including interview
responses, focus group responses, and notes from observaons
 The extent to which data accurately and precisely capture the concepts they
are intended to measure
 The extent to which an evaluaon or instrument really measures what it is intended
to measure Validity applies to the evaluaon design, methods, and data collecon There
are two main types of validity:
 The extent to which a study or instrument measures a construct
accurately and is free of alternave explanaons
 The extent to which an instrument or evaluaon ndings can be gen-
eralized to dierent contexts, such as other populaons or sengs
 The extent to which the data source yields consistent results There are three
common types of reliability:
  The extent to which items in a scale or instrument consistently
measure the same topic
 The extent to which the same individual would receive the
same score on repeated administraons of an instrument
  The extent to which mulple raters or observers are consistent
in coding or scoring
 The extent to which data are current and the results of data analysis and inter-
pretaon are available when needed
 The data collected in an evaluaon include sucient details or con-
textual informaon and can therefore be meaningfully interpreted
 The extent to which data are free from manipulaon and entry error
Trustworthiness is oen addressed by training data collectors
 Data are collected from all parcipants in the sample and are sucient to
answer the evaluaon quesons Completeness also relates to the degree of missing data
and the generalizability of the dataset to other contexts


 Reviewing mulple sources of data to look for similaries and dierences
  Establishing the validity of qualitave ndings through key stakeholder and
parcipant review
 A documented history of qualitave data collecon and analysis Careful doc-
umentaon of data collecon procedures, training of data collectors, and notes allows for
ndings to be cross-referenced with the condions under which the data were collected
  A planning tool to ensure that all necessary data are collected to answer
the evaluaon quesons

   
 
 
   


 Directly asking an individual quesons to collect data to answer an evaluaon
queson
 Directly asking a group of parcipants quesons to collect data to answer an
evaluaon queson
 Instrucons for conducng an interview, focus group, or observaon An interview
or focus group protocol should include steps for conducng the interview or focus group,
a script of what to say, and a complete set of quesons An observaon protocol should
include informaon about items to observe, the data collecon approach to use (recording
checklist, observaon guide, or open eld notes), and the type of observaon
 Watching individuals or groups to collect informaon about processes, situa-
ons, interacons, behaviors, physical environments, or characteriscs There are four types
of observaon, all of which can be conducted in person or virtually:
  Conducted in structured and arranged sengs
  Conducted in unstructured and real-life sengs
  Observers make their presence known
  Observers do not make their presence known
 Administering a xed set of quesons to collect data in a short period Surveys can
be an inexpensive way to collect data on the characteriscs of a sample in an evaluaon,
including behaviors, pracces, skills, goals, intenons, aspiraons, and percepons
 Behaviors, pracces, or skills that can be directly seen and measured
Also called a measurable variable These data are collected in a variety of ways (for example,
observaons, surveys, interviews)
 Goals; intenons; aspiraons; or percepons of knowledge, skills,
or behavior that cannot be directly seen and measured but can be inferred from observable
indicators or self-report Also called a latent variable
  A queson that does not include xed responses or scales but allows
respondents to add informaon in their own words
  A queson that includes xed responses such as yes or no, true or
false, mulple choice, mulple selecon, or rang scales

 

 The middle of a rang scale with an odd number of response opons Typically,
respondents can select the midpoint to remain neutral or undecided on a queson
  A queson that asks two quesons but forces respondents to
provide only one answer For example, “Was the professional development culturally and
developmentally appropriate?
  A queson that could lead respondents to answer in a way that does not
represent their actual posion on the topic or issue For example, the wording of a queson
or its response opons could suggest to respondents that a certain answer is correct or
desirable
  A follow-up queson that helps gain more context about a parcular
response or helps parcipants think further about how to respond
 A standardized form, with preset quesons and responses, for observ-
ing specic behaviors or processes
  A form that lists behaviors or processes to observe, with space to record
open-ended data
 A exible way to document observaons in narrave form
 When two response opons in a survey cannot be true at the same
me
 When response opons in a survey include all possible responses to
a queson

 
 


 Collecng, organizing, and cleaning data in a manner that ensures accu-
rate and reliable analysis
 The dierence between an actual data value and the reported data value
 A data value that is posioned an abnormal distance from the expected data range
 The process of examining and interpreng data to answer quesons There
are two broad approaches to data analysis:
  Describing or summarizing a sample Descripve methods can
involve examining counts or percentages; looking at the central tendency of a distribu-
on through means, medians, or modes; and using stascs such as standard deviaon
or interquarle range to look at the spread, or variaon, of a distribuon
  Drawing conclusions about a populaon from a sample Infer-
enal methods can include techniques such as t-tests, analysis of variance (ANOVA),
correlaon, and regression
Mean: The average response across a sample.
Median: The value at the midpoint of a distribution.
Mode: The most common response in a distribution.
Standard deviation: A measure of how spread out data points are that describes how far the data are from the mean.
Range: The maximum and minimum observed values for a given variable.
Quartile: One of four even segments that divide up the range of values in a dataset.
Interquartile range: The spread of values between the 25th percentile and the 75th percentile.
t A comparison of two means or standard deviaons to determine whether they dier
from each other
 A comparison of three or more means that determines
whether there are stascally signicant dierences among them
 Analysis that generates correlaon coecients that indicate how
dierences in one variable correspond to dierences in another A posive correlaon

 
 

coecient indicates that the two variables either increase or decrease together A negave
correlaon coecient indicates that, as one variable increases, the other decreases
 A family of stascal procedures that esmate relaonships between
variables
 Analysis that can show the relaonship between
two variables
 Analysis that can control for other factors by including
addional variables
Dependent variable: A variable that could be predicted or caused by one or more other variables.
Independent variable: A variable that has an influence on or association with the dependent variable.
Covariate: A variable that has a relationship to the dependent variable that should be considered but that is not directly related to a program. Examples of covariates are student race/ethnicity, gender, socioeconomic status, and prior achievement.
Confound: A variable that could result in misleading interpretations of a relationship between the independent and dependent variable. For example, if all the teachers who are implementing a new math intervention program have a master's degree in math while the teachers who are not implementing the program have only a bachelor's degree, the degree attainment of the intervention teachers is a confound. Teachers' additional education experience, rather than the math intervention, could be the reason for changes in student achievement.

   
 
 
 
 
 
 
 
 


 Sharing informaon about an evaluaon and its ndings with a wide
audience
  Strategically planning disseminaon acvies to use me and other
resources eciently and to communicate eecvely
 The group of people who need or want to hear the informaon that will be
disseminated
 The informaon that the audience needs to know about an evaluaon and that
the evaluators want to share
 The means used to disseminate the informaon to the audience There are many
disseminaon approaches:
 An online forum for sharing regular updates about a program and the evaluaon
process
 A visual tool for organizing and sharing summaries of large amounts
of data, especially quantave data
  A gathering of interested stakeholders at which an evaluator pres-
ents the ndings through mulmedia and visual displays of the data
 A write-up about an evaluaon and its ndings to be shared with
media outlets
  A formal, highly organized document describing the methods, mea-
sures, and ndings of an evaluaon
  A condensed version of an evaluaon report that provides a brief
overview of the methods and ndings
 A short one- to two-paragraph piece that briey describes what
is happening and what was found in an evaluaon
 Digital tools to quickly create and share informaon about an evaluaon
with a variety of audiences
 A visual medium and way to reach large numbers of people, oen at lile or
no cost

 
 
 
 

Infographic: A one- or two-page document that graphically represents data and findings to tell a story.
Video: A way to share information quickly, clearly, and in an engaging way.
Podcast: A brief recording for sharing information on a topic through a discussion format.
Timing: When the audience needs to know the information.
Using clear communication and writing so that it is easy for the audience to understand and use the findings of an evaluation.
Accessibility: Ensuring that dissemination products are available to all individuals, including people with disabilities, by meeting the requirements for Section 508 compliance.
Data visualization: Using graphical representations so that data are easier to understand.
Alt text: A narrative description of a figure, illustration, or graphic for readers who might not be able to engage with the content in a visual form.


US Department of Educaon (2016) Non-regulatory guidance: Using evidence to strengthen educa-
on investments. hps://www2edgov/policy/elsec/leg/essa/guidanceuseseinvestmentpdf
What Works Clearinghouse (2020) Standards handbook (Version 41) US Department of Educaon,
Instute of Educaon Sciences, Naonal Center for Educaon Evaluaon and Regional Assistance
hps://iesedgov/ncee/wwc/handbooks

   
REL 2022–112
October 2021
This resource was prepared for the Institute of Education Sciences (IES) under Contract ED-IES-17-C-0005 by the Regional Educational Laboratory Central administered by Marzano Research. The content of the resource does not necessarily reflect the views or policies of IES or the U.S. Department of Education, nor does mention of trade names, commercial products, or organizations imply endorsement by the U.S. Government.
This REL resource is in the public domain. While permission to reprint this publication is not necessary, it should be cited as:
Stewart, J., Joyce, J., Haines, M., Yanoski, D., Gagnon, D., Luke, K., Rhoads, C., & Germeroth, C. (2021). Program Evaluation Toolkit: Quick Start Guide (REL 2022–112). U.S. Department of Education, Institute of Education Sciences, National Center for Education Evaluation and Regional Assistance, Regional Educational Laboratory Central. http://ies.ed.gov/ncee/edlabs
This resource is available on the Regional Educational Laboratory website at http://ies.ed.gov/ncee/edlabs.

The Program Evaluaon Toolkit would not have been possible without the support and con-
tribuons of Trudy Cherasaro, Mike Siebersma, Charles Harding, Abby Laib, Joseph Boven,
David Alexandro, and Nazanin Mohajeri-Nelson and her team at the Colorado Department of
Educaon