Choose Better Programs: Using Evidence to Foster Better Learning
This article is part of an explanatory series about research into how people learn. By the end, you’ll understand a key concept in education and how to apply it to supercharge your own learning (or help others learn)!
Let’s talk about efficacy in edtech programs — what it means, how it’s tested, and how to find programs that really deliver results.
First things first: products should do what they claim to do, right? For edtech, this means that a highly effective learning program produces measurable, meaningful learning outcomes for all students. Most edtech companies make this claim about their products, and that’s great! It’s the right goal. We need to measure learning to know if it’s happening and happening well. Plus, learning should be meaningful — not just useful, but connecting to a student’s life in ways that interest and inspire them. The big question is: how can we tell if a program will actually deliver on its promises?
Efficacy Research
To figure out if a learning program works as intended, we conduct research. A helpful way to think about efficacy research is the Evidence for ESSA framework, developed by the Center for Research and Reform in Education (CRRE) at Johns Hopkins University School of Education. This framework helps educators and the public evaluate how rigorous a study’s evidence is, and therefore, how reliable its results are likely to be.
Here’s an important point: Evidence for ESSA evaluates research, not programs. It can’t tell you for certain whether a program will work, but it can tell you how much you can trust evidence that the program has worked before.
The Evidence for ESSA framework has four tiers:
1. Strong Evidence of Impact
2. Moderate Evidence of Impact
3. Promising Evidence of Impact
4. Promising Rationale for Expecting Impact
Let’s break down each tier.
Tier 1: Strong Evidence of Impact
This is the “gold standard” of research. To qualify as a Tier 1 study, the research must have:
· At least 350 participants
· Participants from more than one site (e.g., school)
· A well-designed and well-implemented randomized-controlled trial (RCT)
· A statistically significant positive result
In an RCT:
· Participants are randomly assigned to treatment and control groups
· Everyone takes a reliable assessment of their knowledge
· The treatment group uses the product being evaluated, while the control group gets usual classroom instruction
· After the study period, everyone takes the assessment again
· Researchers then compare how much the treatment group improved compared to the control group
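To make the logic concrete, here's a toy simulation of those steps. All the numbers (baseline score, effect size, noise) are made up purely for illustration; real studies use validated assessments and proper significance testing, not a seeded random-number generator.

```python
import random
import statistics

# Toy RCT simulation (illustrative numbers only): randomly assign students,
# assess before and after, and compare the two groups' average gains.
random.seed(42)

students = list(range(100))
random.shuffle(students)                 # random assignment
treatment = set(students[:50])
control = set(students[50:])

# Baseline assessment scores for everyone.
pre = {s: random.gauss(60, 10) for s in students}

# Assume the program adds ~5 points on average (a hypothetical effect).
post = {
    s: pre[s] + random.gauss(5 if s in treatment else 0, 5)
    for s in students
}

def mean_gain(group):
    # Average improvement from pre-test to post-test for a group.
    return statistics.mean(post[s] - pre[s] for s in group)

effect = mean_gain(treatment) - mean_gain(control)
print(f"Estimated program effect: {effect:.1f} points")
```

The key idea is the final subtraction: because assignment was random, any difference in average gains beyond noise can be attributed to the program rather than to pre-existing differences between the groups.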
Essentially, a Tier 1 study takes every precaution to ensure that any positive effects are real and caused by the program being tested, not by chance or other factors.
Tier 2: Moderate Evidence of Impact
Tier 2 studies look similar to Tier 1 studies but with one key difference: some element of the traditional RCT design isn’t fully implemented.
A common example is a quasi-experimental study. Instead of random assignment, researchers use statistical methods to create treatment and control groups that are as similar as possible, matching on factors like age, location, school, ethnicity, socioeconomic status, home language, and gender.
This method is often used for after-the-fact research. For example, if a group of students already used a program and showed improvement, researchers might create an equivalent control group from students who didn’t use the program but took the same assessments. If the positive effect remains, it’s likely the program caused the improvement.
Other than this change in how the groups are created, a quasi-experimental study works just like a Tier 1 RCT.
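The matching step can be sketched in a few lines. This is a hypothetical simplification with invented names and only two covariates; real studies match on many more factors and often use techniques like propensity-score matching.

```python
from dataclasses import dataclass

@dataclass
class Student:
    age: float
    ses: float    # socioeconomic index (illustrative covariate)
    score: float  # outcome on the shared assessment

def distance(a: Student, b: Student) -> float:
    # Crude similarity on two covariates; a real study would weigh many
    # more factors (school, home language, gender, etc.).
    return abs(a.age - b.age) + abs(a.ses - b.ses)

def matched_controls(users: list, non_users: list) -> list:
    """For each program user, pick the most similar non-user as a
    comparison student, building a control group after the fact."""
    return [min(non_users, key=lambda n: distance(u, n)) for u in users]
```

The resulting matched group stands in for the randomized control group: if program users still outscore their closest statistical twins, the program is the likeliest explanation.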
A study might also be deemed Tier 2 if it was designed as a true RCT, but some element didn't work out. For example, attrition (participants dropping out) might leave the study below the required number of participants by the end.
Tier 2 evidence is still valuable! It’s just not quite as rigorous as Tier 1. If a program has Tier 2 evidence, there’s still a good chance it works as advertised.
Tier 3: Promising Evidence of Impact
Tier 3 studies typically don’t have a control group. Instead, they might compare students who used the program to expected outcomes or to students from previous years.
Here’s a real-world example: in 2020, pre-K students in Harlingen, TX used My Math Academy as a large part of their remote (and later hybrid) math instruction. At the end of the school year, they scored significantly higher on math assessments than Harlingen’s pre-K students had in the past. They were the only cohort in Harlingen to score this way; the other grade levels all scored in ranges typical for the district. When they entered kindergarten the following fall, they took a start-of-year assessment and still scored higher than both their peers who hadn’t used My Math Academy and previous years’ incoming kindergarteners. As far as anyone in Harlingen could identify, the only major difference in this cohort was their use of My Math Academy.
While this result is promising, we can’t be 100% certain that My Math Academy caused the improvement without a control group. Maybe the pre-K teachers were really stepping it up because of the pandemic. Maybe parents, stuck at home, were helping their kids much more than usual. These explanations seem unlikely, given that the effect appeared only in the pre-K students; it would be strange if only the pre-K teachers and parents were leaning in more, right? But in the absence of a control, we cannot be fully certain. That’s why this kind of study is considered Tier 3.
Tier 4: Promising Rationale for Expecting Impact
At Tier 4, no actual research has been done on the program being evaluated. Often, this is because the program itself is new and not widely used yet, so there hasn’t been an opportunity to do a large and robust study.
Instead of looking at research on the program itself, we evaluate the research that the program’s developers used to design their product.
For example, a program might be trying to keep students within their zone of proximal development (unfamiliar? Check out my deep dive on the ZPD to learn more). To do this, it might:
· Assess what students already know and what they’re ready to learn
· Teach concepts students are ready for
· Adjust difficulty based on student performance
· Continuously assess and adapt
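The adaptive loop those bullets describe can be sketched in miniature. This is a hypothetical toy, not any real program's algorithm: it just nudges difficulty up after a correct answer and down after a miss, keeping practice near the student's estimated level.

```python
def next_difficulty(level: int, correct: bool) -> int:
    # Step difficulty up after success, down after a miss (floor of 1),
    # aiming to keep items near the edge of what the student can do.
    return level + 1 if correct else max(1, level - 1)

def run_session(start_level: int, responses: list) -> tuple:
    """responses: sequence of booleans (correct/incorrect) from a
    continuous assessment. Returns the final level and the difficulty
    level served at each step."""
    level = start_level
    served = []
    for correct in responses:
        served.append(level)
        level = next_difficulty(level, correct)
    return level, served
```

For example, a student starting at level 3 who answers correctly twice and then misses would be served levels 3, 4, and 5, and end the session at level 4. Real adaptive programs use far richer student models, but the feedback loop (assess, serve, adjust, repeat) is the same shape.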
This approach shows a good understanding of the ZPD concept and should work in theory. But without concrete evidence, we can only say it has a promising rationale — hence, Tier 4.
Finding Efficacious Programs
Now that you understand how to evaluate evidence, where do you find it?
The first and easiest way: ask edtech companies for their efficacy research. Reputable companies will be happy to share it, and if they don’t have it, they should be able to tell you why (often they’re simply too new). Some also publish their research directly on their websites for you to access on your own.
In addition to looking for studies that you will read and evaluate for yourself, you can look for signs that studies have already been reviewed and rated. For example, some programs display evidence-tier badges on their websites.
Remember that even though these badges might be positioned as if to say a product itself is rated, what they really refer to is the rigor of a particular study about that program.
There are also online tools for finding products that have independently reviewed research.
The Evidence for ESSA website is one of the largest, and of course, is directly aligned with the Evidence for ESSA four-tier framework.
Digital Promise also does product certifications and takes efficacy research into account as part of their rubrics.
What Works Clearinghouse is one of the largest and most well-known organizations that review edtech products; its reviewing methods also take efficacy research into account and are extremely thorough and rigorous. However, WWC mostly reviews core curricula. If you’re interested in supplemental programs, you’ll need to look at resources like Evidence for ESSA or Digital Promise.
Conclusion: Making Informed Decisions
Understanding efficacy research is crucial for making informed decisions about edtech programs. As we’ve explored, not all evidence is created equal, and knowing the difference between strong, moderate, and promising evidence can help you choose programs that are more likely to deliver real results for your students.
Remember these key takeaways:
1. Prioritize Strong Evidence: Look for programs with Tier 1 or Tier 2 evidence when possible. These offer the strongest assurance that a program works as intended. However, don’t completely dismiss Tier 3 or Tier 4 evidence, especially for newer or more innovative programs.
2. Ask Questions and Dig Deeper: Don’t just accept efficacy claims at face value. Ask about who conducted the research, how many students were involved, whether there were control groups, and how the study relates to your specific context. The more you understand about a study, the better you can evaluate its relevance.
3. Consider the Bigger Picture: While efficacy is crucial, it’s just one piece of the puzzle. Balance it with other factors like ease of implementation, cost, and alignment with your curriculum and teaching philosophy. Also, stay informed about new research, as the edtech field is constantly evolving.
Remember, the goal isn’t just to use technology in education — it’s to use technology that demonstrably improves education. With a solid understanding of efficacy research, you’re well-equipped to make impactful choices on behalf of the students you serve.
This article was edited in collaboration with Claude, an AI language model developed by Anthropic.