Formative assessment and summative assessment answer two different questions, and strong education systems rarely treat them as substitutes. One asks what a learner understands while learning is still in motion; the other records what the learner can show at a defined point in time. That distinction sounds simple, yet it shapes curriculum pacing, teacher workload, grading fairness, national examinations, school accountability, university entry, and the quality of daily classroom feedback. Around the world, the policy direction is now clearer: systems still need trusted end-point judgments, but they also need faster classroom evidence because delayed information cannot repair unfinished learning. [a][b][c]
A useful rule holds across most systems: formative assessment improves the next lesson, the next draft, or the next explanation; summative assessment supports grading, certification, progression, comparison, or accountability. When schools blur these purposes, teachers often overload a single task with too many jobs.
What Separates Formative and Summative Assessment
| Dimension | Formative Assessment | Summative Assessment | What Leaders Usually Do With the Result |
|---|---|---|---|
| Primary purpose | Improve learning while it is happening | Judge learning after a teaching period | Adjust teaching, pacing, intervention, grading, progression, or certification |
| Timing | During lessons, units, projects, or practice cycles | At the end of a unit, term, course, stage, or programme | Use formative data for response; use summative data for records and decisions |
| Typical stakes | Usually low stakes | Often medium or high stakes | Protect classroom experimentation while keeping trusted end-point judgments |
| Feedback speed | Immediate or near-immediate | Often delayed | Fast feedback supports correction before misconceptions settle |
| Evidence type | Questions, drafts, short quizzes, exit tickets, observations, oral checks, peer and self-review | Exams, final essays, end-of-unit tests, moderated coursework, capstones, national tests | Combine classroom evidence with externally comparable measures |
| Main strength | Instructional value | Comparability and certification value | Build a balanced system rather than choosing one only |
| Main weakness | Can vary in quality and consistency between classrooms | Can arrive too late to improve the learning that was measured | Use moderation, rubrics, and clearer assessment criteria |
OECD describes the relationship directly: summative assessment records or certifies learning at key stages, while daily assessment interactions between teachers and students are the part most likely to improve outcomes over time. UNESCO makes the same point at system level, where assessment is not only about scores but about using evidence to improve quality and equity. [b][c]
Why This Difference Matters More in 2026
Global Learning Pressure
The global education picture gives this comparison real weight. UNESCO’s 2025 SDG 4 Scorecard reports 272 million out-of-school children, adolescents, and youth, which is 21 million more than earlier estimates. At the same time, World Bank learning-poverty updates continue to warn that in low- and middle-income countries, more than half of children still cannot read and understand a simple age-appropriate text by the end of primary school. When systems carry this much unfinished learning, slow assessment is not enough. [d][e]
PISA 2022 sharpened that warning. Across OECD countries, 31% of students performed below baseline Level 2 in mathematics, 26% fell below Level 2 in reading, and 24% fell below Level 2 in science. Mean mathematics performance across OECD countries also fell by a record 15 points from 2018 to 2022, while reading fell 10 points. Those numbers do not argue for abandoning summative testing; they argue for using stronger formative signals long before the final score arrives. [f]
TIMSS 2023 adds a second layer. More than 650,000 students took part, and the study completed its move to a fully digital administration. Large-scale assessment is therefore not standing still; it is becoming more interactive, more data-rich, and more capable of showing item-level patterns. Yet even a better large-scale test still acts mainly as a rear-view mirror. Formative assessment, by contrast, works like a dashboard light: it helps teachers act before the engine starts losing power. [g]
What Formative Assessment Does Best
Fast Evidence Changes Teaching
Formative assessment earns its value from timing and actionability. OECD’s 2025 work on teaching quality defines it as the ongoing process through which teachers set learning goals, diagnose student understanding, provide feedback, and adapt to student thinking. That definition matters because a score helps only when it changes the next teaching move. If the evidence cannot change what happens next, it behaves more like record-keeping than learning support. [h]
- It reveals misconceptions early. A two-minute exit ticket can show whether a class misunderstood a concept before the next lesson locks the error in place.
- It improves feedback quality. The stronger versions do not merely mark right and wrong; they explain the gap between current work and the target.
- It helps pacing. Teachers can slow down, reteach, regroup, or extend based on evidence rather than intuition alone.
- It supports self-regulation. Students who see criteria, feedback, and revision cycles learn how to judge their own work with more accuracy.
- It lowers the cost of failure. Because the stakes are usually low, students can revise without carrying the weight of a final judgment.
Evidence on feedback reinforces that picture. The Education Endowment Foundation’s toolkit rates feedback as a high-impact, very-low-cost approach, with an average impact of +6 months of progress across 155 studies. The same review notes that verbal feedback shows slightly higher average effects than the overall mean and that embedding formative assessment can lay the foundation for better feedback. That does not mean every quick quiz helps. It means feedback tied to clear goals and next steps tends to help. [i]
Another strength is breadth. Formative evidence can capture things that final examinations often sample only partially: scientific reasoning during discussion, revision quality in writing, oral language growth, method choice in mathematics, and how students explain why an answer is defensible. In subjects that depend on process as much as product, that matters a great deal. A polished final answer can hide shaky reasoning; a well-run formative cycle tends to expose it.
What Summative Assessment Still Does Better
Stable Judgments Still Matter
Summative assessment remains essential because education systems need judgments that are stable enough to support progression, certification, and public trust. Universities need selection tools. Employers and professional bodies need signals of attainment. Ministries need system-level results they can compare across schools, regions, or years. Parents also want clarity on whether a learner has met an expected standard. None of those jobs disappear because formative practice improves. [a][b]
Its biggest advantage is comparability. A well-designed end-of-course test or moderated performance task can apply common criteria across large groups. That makes summative assessment useful for reporting and fairness at scale, even if it captures only a slice of what students know. This is why the debate is rarely “formative or summative.” The real question is how much weight a system assigns to each and how well it links one to the other.
- Certification: It records whether a learner met a stage standard.
- Selection: It supports entry into higher levels of schooling or specific programmes.
- Accountability: It gives schools and ministries a shared reporting base.
- Moderation: It allows external checking of internal judgments.
- Signal value: It tells families and institutions what has been achieved at a given point.
International Baccalaureate assessment shows why this role survives. In the Diploma Programme, most courses still rely heavily on written examinations, which the IB says offer high levels of objectivity and reliability, while the broader programme also uses internally and externally assessed components. That mixed model reflects a global reality: systems want richer evidence, but they do not want to lose stable external judgments. [r]
Where Each Model Starts to Struggle
Formative assessment weakens when teachers collect too much evidence without using it well. Workload rises fast when every task receives detailed written comments. Feedback loses value when it arrives after the class has moved on, when criteria stay vague, or when students cannot act on the advice. Quality also varies between classrooms. One teacher may use questioning, exemplars, and short reteach cycles with precision; another may run the same activities as routine compliance.
Summative assessment weakens for the opposite reason. It can become too narrow, too infrequent, and too delayed. A high-stakes exam may judge performance accurately on that day and still miss wider capabilities such as revision habits, collaboration, oral reasoning, or growth over time. It can also intensify curriculum narrowing when schools start teaching the test format more than the subject itself. That risk becomes sharper when a single exam carries too much weight.
The fairness trade-off is also real. Teacher-based assessment may reflect authentic learning more closely, but it can drift if criteria are interpreted differently across schools. External assessment may improve comparability, yet it can flatten complex learning into a smaller set of tasks. This is why moderation, common rubrics, anchor tasks, and clearer national criteria matter so much. Finland’s final assessment reforms, for example, were introduced specifically to improve the equity of assessment and the comparability of grades. [k]
How Different Education Systems Combine the Two
Five System Patterns
| System or Programme | Formative Side | Summative Side | Why the Model Matters |
|---|---|---|---|
| England | Day-to-day formative assessment guides ongoing teaching | In-school summative assessment and statutory end-of-key-stage assessment | Official guidance names the layers separately, which reduces confusion about purpose |
| Finland | Continuous teacher-led assessment during learning | National final assessment criteria support grade comparability | Shows how a low-testing culture can still use clear final criteria for fairness |
| Singapore | More space for classroom feedback and regular school-based evidence | Weighted assessments remain, but mid-year examinations have been removed across levels | Illustrates a shift away from overemphasis on testing without removing accountability |
| Australia | National investment in online formative assessment tools and progressions | School and system-level benchmark and reporting structures remain | Shows how digital formative systems can support teaching at scale |
| IB Diploma Programme | Coursework and internal assessment capture part of the learning | Written examinations and external marking retain strong certification value | Shows why international programmes still rely on mixed evidence models |
England is unusually explicit. Government guidance for Key Stage 2 says schools use three forms of teacher assessment: day-to-day formative assessment, in-school summative assessment, and end-of-stage statutory summative assessment. That wording matters because it separates the jobs cleanly. A classroom quiz should not be mistaken for a national comparison tool, and a statutory test should not be expected to carry the full weight of daily instruction. [j]
Finland illustrates another path. The system is known for strong teacher agency and less test pressure during compulsory schooling, yet the Finnish National Agency for Education still introduced more precise final assessment criteria to improve grade equity and comparability. That move shows something important: trusting teachers does not remove the need for clearer end-point criteria. It changes where the system places its trust and how it supports consistency. [k]
Singapore has adjusted the balance in a different way. Ministry of Education statements on the removal of mid-year examinations explain that the aim is to reduce the overemphasis on testing and grades, free up time for richer learning, and avoid simply replacing removed exams with more school-based testing. In 2023, MOE also stated that junior colleges and Millennia Institute should administer no more than one weighted assessment per subject per term as cohorts move away from mid-year examinations. That is not a rejection of summative assessment; it is a redistribution of assessment load. [l]
Australia has treated formative practice as a system design issue rather than only a classroom habit. The federal Online Formative Assessment Initiative, linked to the National School Reform Agreement, was built to support learning progressions, online resources, and professional learning for teachers. That choice reflects a wider global pattern: when ministries want formative practice to become routine, they invest in tools, criteria, and teacher learning rather than leaving the work to isolated enthusiasts. [m]
What International Benchmarking Adds to the Debate
Large-scale international assessments do not tell teachers how to run tomorrow morning’s lesson, but they do show whether national assessment systems are producing the learning levels that public policy expects. PISA 2022 showed that only 69% of students, on average across OECD countries, reached at least baseline proficiency in mathematics, 74% reached that level in reading, and 76% in science. It also showed that 9% reached the top proficiency levels in mathematics, while only 7% did so in reading and science. [f]
The same data show why country examples matter. In PISA 2022, OECD reports that Singapore led the mathematics league table, while systems such as Singapore, Japan, Estonia, Macao (China), Hong Kong (China), Chinese Taipei, and Korea were close to universal basic proficiency in one or more tested domains. Those outcomes do not prove that one assessment model alone caused the result. They do show that high-performing systems usually combine clear standards, teacher feedback, and credible end-point checks rather than relying on only one of those elements. [f]
TIMSS 2023 matters here for another reason. Because it now runs fully digitally, it points toward a future where summative testing can capture more complex item types and produce faster diagnostic reporting. Yet the same development narrows the gap between the two models only slightly. Digital summative tests can become more informative, but they still happen at chosen intervals. Formative assessment remains the part of the system that can turn evidence into immediate instructional correction. [g]
Reliability, Validity, and Equity Across the Two Models
Across countries, the hardest technical issue is not whether assessment should exist. It is which kind of evidence deserves weight for which decision. Summative assessment usually performs better when the system needs wide comparability, common scaling, or public reporting. Formative assessment often performs better when the goal is to observe reasoning, diagnose errors, and support next-step teaching. One is stronger for decision consistency at scale; the other is stronger for instructional precision close to the learner. [a][k]
Equity depends on both. If a system relies only on teacher judgment without moderation, students may face uneven expectations between schools. If it relies only on high-stakes examinations, students who need more time, feedback, or multiple ways to show learning may be reduced to one-time performance. Better systems therefore use a mixed structure: teacher evidence during learning, clear criteria, and some external or moderated checkpoint. That blend does not eliminate inequity, but it limits two common failures at once: hidden inconsistency and overreliance on single-shot testing.
Digital Tools and AI Are Changing Both Sides
Process Evidence Is Gaining Weight
The newest shift is not only digital delivery. It is the redesign of what counts as evidence when students can use generative AI. UNESCO’s student AI competencies publication says education systems need explicit AI learning goals, and its teacher AI competencies publication defines 15 competencies across five dimensions for teacher capability in the AI age. That matters for assessment because schools cannot evaluate work fairly unless teachers and students know what counts as acceptable assistance, what must be verified, and what the learner must still show independently. [n][o]
UK government guidance updated in August 2025 makes the operational problem plain. Schools and colleges may need to review homework and other unsupervised study because of generative AI, define when its use is acceptable, and engage families about those rules. Once that issue enters policy, assessment design cannot stay the same. Tasks built only around polished final output are easier to outsource to a machine; tasks that require draft trails, oral explanation, source checking, and reasoned revision protect the value of both formative and summative evidence. [p]
A 2026 comparative review of curriculum reform reaches a similar point from another angle: when AI tools can draft text, solve problems, or generate plausible mistakes, assessment has to capture thinking, process, and verification, not only the final product. That trend matters globally because it pushes both assessment types toward stronger evidence of judgment. [q]
What the Global Comparison Shows
- Formative assessment is strongest when the main goal is improvement during learning.
- Summative assessment is strongest when the system needs trusted reporting, certification, or comparison.
- High-performing or high-capacity systems usually do both, but they separate the purposes clearly.
- Countries are reducing overloaded exam calendars, yet they are not removing final judgments altogether.
- Equity improves when teacher judgment is supported by clear criteria, moderation, and common expectations.
- Digital delivery makes assessment faster and more flexible, but it does not erase the difference between feedback for learning and judgment of learning.
- AI is pushing schools to reward process evidence, verification, and explanation more explicitly than before.
Viewed globally, the better question is not which model should win. It is which evidence belongs to which decision. Formative assessment should carry the daily work of diagnosis, feedback, and course correction. Summative assessment should carry the jobs that require stable judgment, public reporting, and certification. When a system gives each model the job it can actually do well, students get faster support, teachers get more usable information, and final results become easier to trust.
Sources
- [a] OECD publication explaining why education systems need stronger links between formative and summative assessment.
- [b] OECD overview stating that summative assessment records or certifies learning, while daily assessment interactions drive lasting improvement.
- [c] UNESCO page on learning assessment as a tool for measuring and improving education quality and equity.
- [d] World Bank learning-poverty entry point linking the latest country briefs and current database updates.
- [e] UNESCO Institute for Statistics report with the 2025 out-of-school estimate of 272 million.
- [f] OECD PISA 2022 chapter with baseline proficiency rates, low-performer shares, and top-performer data in mathematics, reading, and science.
- [g] TIMSS release note confirming the 2023 results, digital administration, and participation of more than 650,000 students.
- [h] OECD 2025 publication defining formative assessment and feedback as ongoing diagnosis, response, and adaptation during teaching.
- [i] Education Endowment Foundation evidence summary reporting average impact, evidence strength, and study count for feedback-related practice.
- [j] England’s official guidance distinguishing day-to-day formative, in-school summative, and statutory end-of-key-stage assessment.
- [k] Finnish National Agency for Education page on final assessment criteria introduced to improve equity and grade comparability.
- [l] Singapore MOE press release on reducing exam load and limiting weighted assessments as part of a broader shift in school-based assessment.
- [m] Australian Government fact sheet on the Online Formative Assessment Initiative under the National School Reform Agreement.
- [n] UNESCO article outlining student AI competencies that schools may integrate into curriculum and assessment.
- [o] UNESCO article defining teacher AI competencies relevant to classroom assessment design and policy.
- [p] UK government guidance on acceptable use, homework policies, and operational issues linked to generative AI in education.
- [q] Education by Country review connecting AI-era curriculum change to assessment designs that reward process, verification, and explanation.
- [r] IB explanation of how the Diploma Programme combines internal and external assessment while keeping written exams central for reliability and objectivity.