Literacy and Numeracy Benchmarks: How Countries Measure Skills

Countries do not measure literacy and numeracy with one universal ruler. They use layered systems: early-grade checks for foundational skills, curriculum-based national assessments, international sample surveys, household modules, and adult skill surveys. That matters because a child who can decode a short story, a 15-year-old who can apply mathematics to a currency problem, and an adult who can interpret a graph are not showing the same thing. A benchmark is the ruler, not the full picture of learning. When readers compare countries, the most useful question is not only “Who scored higher?” but also “What skill was measured, at what age, with which threshold, and for what policy purpose?”[a][b][f]

What sits behind a benchmark number?

  • Construct: decoding, reading comprehension, applied mathematics, or adult functional skill.
  • Population: Grade 2/3 pupils, end of primary students, 15-year-olds, or adults aged 16–65.
  • Reporting unit: scale score, proficiency band, percentage at or above a cut score, or literacy rate.
  • Purpose: classroom diagnosis, national accountability, cross-country comparison, or SDG monitoring.

Why Benchmarks Matter in Skill Measurement

Benchmarks turn raw test results into policy language. A scale score by itself tells specialists how far a learner sits from the center of a distribution. A benchmark tells ministries, school leaders, and the public whether learners are meeting a stated minimum, moving beyond it, or still below it. That is why global monitoring under SDG 4 relies on a minimum proficiency level, why PIRLS and TIMSS report cumulative international benchmarks, why PISA treats Level 2 as a baseline threshold, and why adult surveys separate low proficiency from top performance.[m][e][d][b][g]

The policy value is practical. Governments need to know whether early-grade pupils can read with understanding, whether students at the end of compulsory schooling can use mathematics in real situations, and whether adults still have the literacy and numeracy needed for work, training, and daily life. A country can raise enrolment and still post weak learning results. It can also show a high adult literacy rate while many adults remain at low tested proficiency on a functional skills survey. Those are not contradictions. They are different measurement lenses aimed at different questions.[i][f][g]

What Literacy and Numeracy Mean in Practice

Foundational Skills in Early Grades

In the early grades, countries often define foundational learning very concretely. UNICEF’s MICS module for Grade 2/3-level reading counts a child as having foundational reading skills only if the child clears all three parts of the reading task: word recognition, literal questions, and inferential questions. For numeracy, the child must clear all four tasks: number reading, number discrimination, addition, and pattern recognition. This is a strict mastery model. It is easy to explain, easy to repeat in household surveys, and useful when the goal is to identify whether children have crossed the first clear skill threshold.[l]
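The all-or-nothing rule described above can be sketched in a few lines of Python. This is a minimal illustration, not UNICEF's scoring code: the task names follow the description in this article, while the dictionary layout, the function name, and the example child are assumptions made for the sketch.

```python
# Task names follow the article's description of the MICS module;
# the data layout and function name are illustrative assumptions.
READING_TASKS = ("word_recognition", "literal_questions", "inferential_questions")
NUMERACY_TASKS = ("number_reading", "number_discrimination", "addition", "pattern_recognition")


def has_foundational_skill(task_results, required_tasks):
    """Return True only if every required task was cleared.

    A task missing from task_results is treated as not cleared.
    """
    return all(task_results.get(task, False) for task in required_tasks)


# Hypothetical child: clears every reading task but misses pattern recognition.
child_reading = {
    "word_recognition": True,
    "literal_questions": True,
    "inferential_questions": True,
}
child_numeracy = {
    "number_reading": True,
    "number_discrimination": True,
    "addition": True,
    "pattern_recognition": False,
}

print(has_foundational_skill(child_reading, READING_TASKS))    # True
print(has_foundational_skill(child_numeracy, NUMERACY_TASKS))  # False
```

Because one missed task is enough to fail the whole domain, a strict mastery model of this kind tends to produce lower headline shares than a partial-credit score would.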

This early-grade lens matters because it catches learning problems long before secondary examinations do. In the 35 countries and territories with MICS foundational learning data, the median share of children with foundational reading skills is 41%, while the median share with foundational numeracy skills is only 25%. That gap is one reason many systems now place numeracy beside reading rather than treating mathematics as a later-stage subject.[l]

School-Age Applied Skills

By the end of primary and lower secondary education, the meaning of literacy and numeracy broadens. Reading no longer means only decoding and short-text recall. It includes interpretation, integration of information, and, in some systems, evaluation of evidence. Numeracy moves beyond simple operations toward representation, reasoning, and the use of mathematics in everyday settings. That shift is visible in PIRLS reading benchmarks, TIMSS mathematics benchmarks, and PISA’s proficiency levels for 15-year-olds.[e][d][b]

Adult Functional Skills

For adults, the construct shifts again. Literacy becomes the ability to understand and use written information in real contexts. Numeracy becomes the ability to work with quantities, rates, proportions, tables, charts, and statistical claims. This is why adult skill surveys can produce results that look harsher than official literacy rates. A simple literacy-rate question asks whether a person can read and write a short, simple statement. A functional skills test asks whether that person can handle tasks that mirror modern work and civic life.[f][g][j]

The Main Ways Countries Report Results

This table shows how the main international and household systems translate skill data into country-level benchmark results.
| System | Who Is Measured | How Results Are Reported | What the Benchmark Means | Illustrative Current Signal |
|---|---|---|---|---|
| SDG 4.1.1 / MPL | Grades 2/3, end of primary, end of lower secondary | Percentage reaching the minimum proficiency level | Basic knowledge and skill at a defined stage of schooling | Globally, 58% reach the minimum in reading and 44% in mathematics at the end of primary; when the full age cohort is considered, the shares fall to 51% and 39%.[a] |
| PIRLS 2021 | Grade 4 reading | Scale scores and cumulative benchmark shares at 400, 475, 550, 625 | Progression from low to advanced reading comprehension | PIRLS 2021 covered 57 countries and 8 benchmarking entities.[e] |
| TIMSS 2023 | Grade 4 and Grade 8 mathematics | Scale scores and cumulative benchmark shares at 400, 475, 550, 625 | Progression from basic mathematical knowledge to advanced reasoning | 91% of Grade 4 students and 81% of Grade 8 students in participating systems reached at least the Low International Benchmark in mathematics.[d] |
| PISA 2022 | 15-year-olds | Proficiency levels, with Level 2 used as a baseline | Whether students can apply reading and mathematics in real situations | Across OECD countries, 69% reached Level 2 or above in mathematics; 31% remained below it.[b] |
| UNICEF MICS FLN | Children aged 7–14, aligned to Grade 2/3 skills | Task mastery in reading and numeracy modules | Whether children have the basic early-grade skills they were expected to acquire | Across 35 countries and territories with data, the median share is 41% for foundational reading and 25% for foundational numeracy.[l] |
| PIAAC 2023 | Adults aged 16–65 | Proficiency levels from low to high performance | Functional literacy and numeracy for work and everyday life | Across participating OECD economies, 26% of adults are at low literacy proficiency and 25% at low numeracy proficiency.[f][g] |
| LaNA 2023 Linking Study | End of primary in selected systems | Basic Benchmark at 325, then 400, 475, 550, 625 | Skill reporting below the old low benchmark | The median share reaching the new Basic Benchmark was 55% in reading and 70% in mathematics across participating LaNA systems.[p] |

How International Benchmarking Systems Work

SDG 4 Minimum Proficiency Levels

The global SDG system does not require every country to sit one single test. Instead, it asks whether a country can identify a shared minimum proficiency level through national, regional, or international assessments. UNESCO describes the minimum proficiency level as the benchmark of basic knowledge in a domain measured through learning assessments. The practical logic is simple: countries may keep their own assessment systems, but they need a credible way to align local results to shared proficiency descriptors so that the percentage reaching the minimum can be reported with some cross-country meaning.[m][n]

This is one of the least understood parts of global education data. Many readers assume SDG reporting means a global standardized exam identical for every country. It does not. UNESCO’s learning data work supports linking, alignment tools, and targeted assessments such as AMPL so that countries can report against a shared threshold without abandoning local systems.[n]

PIRLS and TIMSS Benchmark Bands

PIRLS and TIMSS use one of the clearest benchmark structures in international assessment. Both report four cumulative international benchmarks: Low (400), Intermediate (475), High (550), and Advanced (625). “Cumulative” matters. A student who reaches High also counts as having reached Intermediate and Low. This makes the results easy to read as a skill ladder rather than a pass-fail line.[e][d][p]
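To make the cumulative logic concrete, here is a small Python sketch that counts the share of students at or above each cut score. The four cut scores come from the text; the ten scores are invented for illustration. Operational PIRLS and TIMSS reporting also relies on sampling weights and plausible values, which this sketch deliberately leaves out.

```python
# Cut scores as given in the text; everything else is an illustrative assumption.
BENCHMARKS = {"Low": 400, "Intermediate": 475, "High": 550, "Advanced": 625}


def cumulative_shares(scores):
    """Percentage of scores at or above each benchmark cut score.

    Shares are cumulative: a score of 560 counts toward Low,
    Intermediate, and High at the same time.
    """
    n = len(scores)
    return {
        name: 100 * sum(score >= cut for score in scores) / n
        for name, cut in BENCHMARKS.items()
    }


# Ten invented scale scores, unweighted.
scores = [380, 410, 480, 500, 560, 630, 450, 390, 555, 470]

print(cumulative_shares(scores))
# {'Low': 80.0, 'Intermediate': 50.0, 'High': 30.0, 'Advanced': 10.0}
```

The shares can only fall or stay level as the cut score rises, which is what lets the four numbers be read as a ladder.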

In literacy, PIRLS benchmark descriptions move from straightforward comprehension of shorter texts toward interpretation, integration, and evaluation in more difficult texts. In numeracy, TIMSS benchmark descriptions move from basic number knowledge and simple representations toward more complex reasoning and justification. This ladder format is useful for ministers because it shows not only who crossed a floor, but also how far the upper part of the distribution extends.[e][p]

PISA Baseline Proficiency and Top Performance

PISA takes a different route. It reports a scale, but the headline number in public debate is often the share of students at or above Level 2, treated as a baseline level of proficiency. In PISA 2022, 31% of students across OECD countries were below Level 2 in mathematics, which means 69% were at or above the baseline. In reading, 26% were below Level 2. That is why PISA debates often separate low performers from top performers rather than using a single pass mark.[b]

PISA also captures recent stress on education systems. Between 2018 and 2022, the OECD average dropped by almost 15 score points in mathematics and about 10 points in reading. OECD described the mathematics fall as larger than any previous consecutive change. For benchmark interpretation, this matters because a shift of that size moves many students across the Level 2 threshold, not just within the same band.[c]

PIAAC and Adult Skill Measurement

Adult measurement brings another reporting style. PIAAC focuses on what adults can do with texts and numbers in daily life and work. Across OECD countries in Cycle 2, 26% of adults have low literacy proficiency and 25% have low numeracy proficiency. At the upper end, 12% score at Levels 4 or 5 in literacy and 14% at Levels 4 or 5 in numeracy. The spread across countries is wide: the share of adults with low literacy proficiency ranges from 10% in Japan to 53% in Chile.[f][g]

The adult story also has a time dimension that many school-focused articles leave out. OECD’s 2024 release on the second Survey of Adult Skills found that literacy and numeracy have largely declined or stagnated in most participating OECD countries over the past decade, with the lowest-performing adults often slipping the most. In other words, country averages are important, but distributional change matters just as much.[h]

Household and Below-Floor Measures

A weak point in older international reporting was what happened below the old low benchmark. If a child did not reach 400 on a PIRLS or TIMSS-linked scale, the reporting often said little about how close or far that child was from the floor. The LaNA 2023 Linking Study addresses that problem by adding a new Basic Benchmark at 325. In participating LaNA systems, the median share reaching the Basic Benchmark was 70% in mathematics and 55% in reading, while only 44% in mathematics and 25% in reading reached the old Low Benchmark at 400.[p]
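The value of the extra threshold shows up in simple subtraction. The Python sketch below splits the cumulative shares quoted above into three bands. One caveat is an assumption worth stating loudly: the figures are medians across systems, so treating them as if they described a single system is only a rough illustration, and the function name is hypothetical.

```python
def band_shares(at_or_above_basic, at_or_above_low):
    """Split two cumulative shares (in percent) into three separate bands."""
    return {
        "below_basic": 100 - at_or_above_basic,
        "basic_to_low": at_or_above_basic - at_or_above_low,
        "low_or_above": at_or_above_low,
    }


# Median shares quoted in the text (Basic Benchmark at 325, Low Benchmark at 400).
mathematics = band_shares(at_or_above_basic=70, at_or_above_low=44)
reading = band_shares(at_or_above_basic=55, at_or_above_low=25)

print(mathematics)  # {'below_basic': 30, 'basic_to_low': 26, 'low_or_above': 44}
print(reading)      # {'below_basic': 45, 'basic_to_low': 30, 'low_or_above': 25}
```

Under that simplification, roughly a quarter to a third of learners fall between 325 and 400. With only the older Low Benchmark, they would have been reported together with everyone else below 400.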

This is more than a technical refinement. It changes how countries can describe early learning. Instead of grouping all sub-400 learners into one dark box, systems can now report emerging skills with more precision. That is especially useful in lower-income settings where many pupils are still climbing toward the first internationally recognized threshold.[p]

What the Latest Numbers Show

Global School-Age Patterns

The current global picture is mixed. UNESCO’s education monitoring data show that, among students reaching the end of primary, 58% achieve the minimum proficiency level in reading and 44% in mathematics. At the end of lower secondary, the corresponding shares are 64% in reading and 51% in mathematics. Yet once completion is folded in and the full age cohort is considered, the effective shares are lower: 51% in reading and 39% in mathematics at the end of primary, and 50% in reading and 40% in mathematics at the end of lower secondary.[a]
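One simplified way to read the gap between the two sets of figures is to assume that the full-cohort share is roughly the share among students who reach the end of the level multiplied by the proportion of the cohort who get that far. That multiplicative model is an assumption made for this sketch, not a description of UNESCO's estimation method, and the function name is hypothetical.

```python
def implied_reach_ratio(share_among_students, share_of_full_cohort):
    """Ratio of the full-cohort share to the in-school share.

    Under the simplifying assumption stated above, this approximates the
    fraction of the age cohort that reaches the end of the level.
    """
    return share_of_full_cohort / share_among_students


# (share among students reaching the end of the level, share of full cohort),
# both in percent, as quoted in the text.
figures = {
    "primary_reading": (58, 51),
    "primary_mathematics": (44, 39),
    "lower_secondary_reading": (64, 50),
    "lower_secondary_mathematics": (51, 40),
}

for label, (in_school, cohort) in figures.items():
    print(label, round(implied_reach_ratio(in_school, cohort), 2))
# primary_reading 0.88
# primary_mathematics 0.89
# lower_secondary_reading 0.78
# lower_secondary_mathematics 0.78
```

Read this way, the adjustment is larger at the end of lower secondary than at the end of primary, which fits the idea that fewer children stay in school to the later stage.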

Those numbers explain why numeracy sits at the center of many reform discussions. Reading is far from solved, but mathematics is weaker at both the primary and lower secondary stages. The result is not just lower average performance. It is a narrower pipeline into science, technical study, and data-heavy forms of work later on. That is one reason benchmark tables now appear more often in ministry reports than simple pass rates.[a][i]

Early-Grade Patterns in Lower-Income Settings

Lower-income settings show the value of separate early-grade measurement. UNICEF’s 2025 Foundational Learning Action Tracker covers 124 low- and middle-income countries and states that about two thirds of children are estimated to be in learning poverty, unable to read and understand a simple text by age 10. At the same time, the policy side shows movement: 81% of countries report that learning outcomes or benchmarks for foundational literacy and numeracy are clearly defined in the early-grade curriculum nationwide, while 57% report nationwide integration of social-emotional learning in the curriculum.[k]

That split is important. Countries are more likely to set benchmarks than to fully align teaching, support, and system use around them. A benchmark written into the curriculum is one step. Regular assessment, teacher support, use of results in planning, and catch-up instruction are separate steps. When people ask why benchmark policies do not always lift learning quickly, this is often the missing answer.[k][o]

Adult Skills Patterns

UNESCO’s 2026 data refresh reports that global literacy stands at 93% for youth and 88% for adults. That is progress in access and basic literacy acquisition. Yet adult skill surveys tell a tougher story about the use of literacy and numeracy at higher functional levels. The two views belong together. A country may post a strong literacy rate and still face weak adult numeracy in data interpretation, proportional reasoning, or multistep tasks.[i][f][g]

Why Comparing Countries Is Harder Than It Looks

Same Label, Different Construct

“Literacy” is not one thing. A literacy rate may come from census or survey responses about being able to read and write a short, simple statement. Foundational reading may come from a household module that requires mastery of word recognition and comprehension tasks. PIRLS reading is a Grade 4 reading comprehension measure. PISA reading is an applied literacy measure for 15-year-olds. PIAAC literacy is a functional adult skill measure. Each is valid for its own purpose. None should be treated as a direct substitute for the others.[j][l][e][b][f]

Sampling, Language, and Test Mode

Mode and language also shape results. PIRLS 2021 mixed paper and digital administration across participating systems. LaNA itself notes that its paper-based texts differ from the digital texts used in PIRLS 2021, and it warns readers to take that into account when comparing percentages at the same benchmark labels. UNESCO’s learning work also notes the role of language and the technical demands of one-to-one early-grade assessments when countries try to report Grade 2/3 indicators.[e][p][n]

National Exams and International Comparability

National examinations often dominate public debate, but they are not always designed for international comparison. Some test the curriculum taught in a country’s own sequence. Others are selection tools for the next level of schooling. International assessments are usually sample-based and designed to describe system performance across a reference scale. Neither approach is better in every context. They answer different questions. The real policy challenge is to connect them so that local accountability and international comparability support each other rather than compete.[m][n][o]

Three Interpretation Rules That Prevent Bad Comparisons

  1. Separate the population from the skill. Grade 4 reading, age-15 reading, and adult literacy are different indicators.
  2. Separate the threshold from the average. A country can have a decent mean score and still leave too many learners below the minimum benchmark.
  3. Separate reporting labels from real equivalence. The same benchmark number on two systems does not erase differences in language, mode, and construct.
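The second rule is easy to demonstrate with invented numbers. The Python sketch below compares two hypothetical systems that share the same mean score but differ sharply in how many learners fall below a 400-point floor. All scores are made up for illustration and are unweighted.

```python
from statistics import mean

LOW_BENCHMARK = 400  # floor used for the comparison


def share_below(scores, cut):
    """Percentage of scores strictly below the cut score."""
    return 100 * sum(score < cut for score in scores) / len(scores)


# Two invented systems with identical means but different spreads.
system_a = [450] * 10
system_b = [300, 300, 300, 350, 450, 500, 550, 550, 600, 600]

print(mean(system_a), share_below(system_a, LOW_BENCHMARK))  # 450 0.0
print(mean(system_b), share_below(system_b, LOW_BENCHMARK))  # 450 40.0
```

A ranking by mean score would treat the two systems as equal. A ranking by share below the benchmark would not.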

What Is Changing in 2025 and 2026

A Revised SDG 4.6 Reporting Structure

One of the most important recent changes is technical, but it matters for anyone who tracks adult literacy and numeracy. In April 2025, the UN Statistical Commission approved the youth/adult literacy rate as the replacement global indicator for SDG 4.6.1. The older indicator, the share of the population reaching a fixed level of proficiency in functional literacy and numeracy, was renumbered as thematic indicator 4.6.2. This change makes the official reporting structure easier to sustain because literacy-rate data are available for far more countries, while functional skill measurement still has much thinner coverage.[j]

More Attention to System Capacity, Not Only Test Scores

The second shift is a move from one-off assessment events toward assessment systems. The World Bank’s ALMA initiative, launched in 2024, is built around expanding the availability and use of learning data, closing data gaps, and helping more countries strengthen assessment systems. The World Bank’s description of the initiative also notes coordination with partners around a 2030 goal: that all countries generate quality learning measures in literacy and numeracy in two grades at regular intervals. That may sound administrative, but it changes what benchmarking can do. A benchmark is much more useful when it sits inside a routine measurement cycle rather than a one-time project.[o]

More Precision Below the Traditional Floor

The third shift is better reporting for learners below the older low benchmark. LaNA’s Basic Benchmark at 325 is a clear example. It gives ministries a way to describe emerging reading and mathematics skills rather than treating all below-floor learners as one undifferentiated group. For systems where many pupils still sit below 400, that is a much more useful signal for pacing, support, and goal-setting.[p]

Measurement Expanding Beyond Device Use

A final change sits at the edge of literacy and numeracy policy: what else schools now want to measure alongside them. A recent cross-country reform review notes that the AI-curriculum conversation is shifting toward what students can explain, evaluate, and create, not only what buttons they can click. That does not reduce the place of reading and mathematics. It raises the pressure on them, because students cannot critique outputs, judge evidence, or work with data without strong literacy and numeracy underneath.[q]

For anyone reading country scorecards, the cleanest interpretation is to keep four signals separate: literacy rate, foundational skill rate, share reaching the minimum benchmark, and share at advanced levels. Those four numbers tell different stories. Used together, they show not only where a system stands, but where its measurement is sharp, where it is thin, and where policy action should focus first.[i][k][h]

Reference Links
