Yoti February 5, 2026

Yoti facial age estimation – newest model evaluation by NIST

Erlend Davidson

An image of a person holding their phone up and performing a facial age estimation.

We are delighted to share our latest evaluation by the National Institute of Standards and Technology (NIST) for our newest facial age estimation model. We have seen notable performance improvements across a number of metrics.

NIST’s evaluation is extremely thorough: it uses over 20 million images, and NIST refines its testing methodology over time. This helps to highlight models that are robust across multiple datasets and scenarios.

Our strategy for developing our model is not “data dependent”. Machine learning models can benefit greatly from the quantity of data at the initial stage, but once they reach maturity, adding more and more data brings significantly diminishing returns. We do not use data from our checks (for example, our facial age estimation images), nor do we scrape the web for images.

We do continue to add consented training data, but that is not the primary source of improvements we see over time. Instead, we have built our model to learn various characteristics of an image based on specific outcomes. This means we can scale our model more effectively, react to moving threat vectors, improve efficacy and work to reduce bias.

For example, this means we can train our model not to rely on misleading markers:

A wrinkled forehead could be a frown or a result of the ageing process.
Glasses may be more prevalent in older age groups, but are not indicative of age, so the model should take less notice of them.
A beard generally indicates that a male is over about 18 years of age, but it could also be an easy presentation attack.

Balancing these differing signals is a challenge. Our strategy from the start has been to build a model that best meets the real-world requirements of businesses and regulators. This latest NIST evaluation neatly demonstrates this.

Key takeaways from our latest NIST evaluation

We have seen a number of improvements from our previously submitted model – the key takeaways are:

Mean Absolute Error (MAE) has improved from 3.102 to 2.615 for the NIST mugshot data set

NIST uses multiple datasets, each of which has different characteristics. Principally, they vary in image quality:

NIST image sizes used in the evaluation of Yoti's Facial Age Estimation

This has a statistically significant effect on model performance.

The mugshot dataset is only tagged by year of age (not month, or even day, of birth). Even so, we consider it the closest dataset to our real-world use case due to its higher image quality. The mobile images we typically use are even higher quality at 720 x 800 pixels.
This takes us from 12th to 3rd best company for MAE.
The gap to first place is now just 0.2 years MAE, which is statistically very low and equates to just over two months.
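
For readers unfamiliar with the metric, MAE is the average of the absolute differences between estimated and true ages. The following is a minimal Python sketch of that calculation; the function name and the example ages are made up for illustration and are not NIST data. It also reproduces the arithmetic behind the figures quoted above.

```python
def mean_absolute_error(true_ages, estimated_ages):
    """Average absolute difference, in years, between estimated and true ages."""
    if len(true_ages) != len(estimated_ages) or not true_ages:
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum(abs(est - true) for true, est in zip(true_ages, estimated_ages))
    return total / len(true_ages)


# Hypothetical example values, not NIST data.
true_ages = [18, 25, 30, 42]
estimated_ages = [20.0, 24.0, 33.0, 40.0]
mae = mean_absolute_error(true_ages, estimated_ages)  # (2 + 1 + 3 + 2) / 4

# Figures quoted in this post for the mugshot dataset.
improvement_years = 3.102 - 2.615  # roughly 0.49 years lower MAE
gap_months = 0.2 * 12              # a 0.2-year gap is roughly 2.4 months
```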

Yoti facial age estimation improvement over time

Yoti has submitted four models for NIST evaluation. The table below shows how our MAE has improved over time across all datasets, and for the mugshot dataset specifically from 3.78 to 2.615. The table shows MAE improvement for the 18-30 age group, stratified across age and gender.

A table showing the mean absolute errors of Yoti's four facial age estimation models submitted to NIST.

How did we do this?

Not the easy way. The easy way to reduce MAE over an entire dataset would be to reduce the average error for older demographics, where it is much higher. However, accuracy for older age groups is not critical with respect to current and impending legislation.

Our approach to improving our overall MAE was to reduce the error rate for the 18-30 age group, without materially sacrificing accuracy on older and younger persons.

The 18-30 age group is covered by many new and existing legislative developments, and accuracy in this group is what our clients require to meet this legislation. As you can see below, performance for ages 39+ is very close between the two models, while there is significant improvement at younger ages.

An image showing the mean absolute error of Yoti's previous 003 model compared to its current 004 model.
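
The reasoning in this section can be illustrated with a small Python sketch. It compares overall MAE with MAE per age band for three sets of made-up estimates. The band boundaries, function names and all numbers are assumptions chosen for illustration; they are not Yoti or NIST results. The point is only that lowering error for older ages improves the overall figure without helping the 18-30 band.

```python
from collections import defaultdict


def overall_mae(records):
    """records: list of (true_age, estimated_age) pairs."""
    return sum(abs(est - true) for true, est in records) / len(records)


def mae_by_band(records, bands):
    """MAE per age band; bands is a list of (label, low, high), inclusive.

    Records whose true age falls in no band are ignored.
    """
    errors = defaultdict(list)
    for true, est in records:
        for label, low, high in bands:
            if low <= true <= high:
                errors[label].append(abs(est - true))
                break
    return {label: sum(errs) / len(errs) for label, errs in errors.items()}


# Illustrative bands only.
BANDS = [("18-30", 18, 30), ("39+", 39, 120)]

# Hypothetical (true_age, estimated_age) pairs.
baseline = [(20, 23), (25, 22), (50, 56), (60, 54)]
easy_way = [(20, 23), (25, 22), (50, 52), (60, 58)]  # only older ages improve
targeted = [(20, 21), (25, 24), (50, 56), (60, 54)]  # only 18-30 improves

for name, records in [("baseline", baseline), ("easy way", easy_way), ("targeted", targeted)]:
    print(name, overall_mae(records), mae_by_band(records, BANDS))
```

In this toy data the "easy way" gives the lowest overall MAE while leaving the 18-30 error unchanged, whereas the targeted set improves the 18-30 band and leaves the 39+ error as it was.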

Bias across gender and skintone

NIST uses geographical regions as a proxy for skintone, whilst acknowledging this is not a perfect solution.

Below you can see Yoti’s performance across gender and region, showing where the MAE is highest and lowest. The closer to zero (the smaller the bar), the better, and consistently small bars across the chart are desirable.

A chart showing how Yoti's current model performs, by year of age, according to demographic group.

As you can see, we underperform for 14-16 year old females. This is why independent testing is critical: it shows us where we need to work on our model for specific demographics.

Here, as an example, is a model from an alternative vendor, which is one of the other top 5 models:

A chart showing how another anonymous vendor's model performs, by year of age, according to demographic group.

Minimising bias is not just one of our principles, but something regulators and businesses demand. We have always publicly released our own accuracy testing across age, gender and skin tone. Our goal is to minimise bias for everyone. We recognise the demographics where we need to improve, and focus resources on resolving those issues.

Most robust model

Our models are relatively robust (invariant) to facial expressions. This is based on an early experimental test with NIST, but an important and indicative one. The test involves adding and removing glasses and changing facial expression (smiling, frowning, talking and just being your neutral self), and it measures the difference in age estimation across those scenarios.

The charts below show how the age estimate varies over a video of a person changing their facial expressions, then putting on glasses and repeating the same. The primary goal for a robust model is to stay close to the blue line (the actual age of the individual). A secondary goal is to have as little variance as possible across the test.

Here, Yoti comes top with the lowest noise. Yoti’s latest model (yoti-004) is bottom left.

A series of charts showing how each vendor's model varies when estimating the age of one person who is changing their expression.
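
The two goals described above, staying close to the true age and varying little across the test, can be summarised with two simple numbers per video: the average offset from the true age and the spread of the per-frame estimates. The Python sketch below is only an illustration of that idea. The function name and the per-frame values are hypothetical, and this is not necessarily how NIST scores the test.

```python
from statistics import mean, pstdev


def robustness_summary(true_age, frame_estimates):
    """Return (offset, spread) for one video.

    offset: mean estimate minus true age (closer to 0 is better).
    spread: population standard deviation of the estimates (lower is steadier).
    """
    offset = mean(frame_estimates) - true_age
    spread = pstdev(frame_estimates)
    return offset, spread


# Hypothetical per-frame estimates for a 30-year-old changing expression.
true_age = 30
steady_model = [31.0, 31.5, 30.5, 31.0]
noisy_model = [26.0, 34.0, 28.0, 36.0]

steady_offset, steady_spread = robustness_summary(true_age, steady_model)
noisy_offset, noisy_spread = robustness_summary(true_age, noisy_model)
```

Both hypothetical models have the same average offset here, but the second one swings far more between frames, which is the kind of noise the charts make visible.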

Summary

Balancing model performance across all of these various metrics is a challenge, but important for online safety and for businesses to meet their obligations, as well as for regulators to feel satisfied that high assurance age checks are being performed well, and in a balanced way.

At Yoti, it is one of our principles that we build technology that works for all – we strive to build technology that performs fairly. That has led us to build a model from the start in a way that doesn’t just rely on increasing amounts of data to improve. All of this data is publicly available on the NIST website.

Combined with our world-leading liveness and SICAP, businesses can have confidence that they are using a world-class age check solution.

To learn more about facial age estimation, you can read our white paper or get in touch.

– Erlend, Head of Research & Development
