Humans get full marks, beating generative AI models’ gold-level score at top maths contest
The Google and OpenAI generative artificial intelligence models reached gold-level scores at the International Mathematical Olympiad for the first time.
PHOTO: REUTERS
SYDNEY – Humans beat generative artificial intelligence (AI) models made by Google and OpenAI at a top international mathematics competition, despite the programmes reaching gold-level scores for the first time.
Neither model scored full marks – unlike five young people at the International Mathematical Olympiad (IMO), a prestigious annual competition where participants must be under 20 years old.
Google said on July 21 that an advanced version of its Gemini chatbot had solved five out of the six maths problems set at the IMO, held in Australia’s Queensland in July.
“We can confirm that Google DeepMind has reached the much-desired milestone, earning 35 out of a possible 42 points – a gold medal score,” the US tech giant cited IMO president Gregor Dolinar as saying.
“Their solutions were astonishing in many respects. IMO graders found them to be clear, precise and most of them easy to follow.”
Around 10 per cent of human contestants won gold medals, and five received perfect scores of 42 points.
US ChatGPT maker OpenAI said that its experimental reasoning model had scored a gold-level 35 points on the test.
The result “achieved a longstanding grand challenge in AI” at “the world’s most prestigious math competition”, OpenAI researcher Alexander Wei wrote on social media.
Google achieved a silver-medal score at the 2024 IMO in the British city of Bath, solving four of the six problems.
That took two to three days of computation – far longer than in 2025, when its Gemini model solved the problems within the 4½-hour time limit, Google said.
The IMO said tech companies had “privately tested closed-source AI models on the 2025 problems”, the same ones faced by 641 competing students from 112 countries.
“It is very exciting to see progress in the mathematical capabilities of AI models,” said Professor Dolinar.
Contest organisers could not verify how much computing power had been used by the AI models or whether there had been human involvement, he cautioned. AFP

