Humans outperform AI in Alibaba maths competition
More than 500 teams used artificial intelligence during the preliminary round of the 2024 Alibaba Global Mathematics Competition. None of them advanced to the finals.
PHOTO: REUTERS
BEIJING - Although artificial intelligence has demonstrated capabilities surpassing those of humans in many fields, it still faces significant limitations in the realm of mathematics.
During the preliminary round of the 2024 Alibaba Global Mathematics Competition, 563 teams used AI to answer questions. Much to the surprise of AI advocates, none of the teams performed well enough to advance to the finals.
During the 48-hour preliminary round, AI and human participants were given the same exam questions, including multiple-choice, problem-solving and proof questions. AI teams were asked to submit their models in advance to prevent cheating.
According to the competition’s organising committee, the average score of the participating AI teams was 18, which was on a par with the average level of the human competitors. However, the highest score achieved by AI was only 34, which was far behind the highest human score of 113.
Chen Tianchu, who researches large models at the Computer Architecture Laboratory of Zhejiang University, said that large language models (LLMs) still work by predicting the next word from context at a fixed rate and producing their results all at once.
For tasks that require repeated trials and rigorous thinking, such as maths competitions, LLMs still have limitations, Chinese media outlet The Economic Observer reported.
About half of the AI team members were born after 2000 and represented institutions such as Peking University, Tsinghua University, the University of Oxford, Amazon Web Services and ByteDance.
Some of them adjusted open-source large models so that the AI could advance from elementary mathematics to advanced mathematics; others built AI agents that used prompt engineering to access closed-source models like GPT-4 and improve GPT-4’s mathematical problem-solving abilities.
Tu Jinhao from Jianping High School in Shanghai achieved the highest score among those using AI. Drawing inspiration from the concept of self-debate, Tu applied multiple large models to several rounds of “self-questioning, self-answering, self-verification” to seek the optimal solutions to problems.
The top three AI teams earned prizes of US$10,000 (S$13,507), US$5,000 and US$2,000.
According to the organising committee, the annual event will continue to allow the use of AI to drive research and innovation in its application in mathematics.
Yin Wotao, a member of the committee, said in an interview with Shanghai Securities News that the move is a positive attempt to break through the limits of AI capabilities and bring about more possibilities. CHINA DAILY/ASIA NEWS NETWORK

