Google’s latest AI model has eyes, reads handwriting and does your maths homework

Google’s latest AI model, named Gemini, will be rolled out to users and enterprise customers gradually through 2024. PHOTO: AFP

SINGAPORE – Google’s upcoming artificial intelligence (AI) model for consumers can read handwriting and solve mathematics problems for students, potentially plugging a gap in earlier AI chatbots.

Named Gemini, Google’s latest AI model can understand and generate images, audio and text. It will be rolled out to consumers and enterprise customers gradually through 2024.

Google claims Gemini has broken records set by other AI models, outperforming human experts on benchmarks covering language understanding, social sciences, mathematics and other subjects, its presenters said at an online media conference on Dec 6.

The first tranche of Gemini-powered updates arrives on Dec 6 in Google’s free-to-use AI chatbot, Bard, enabling it to summarise information, generate code and reason with greater sophistication. The rest of Gemini’s headline features will be rolled out gradually in 2024.

In a video shown to the media, the AI speaks with the user and appears to understand anything it sees through a camera. It could follow mathematical workings scribbled on a page and suggest how to solve the equation.

It created a game on command, giving the user hints to point to a country on a world map, and even tracked the position of a ball shuffled under three identical cups.

The AI generated music based on images of musical instruments gradually introduced into the frame, and identified a crab in a connect-the-dots puzzle without the dots having to be joined.

To achieve these feats, Gemini was trained on different types of media from the start, allowing it to understand various modes of input, such as text, audio and images, said Google.

This is unlike how AI models are conventionally built, by training them to learn different modes of communication in silos before “stitching” the components together, the tech giant said.

Gemini will be released in three versions, with varying levels of sophistication to fit the needs of mobile, general and enterprise users.

Mobile users will gradually receive updates with Gemini Nano, a mobile-friendly version that can run its AI features on the phone’s processing chip, even without an Internet connection.

Google Pixel 8 Pro users will be the first to get a slew of Gemini-powered AI features, including a summarisation tool in the Pixel’s Recorder app and AI-suggested replies to text messages on WhatsApp.

Asked why Gemini was not rolled out to the lower-priced Pixel 8, which runs on the same chip as the 8 Pro, Google said it will extend the updates to other phones gradually to fit the requirements of each device.

The middle-tier Gemini Pro is the go-to version for most developers and business users who want to build AI apps with Gemini. It is also the version powering the Bard update on Dec 6.

The most capable edition, Gemini Ultra, is meant for select enterprise customers executing highly complex tasks. Google said a souped-up version of its chatbot, Bard Advanced, powered by Gemini Ultra, will also arrive in early 2024, but it did not say whether it will be free to use.

Google said Gemini has been trained to weed out toxic prompts and responses, using 100,000 prompts pulled from the Web.

The tech giant’s latest features are set to heat up the generative AI market, trading blows with OpenAI’s ChatGPT, which has been updated to understand and generate images and code, and to converse with users by voice.