Date: 2023-11-07

NEW YORK – When the San Francisco start-up OpenAI unveiled its ChatGPT online chatbot late last year, millions were wowed by the humanlike way it answered questions, wrote poetry and discussed almost any topic. But most people were slow to realise that this new kind of chatbot often makes things up.

When Google introduced a similar chatbot several weeks later, it spewed nonsense about the James Webb telescope. The next day, Microsoft’s new Bing chatbot offered up all sorts of bogus information about the Gap, Mexican nightlife and the singer Billie Eilish. Then, in March, ChatGPT cited a half dozen fake court cases while writing a 10-page legal brief that a lawyer submitted to a federal judge in Manhattan.

Now a new start-up called Vectara, founded by former Google employees, is trying to figure out how often chatbots veer from the truth. The company’s research estimates that even in situations designed to prevent it from happening, chatbots invent information at least 3 per cent of the time, and as often as 27 per cent.

Experts call this chatbot behaviour “hallucination”. It may not be a problem for people tinkering with chatbots on their personal computers, but it is a serious issue for anyone using this technology with court documents, medical information or sensitive business data.

Because these chatbots can respond to almost any request in an unlimited number of ways, there is no way of definitively determining how often they hallucinate. “You would have to look at all of the world’s information,” said Dr Simon Hughes, the Vectara researcher who led the project.

Dr Hughes and his team asked these systems to perform a single, straightforward task that is readily verified: Summarise news articles. Even then, the chatbots persistently invented information.

“We gave the system 10 to 20 facts and asked for a summary of those facts,” said Vectara CEO Amr Awadallah, a former Google executive. “That the system can still introduce errors is a fundamental problem.”

The researchers argue that when these chatbots perform tasks beyond mere summarisation, hallucination rates may be higher.

Their research also showed that hallucination rates vary widely among the leading AI companies. OpenAI’s technologies had the lowest rate, around 3 per cent. Systems from Meta, which owns Facebook and Instagram, hovered around 5 per cent. The Claude 2 system offered by Anthropic, an OpenAI rival also based in San Francisco, topped 8 per cent. A Google system, Palm chat, had the highest rate at 27 per cent.

Anthropic spokeswoman Sally Aldous said: “Making our systems helpful, honest and harmless, which includes avoiding hallucinations, is one of our core goals as a company.”

Google declined to comment, and OpenAI and Meta did not immediately respond to requests for comment.

With this research, Dr Hughes and Mr Awadallah want to show people that they must be wary of information that comes from chatbots, and even from the service that Vectara sells to businesses. Many companies are now offering this kind of technology for business use.