Google’s AI chatbot is trained by humans who say they are overworked, underpaid and frustrated
NEW YORK – Google’s Bard artificial intelligence chatbot can answer queries on almost any topic within seconds. Ensuring that those responses are well sourced and based on evidence, however, falls to thousands of outside contractors from companies including Appen and Accenture. Several of these contractors, who declined to be named for fear of losing their jobs, say they are underpaid and work with minimal training under frenzied deadlines.
The contractors are the invisible backend of the generative artificial intelligence (AI) boom that is hyped to change everything. Chatbots such as Bard use computer intelligence to respond almost instantly to a range of queries spanning all of human knowledge and creativity.
But to improve those responses so they can be reliably delivered again and again, tech companies rely on people who review the answers, provide feedback on mistakes and weed out any inklings of bias.
It is an increasingly thankless job. Six current Google contract workers said that as the company entered an AI arms race with rival OpenAI over the past year, their workloads and the complexity of their tasks have grown.
Without specific expertise, they were trusted to assess answers in subjects ranging from medication doses to state laws. Documents shared with Bloomberg show convoluted instructions that workers must apply to tasks, with deadlines for auditing answers that can be as short as three minutes.
“As it stands right now, people are scared, stressed, underpaid and don’t know what’s going on,” said one of the contractors. “And that culture of fear is not conducive to getting the quality and the teamwork that you want out of all of us.”
Google has positioned its AI products as public resources in health, education and everyday life. But privately and publicly, the contractors have raised concerns about their working conditions, which they say hurt the quality of what users see.
One Google contract worker employed by Appen said in a letter to Congress in May that the speed at which workers are required to review content could lead to Bard becoming a faulty and dangerous product.
Google has made AI a major priority across the company, rushing to infuse the new technology into its flagship products after the launch of OpenAI’s ChatGPT in November.
In May, at the company’s annual I/O developers conference, Google opened up Bard to 180 countries and territories and unveiled experimental AI features in marquee products such as search, e-mail and Google Docs. Google positions itself as superior to the competition because of its access to “the breadth of the world’s knowledge”.
Google, owned by Alphabet, said in a statement that it does not rely solely on the raters to improve its AI, and that it uses a number of other methods to improve accuracy and quality.
Workers are frequently asked to determine whether the AI model’s answers contain verifiable evidence. They are also asked to make sure the responses do not “contain harmful, offensive or overly sexual content”, and do not “contain inaccurate, deceptive or misleading information”.
Checking the AI’s responses for misleading content should be “based on your current knowledge or a quick Web search”, the guidelines say. “You do not need to perform a rigorous fact check” when assessing the answers for helpfulness.
The example answer to “Who is Michael Jackson?” included an inaccuracy about the singer starring in the movie Moonwalker – which the AI said was released in 1983. The movie came out in 1988.
“While verifiably incorrect,” the guidelines state, “this fact is minor in the context of answering the question, ‘Who is Michael Jackson?’”
Even if the inaccuracy seems small, “it is still troubling that the chatbot is getting main facts wrong”, said Ms Alex Hanna, director of research at the Distributed AI Research Institute and a former Google AI ethicist.
“It seems like that’s a recipe to exacerbate the way these tools will look like they’re giving details that are correct, but are not,” she added.
Raters said they are assessing high-stakes topics for Google’s AI products. One example in the instructions involves the evidence a rater could use to determine the right dosage of Lisinopril, a medication used to treat high blood pressure.
Other technology companies training AI products also hire human contractors to improve them. In January, Time reported that labourers in Kenya, paid US$2 (S$2.65) an hour, had worked to make ChatGPT less toxic. Other tech giants, including Meta, Amazon and Apple, make use of subcontracted staff to moderate social network content and product reviews, and to provide technical support and customer service.
“If you want to ask, what is the secret sauce of Bard and ChatGPT? It’s all of the Internet. And it’s all of this labelled data that these labellers create,” said Ms Laura Edelson, a computer scientist at New York University. “It’s worth remembering that these systems are not the work of magicians – they are the work of thousands of people and their low-paid labour.”
Ms Emily Bender, a professor of computational linguistics at the University of Washington, said the work of these contract workers at Google and other technology platforms is “a labour exploitation story”, pointing to their precarious job security and noting that some of them are paid well below a living wage.
“Playing with one of these systems, and saying you’re doing it just for fun – maybe it feels less fun, if you think about what it’s taken to create and the human impact of that,” said Ms Bender.
Some of the answers these raters encounter can be bizarre. In response to the prompt, “Suggest the best words I can make with the letters: k, e, g, a, o, g and w”, one answer generated by the AI listed 43 possible words, starting with suggestion No. 1: “wagon”. Suggestions 2 through 43, meanwhile, repeated the word “woke” over and over.
Ms Bender said it makes little sense for large tech corporations to be encouraging people to ask an AI chatbot questions on such a broad range of topics, and to be presenting them as “everything machines”.
“Why should the same machine that is able to give you the weather forecast in Florida also be able to give you advice about medication doses?” she asked. “The people behind the machine who are tasked with making it be somewhat less terrible in some of those circumstances have an impossible job.” BLOOMBERG