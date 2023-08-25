Several websites, including Amazon, the New York Times (NYT) and Shutterstock, have blocked OpenAI’s web crawler from collecting content that may be used to improve its artificial intelligence (AI) models.

OpenAI, which is the company behind ChatGPT, launched GPTBot earlier this month.

The three websites are among the six “biggest websites” that blocked GPTBot within the first two weeks of its launch, according to recent research from OriginalityAI, a company that checks for the presence of AI content.

The other three are Quora, CNN and wikiHow, it said.

The NYT’s terms of service were recently updated to make the prohibition against “the scraping of our content for AI training and development… even more clear,” according to a spokesman quoted in a report by The Guardian on Friday.

The news outlet’s terms of service webpage, which was last updated on Aug 3, states that its content cannot be used for “the development of any software program, including, but not limited to, training a machine learning or AI system” without its consent.

GPTBot will comb through the internet and “may potentially be used” to improve future AI models, among other things, said OpenAI on its website.

“Allowing GPTBot to access your site can help AI models become more accurate and improve their general capabilities and safety,” it added.

The company also said that websites can restrict GPTBot’s access, either partially or by opting out entirely.
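In practice, this kind of opt-out is typically expressed in a site’s robots.txt file. The sketch below is illustrative only: it assumes the crawler identifies itself with the user-agent token GPTBot, the directory names are hypothetical, and the helper function gptbot_allowed is not part of any OpenAI tooling. It uses Python’s standard urllib.robotparser module to check what each set of rules would permit.

```python
from urllib.robotparser import RobotFileParser

# Illustrative rules for a full opt-out: GPTBot may not fetch any path.
FULL_OPT_OUT = """\
User-agent: GPTBot
Disallow: /
"""

# Illustrative rules for a partial opt-out.
# The directory names are hypothetical placeholders.
PARTIAL_OPT_OUT = """\
User-agent: GPTBot
Allow: /public-directory/
Disallow: /private-directory/
"""


def gptbot_allowed(robots_txt: str, url: str) -> bool:
    """Return True if the given robots.txt text lets GPTBot fetch the URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch("GPTBot", url)


if __name__ == "__main__":
    print(gptbot_allowed(FULL_OPT_OUT, "https://example.com/article"))
    print(gptbot_allowed(PARTIAL_OPT_OUT, "https://example.com/public-directory/page"))
    print(gptbot_allowed(PARTIAL_OPT_OUT, "https://example.com/private-directory/page"))
```

Rules of this kind are a request rather than an enforcement mechanism: they only work if the crawler chooses to honour them.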

AI language models such as ChatGPT develop knowledge from vast amounts of information gleaned from the internet. From this data, the models learn how to give appropriate outputs.

There have been concerns about how the data used to train AI models is collected. For instance, pirated works by some authors, including Stephen King, have been used to train AI tools, according to The Atlantic.