Google has unveiled a new feature called Google-Extended for website publishers
Web publishers can now decide if their site's data contributes to training Google's AI models
Google has unveiled a new feature called Google-Extended that lets website publishers choose whether their data is used to train the company’s AI models. Websites can remain searchable on Google while opting out of having their content used to improve AI products such as Bard.
With Google-Extended, publishers can still have their sites crawled and indexed by Google’s web crawlers, such as Googlebot, while keeping their data out of AI model training. The control lets publishers manage access to their content and decide whether their sites help improve Bard and the Vertex AI generative APIs. Google previously confirmed that it trained its AI chatbot, Bard, on publicly available data scraped from the web.
To implement Google-Extended, web publishers use the robots.txt file, which tells web crawlers which parts of a site they may access. Google says that as AI applications continue to expand, it will explore additional machine-readable options to give web publishers more choice and control, and it promises more updates in the future.
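As a rough illustration of the robots.txt approach, the sketch below uses Python’s standard urllib.robotparser module to check a sample rule set. The robots.txt contents, the example.com URL, and the helper name can_fetch are assumptions made for this example, not text published by Google; the only detail taken from the announcement is the Google-Extended user-agent token.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: opt the whole site out of Google-Extended
# while leaving every other crawler (including Googlebot) unrestricted.
ROBOTS_TXT = """\
User-agent: Google-Extended
Disallow: /

User-agent: *
Allow: /
"""


def can_fetch(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if the given rules allow user_agent to access url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    page = "https://example.com/article"  # placeholder URL
    for agent in ("Google-Extended", "Googlebot"):
        print(agent, "allowed:", can_fetch(agent, page))
```

Under these assumed rules, the Google-Extended token is refused everywhere on the site while Googlebot falls through to the catch-all group and stays allowed, which matches the behavior the article describes: the site stays indexable for search but is opted out of AI training.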
Several prominent websites, including The New York Times, CNN, Reuters, and Medium, have already blocked the web crawlers that companies like OpenAI use to scrape data for AI training. Blocking Google entirely is harder, however, because websites still need to be indexed by its search engine. Consequently, some websites, like The New York Times, have turned to legal measures instead, amending their terms of service to prohibit companies from using their content for AI training.
Google’s move to offer an opt-out feature responds to publishers’ concerns about how their data is used. By letting publishers exclude their data from AI training, Google aims to address these concerns and respect the rights of website owners while continuing to advance AI applications.