
Google and automated content

Why our content is safe 

It's all about the quality

We do get questions from publishers who are concerned that the automated content we send them is at risk of being de-indexed, or of having other actions taken against it by Google, because it's created automatically.

Updates:

October 2023 – US publisher designated News Topic Authority by Google thanks to automated real estate content
We learn from a US client that, as a result of publishing a high volume of real estate articles (automated texts about single sales as well as top lists), they have been designated a "news topic authority" by Google in the area of real estate information. In other words, their search ranking has improved thanks to robot-written articles. And they are getting more visibility and reach on their human-written articles because of the automated ones.

February 2023
Google publishes a new blog post on its guidance around AI-generated content. They now talk in more actively positive terms about it: "At Google, we've long believed in the power of AI to transform the ability to deliver helpful information."
(Published the same week they launched their chatbot Bard.)

November 2022
According to Google's Search Liaison Danny Sullivan, interviewed in Business Insider, the company's recent updates to the search algorithm are focused more on weeding out pages that are created specifically to game the algorithm than on penalising pages that were created by AI tech. "Our systems focus on the usefulness of content, rather than how it's produced. This allows us to consistently provide high quality results, while reducing all forms of unhelpful content in Search, whether it's created by humans or through automated processes." He also said (on X / Twitter): "We haven't said AI content is bad. We've said, pretty clearly, content written primarily for search engines rather than humans is the issue."

Our standpoint
It is reasonable to believe that it will (continue to) be the quality of any content, whether manually or automatically produced, that determines its value in the eyes of the search engines. We believe the combination of trustworthy data and language created by humans is a quality guarantor for United Robots’ automated texts in this context. The fact that the content is published by recognised local news sites demonstrates its clear intent to be useful to humans.

Below we explain how we've reached this conclusion.

Google’s standpoint
In their Guidelines, Google states: “Automatically generated content is content that's been generated programmatically. In cases where it's intended to manipulate search rankings and not help users, Google may take actions on such content.”

While it is not possible to know with certainty how Google defines, or to what extent the company is able to detect, automatically generated content, we are confident that the content United Robots produces lies outside the scope of Google’s statement.

Picking the statement apart, there are four distinct aspects to consider with regards to the automated content: Definition, Intent, Detection and Quality – see below.

The Helpful Content algorithm

Google also operates what it calls the Helpful Content algorithm. The algorithm does NOT specifically target automated/AI content, but rather content in general that is of poor quality and does not inform users.

The purpose of the algorithm (and its autumn 2022 update) is to promote content that puts people's needs first, i.e. content that:

  • Is created for a specific audience or reader segment.
  • Answers the specific questions of that audience or reader segment.
  • Demonstrates the expertise, authoritativeness, and trustworthiness required to be useful to searchers.
  • Meets the readers’ expectations.

On the other hand, the Helpful Content algorithm penalises "content that seems to have been primarily created for ranking well in search engines rather than to help or inform people."  Such content may include auto-generated texts created by ChatGPT. United Robots does not use GPT or large language models in our current products.

Definition of "produced programmatically"

The topic of Google’s standpoint with regards to automated content is discussed in several recent SEO industry publication articles (e.g. this and this), which often refer to Google’s Search Advocate John Mueller. Both Mueller, and the articles in which he’s quoted, focus in this context on automated content generated by probabilistic machine learning algorithms in language models such as GPT-2 or GPT-3.

This is not how United Robots’ tech works – we do not use machine learning models to generate text. Our texts are based on one or more sets of structured data, and are then generated using rules-based AI, set up as a language tree where each branch is ruled by conditions.

  • All the language segments in our robots were originally written by a human. We do not even use automated translations – if a robot is being built in a new language, that work is done by human translators.
  • The texts are based on quality structured data from highly trustworthy providers; authorities, organisations and companies.
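To make the language-tree idea concrete, here is a minimal sketch of how rules-based text generation from structured data can work. This is a hypothetical illustration under assumed field names (`price`, `asking`, `sqft`, `days_on_market`), not United Robots' actual implementation: every phrase is human-written, and conditions on the data decide which branch of the tree is used.

```python
# Hypothetical sketch of rules-based text generation.
# Every phrase below is human-written; conditions on the structured
# data select which branches appear in the final text.

def describe_sale(sale: dict) -> str:
    """Generate a sentence about a property sale from structured data."""
    parts = [f"A {sale['type']} on {sale['street']} sold for ${sale['price']:,}"]

    # Branch: compare sale price to asking price
    if sale["price"] > sale["asking"]:
        parts.append(f", ${sale['price'] - sale['asking']:,} over asking")
    elif sale["price"] < sale["asking"]:
        parts.append(f", ${sale['asking'] - sale['price']:,} under asking")

    # Branch: only mention size and time on market if the data exists
    if sale.get("sqft"):
        days = sale["days_on_market"]
        speed = "just one week" if days <= 7 else f"{days} days"
        parts.append(f". The {sale['sqft']} sq ft home spent {speed} on the market")

    return "".join(parts) + "."

sale = {"type": "house", "street": "Oak Avenue", "price": 525_000,
        "asking": 500_000, "sqft": 1850, "days_on_market": 12}
print(describe_sale(sale))
```

Because the output is assembled only from human-written fragments and verified data fields, there is no probabilistic model involved and nothing for the text to "hallucinate".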

Intent – for readers, not ranking

In their Guidelines, Google provides a list of examples of instances where automated content is “intended to manipulate search rankings and not help users.”

What United Robots does is generate automated content of high enough quality for journalistic outlets to actually pay for and publish it. The purpose of our automated content is to provide value to readers – that’s the fundamental business of our partner publishers, whether they focus on an advertising or reader revenue based business model.

Detection – can Google spot the robot?

As stated earlier, it’s not possible to know to what extent Google is able to detect whether a text was programmatically generated. Indeed, Search Advocate John Mueller, when asked, says he “cannot claim” they can.

But in order to gain some kind of proof of quality for our automated content, we ran some of our real estate texts (published by the Sacramento Bee / McClatchy) through the browser extension GPTrue or False, which displays the likelihood that selected portions of text were generated by GPT-2. In other words: the likelihood that they were generated through the type of programmatic method which Google keeps referring to when discussing this topic.

The result: “According to the detector, there is a 97.77% chance that the selected text [was] written by a human.” 

(This was the result despite the fact that each text carried the byline “Written by the SacBee Bot”.)

Quality – it's what matters

It is reasonable to believe that it will (continue to) be the quality of any content, whether manually or programmatically produced, that determines its value in the eyes of the search engines. We believe the combination of trustworthy data and language created by humans is a quality guarantor for United Robots’ automated texts in this context.

Indeed, Google is itself actively supporting initiatives in the news industry which combine data and text templates to create valuable information for consumers. The Google News Initiative has been involved in the funding of projects such as CrossTownLA and the UK Press Association’s RADAR.
