Automated Texts - A Breach of Google's Guidelines?
As the largest search engine and the most significant source of traffic, Google and its guidelines are a constant topic of discussion for marketers and SEOs. This past week, a statement by Google Search Advocate John Müller on "AI-generated content" and its effects on websites rekindled the debate about auto-generated texts and whether Google's webspam team considers them a violation of its guidelines.
In summary: the value a text adds for the user determines whether Google penalizes websites with automatically generated content. The current wave of AI text-generation tools does not change this rule. For this reason, from an SEO point of view, there is no problem with publishing content generated with AX Semantics on websites (even in very large quantities), as long as it meets the known quality standards.
Nonetheless, we would like to take a closer look at the discussion and the reasons behind it.
The key question: has Google changed any of its policies regarding automated content?
There was some excitement when John Müller was asked about Google's view on AI-generated content during a Hangout at the beginning of April. He stated: "We consider AI-generated content to be a breach of webmaster guidelines", which many understood as a general rejection of automated content. This is why we took another look at the Google Webmaster Guidelines, and they clearly show that Google's attitude towards automated content remains unchanged.
No change in the Google Webmaster Guidelines: penalties still target manipulative intent
The Webmaster Guidelines, which Müller cited as the decisive standard, still state that automated content is a problem when it is created with manipulative intent rather than to add value for users: "If search rankings are meant to be manipulated, and content is not meant to help users, Google can take actions regarding that content."
The actions referred to here are manual actions, which require human review; they are not automated ranking penalties.
Poor content and spam are the problem, not automation itself
At this point, therefore, Google does not intend to use technical methods to recognize automated content. Instead, it applies the same content-based and formal spam criteria to this type of content as it does to human-written texts.
It would be technically possible to identify some content generated by large language models, but not content generated by data-to-text solutions such as AX Semantics. In that case, only the traditional spam indicators, such as poor linguistic quality or content-free filler text, could be detected. Google does not penalize content generated with an emphasis on user value, such as product descriptions or automated news and weather reports. Google also tolerates publishing large volumes of content at once, as long as it is not spam.
The reason for the current discussion: AI hype around GPT-3 tools that lack a rule-based approach
The current discussion is presumably driven by the widespread use of new GPT-3 tools. Once again, it is important to point out the difference between GPT-3 and the data-to-text approach of AX Semantics.
| GPT and other large language models | Data-to-text |
|---|---|
| GPT (Generative Pre-trained Transformer) models are large language models trained with deep learning. | Data-to-text describes the automated creation of natural-language content based on data. |
| Essentially, they predict the next word and produce good-sounding, grammatically correct content. | Logic and triggers are used to derive statements from data and then generate content that is correct both factually and grammatically. |
| The syntax is fine, which means the sentences are well-formed. But GPT does not produce meaningful text, so it cannot get the semantics right. The result can be texts that sound good but lack meaning and contain factual errors; many of these texts are simply nonsensical. | Masters both syntax and semantics and produces results that are correct in content and in language. The meaning and intention of the content are conveyed through the configuration; the factual correctness comes from the data. |
| Output texts are generated sequentially (one text at a time) and must be selected, individually checked, and revised. | The rules are reviewed, so the output texts no longer need to be checked individually. |
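To make the data-to-text column more concrete, here is a minimal sketch in Python of the rule-and-trigger idea: each rule pairs a condition on a data record with a sentence template, and only the rules whose conditions hold for that record contribute text. The `Rule` class, the `generate` function, and the product fields are hypothetical illustrations chosen for this example; this is not AX Semantics' configuration format or API.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Rule:
    """Hypothetical trigger: if `condition` holds for a data record, emit `template`."""
    condition: Callable[[dict], bool]
    template: str


# Hypothetical rule set for a product description (illustration only).
RULES = [
    Rule(lambda p: True, "The {name} weighs {weight_kg} kg."),
    Rule(lambda p: p["weight_kg"] < 1.0, "At under 1 kg, it is easy to carry."),
    Rule(lambda p: p["waterproof"], "It is waterproof, so it can be used in the rain."),
    Rule(lambda p: p["stock"] == 0, "It is currently out of stock."),
]


def generate(product: dict, rules: list[Rule]) -> str:
    """Evaluate every rule against the data and join the sentences whose triggers fire."""
    sentences = [r.template.format(**product) for r in rules if r.condition(product)]
    return " ".join(sentences)


if __name__ == "__main__":
    backpack = {"name": "Trail 20 backpack", "weight_kg": 0.8, "waterproof": True, "stock": 12}
    # Prints: The Trail 20 backpack weighs 0.8 kg. At under 1 kg, it is easy to carry.
    #         It is waterproof, so it can be used in the rain.
    print(generate(backpack, RULES))
```

Because every sentence in the output can be traced back to one rule and the data fields it reads, reviewing the rule set covers every text those rules can produce. That is the property the last row of the table describes, shown here in a deliberately simplified form.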
SEO expert Miranda Miller of Search Engine Journal rightly points out that many established AI content projects publish great content (and still rank well on Google with it): "The Associated Press started using AI to generate news in 2014. Using AI in content creation is nothing new, and the most important factor here is using it smartly."
Which automated content does Google consider suspicious?
- Meaningless text stuffed with keywords
- Automatically translated text published without review or an underlying set of rules
- Very simple automated text based on Markov chains, or text produced with synonymization or obfuscation techniques
- Content compiled from different web pages without sufficient added value
Source: Google Webmaster Guidelines, automated texts
Why data-to-text content remains more reliable for SEO than AI content:
- Content is based on data and data interpretations rather than meaningless generalizations. The meaning of the content, and therefore its value to users and to Google, is determined by AX users (via logic, stories, and triggers). The texts therefore carry semantics, not just correct syntax.
- Users review every decision made by AI components in the system and are involved in all decisive phases of the generation process.
- Some software can determine whether a text was produced by a large language model, but there is no technical way to identify data-to-text content.