Language Technologies: The Future of Text Creation

GPT-3 and Data-to-Text: Are Language Technologies the Future of Text Creation?

Machine content creation, known as Natural Language Generation (NLG), is considered one of the most future-oriented technologies.

This is due not only to the ever-growing importance of online commerce and the associated volume of content that has to be generated, such as product descriptions. NLG is also a helpful tool for copywriters and content managers, i.e. anyone involved in writing any kind of content.

In this context, two NLG technologies are very clearly distinguishable: GPT and Data-to-Text.

The Two Technologies: GPT and Data-to-Text

Generative Pre-trained Transformer (GPT) is a language production system that uses Deep Learning to create content.

Data-to-Text refers to the automated production of natural language content based on data.

But what exactly can these two technologies do? In what respect are they different from each other? Should copywriters and content managers be afraid that their jobs will be replaced by this type of artificial intelligence? 
This article presents both technologies, describes their functionality, and highlights their strengths and weaknesses.

GPT-3 and Data-to-Text: Capabilities and Differences

GPT: Definition & Background

GPT refers to a set of Large Language Models (LLMs) that use Deep Learning to process or generate natural language.

The company behind GPT, which it introduced in 2018, is OpenAI. It started out as a non-profit organization and released the first two versions free of charge and as open source.

At first, the beta phase of the third generation was also available free of charge. When the beta ended, however, access to GPT-3 required a fee. The non-profit organization thus became a commercially operating company. Now, after a $1 billion investment in OpenAI, Microsoft holds an exclusive license to GPT-3. OpenAI nevertheless continues to offer its public API, which allows select users to send input to GPT-3 or other OpenAI models and receive the output.

Access to GPT-3's underlying code will only be granted to Microsoft, allowing the company to embed, repurpose and modify the model as it sees fit.

What Can GPT-3 Do?

GPT-3 is a language technology that is trained on a huge volume of content from the internet and is able, among other things, to:

  • Compose English content 
  • Conduct dialogs
  • Answer questions
  • Write program code
  • Design website templates
  • Fill in tables

How Exactly Does GPT Work?

GPT-3 is a language prediction model. In other words, it is a machine learning model in the form of a neural network that transforms an input into what it predicts to be the most useful result.

GPT-3 was trained with data from such sources as Wikipedia, forums, web content and book databases, all of which are thus the basis of the artificial intelligence behind the model. 

Its output is based on the patterns that the model recognizes in this training data. When a user enters text, the system analyzes the language and uses text prediction to generate the most likely output.

The content provided by GPT-3 is of high quality and sometimes difficult to distinguish from human-written content.
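The prediction idea described above can be illustrated with a deliberately tiny sketch. It is not how GPT-3 is implemented: instead of a neural network, it simply counts which word follows which in a miniature, hypothetical training text (`TOY_CORPUS`) and then always picks the most frequent follower. All function names here are assumptions chosen for illustration.

```python
from collections import Counter, defaultdict
from typing import Dict, Optional


def train_bigram_model(corpus: str) -> Dict[str, Counter]:
    """Count, for every word, which words follow it in the training text."""
    words = corpus.lower().split()
    followers: Dict[str, Counter] = defaultdict(Counter)
    for current, nxt in zip(words, words[1:]):
        followers[current][nxt] += 1
    return followers


def predict_next(model: Dict[str, Counter], word: str) -> Optional[str]:
    """Return the most frequent follower of `word`, or None if it was never seen."""
    candidates = model.get(word.lower())
    if not candidates:
        return None
    return candidates.most_common(1)[0][0]


def generate(model: Dict[str, Counter], start: str, max_words: int = 5) -> str:
    """Greedily extend `start` by one predicted word at a time."""
    output = [start.lower()]
    for _ in range(max_words):
        nxt = predict_next(model, output[-1])
        if nxt is None:
            break
        output.append(nxt)
    return " ".join(output)


# Hypothetical miniature "training data"; a real model sees vastly more text.
TOY_CORPUS = "the shoe is red . the shoe is light . the bag is red ."
toy_model = train_bigram_model(TOY_CORPUS)

print(generate(toy_model, "the", max_words=3))  # the shoe is red
```

Even this toy version shows the principle: the output reflects nothing but the patterns found in the training text. If generation continues, the sketch starts repeating the same sentence, because it always chooses the single most likely continuation.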

The Advantages of GPT-3: Fast & Inexpensive

The possibilities for using GPT-3 are numerous. The advantages of this particular technology lie mainly in one area: the fast, inexpensive and automatic generation of content in large volumes.

This applies wherever a large amount of content needs to be generated automatically from a small amount of input, or wherever it is not efficient or practical to have a human write the output text. One example is a chatbot answering recurring customer queries.

The Drawbacks of GPT-3: No Control & Misinformation

Despite the impressive language capabilities of this technology, GPT-3 has enormous drawbacks when it comes to generating content.

If one asks GPT, for example, to write an article about how absurd recycling is, it will do exactly that. In this case, GPT writes a contextually nonsensical text on the topic. This is because an anchoring to general knowledge or to text-to-text solutions is completely absent and cannot be added.

As the model is trained on data from the internet, it also adopts racist or sexist remarks that this data may contain, lets bias or swear words flow into the content, and generates false information.

Moreover, the same content may be repeated over and over again, especially in longer texts, rather than additional information being added. Similarly, representational bias can occur. That is because the websites used as a basis for the training only represent a part of the world. This leads to overrepresentation of some aspects and underrepresentation of others.

Ultimately, this means the generated content should not be published without proofreading and intensive fact-checking.

Apart from the Generative Pre-trained Transformer (GPT), other technologies for automated generation of content are available. Data-to-text technology is one of them.  

What exactly can Data-to-Text do? How does it differ from the technology behind GPT?

Data-to-Text Technologies: Definition

Data-to-text technologies are a subfield in the artificial intelligence sector. These programs analyze structured data and generate ready-to-use natural language content from this data. 

There are multiple software providers whose products differ in individual features as well as in price. One company that distinguishes itself through the large number of languages it supports for automated content generation is AX Semantics from Stuttgart, Germany.

What Can Data-to-Text Do?

Data-to-text software is used by all types and sizes of companies. 

These include banks and other companies in the financial sector, the pharmaceutical sector, and the media and publishing sector, as well as companies across the broad field of e-commerce.

Data-to-text technologies are of tremendous help whenever large amounts of content are to be created from structured data sets. The technology comes into play when many similar texts with variable details, or texts based on data or statistics, are to be created.

The following examples are worth mentioning:

  • Reporting in the pharmaceutical/health, finance and accounting fields
  • Creation of landing pages following SEO criteria
  • Generation of product descriptions and category descriptions in e-commerce
  • Generation of personalized content for customer targeting
  • Reporting for sports or weather news
  • Stock market updates and election results
  • Offer descriptions in the tourism sector and property descriptions for the real estate industry

How Does Data-to-Text Work?

Data-to-text programs require structured data as a basis in order to work.

To generate finished content from this data, the user configures rules and logic in the software. In this way, the relevant information is extracted from the data set and translated into natural language content. Variations may also be inserted: the sentence structure stays the same, while small variable parts of the sentence change. This produces content that is unique yet follows a recurring pattern.
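A minimal sketch can make the combination of rules and variations more concrete. It does not reproduce any particular vendor's software; the product record, the field names, the phrasings and the 250 g threshold are all hypothetical and chosen purely for illustration.

```python
import random

# Hypothetical product record, as it might come from a PIM export.
product = {"name": "Trail Runner", "category": "running shoe",
           "color": "red", "weight_g": 240}

# Variations: the same sentence structure with small interchangeable parts.
OPENERS = [
    "The {name} is a {color} {category}.",
    "Meet the {name}, a {color} {category}.",
]


def weight_statement(record: dict) -> str:
    """Rule: derive a statement from the data instead of only printing the number."""
    grams = record["weight_g"]
    if grams < 250:  # hypothetical threshold for calling a product "light"
        return f"At only {grams} g, it is particularly light."
    return f"It weighs {grams} g."


def describe(record: dict, rng: random.Random) -> str:
    """Pick one phrasing, fill in the data, and append the rule-based sentence."""
    opener = rng.choice(OPENERS).format(**record)
    return f"{opener} {weight_statement(record)}"


print(describe(product, random.Random(0)))
```

Every sentence the sketch can produce is written or approved by the person who configured it. The data only decides which rule applies and which values are inserted.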

Data-to-text software products differ from GPT-3 in that they are not trained on online content. Instead, they use data stored, for example, in CRM or PIM systems or in Excel, CSV and JSON files. This data is imported directly into the software or transferred via an API.
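As a rough illustration of such an import, the following sketch reads records from CSV and JSON using only the Python standard library and turns each record into a sentence. The two export strings, the field names and the sentence pattern are assumptions; in practice the data would come from files or an API response.

```python
import csv
import io
import json

# Hypothetical exports; stand-ins for real files or API responses.
CSV_EXPORT = "name,color,price\nTrail Runner,red,89.90\nCity Bag,black,49.00\n"
JSON_EXPORT = '[{"name": "Rain Jacket", "color": "blue", "price": "120.00"}]'


def load_csv_records(text: str) -> list:
    """Parse CSV text into one dictionary per row, keyed by the header line."""
    return list(csv.DictReader(io.StringIO(text)))


def load_json_records(text: str) -> list:
    """Parse a JSON array of objects into a list of dictionaries."""
    return json.loads(text)


# Both sources end up in the same structure, so one template serves them all.
records = load_csv_records(CSV_EXPORT) + load_json_records(JSON_EXPORT)
sentences = [
    f"The {r['name']} comes in {r['color']} and is priced at {r['price']}."
    for r in records
]

for sentence in sentences:
    print(sentence)
```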

GPT-3, however, generates content entirely on its own, with no further intervention from a thinking human. The neural network alone generates an output from the given input. This can cause the problems already mentioned.

Regarding the variety of languages, Data-to-Text also offers more options than GPT-3. Whereas GPT-3 is trained mainly on English content, Data-to-Text providers support multiple languages. The AX Semantics software, in particular, supports more than 110 languages.

The Advantages of Data-to-Text: Speed & Controllability

Properly used, data-to-text can save an incredible amount of time and money. At the same time, it allows the content creation process to scale.

Large amounts of content, e.g. thousands of product descriptions for an online store, are almost impossible for humans to produce, especially if the content has to be reviewed regularly and kept up to date, for example due to seasonal influences.

With Data-to-Text software, however, this is possible. Once the project is set up, all that is needed is an update of the data. Then, with just one click, existing content can be updated immediately or unique new content can be created. As a result, copywriters and editors can spend more time on creative or conceptual work.

In contrast to GPT-3, the advantage of automated content generation using data-to-text is controllability. Because the user configures the rules behind every statement, false or otherwise unwanted statements by the system can be ruled out, and Data-to-Text technologies offer various ways to exert this control. With GPT-3, however, it is not really possible to influence what is delivered as output.

The Drawbacks of Data-to-Text: Data Dependency & Time Consumption

Data-to-text based automated content creation also has its limits, though. If the data quality is poor or the availability of high-quality data is not ensured, the output can be of low quality.
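One way to guard against this is to check the records before any text is generated. The sketch below is only an assumption about what such a check might look like: the required fields and the rule "a field must not be empty" are hypothetical, and real projects will need checks that match their own data.

```python
REQUIRED_FIELDS = ("name", "color", "price")  # hypothetical schema


def find_problems(record: dict) -> list:
    """Return human-readable issues that would degrade the generated text."""
    problems = []
    for field in REQUIRED_FIELDS:
        value = record.get(field)
        if value is None or str(value).strip() == "":
            problems.append(f"missing {field}")
    return problems


def split_by_quality(records: list) -> tuple:
    """Separate records that are ready for generation from those needing cleanup."""
    usable, rejected = [], []
    for record in records:
        issues = find_problems(record)
        if issues:
            rejected.append((record, issues))
        else:
            usable.append(record)
    return usable, rejected


usable, rejected = split_by_quality([
    {"name": "Trail Runner", "color": "red", "price": "89.90"},
    {"name": "City Bag", "color": " ", "price": None},
])
print(len(usable), "usable,", len(rejected), "rejected")
```

Records that fail the check are set aside together with the reasons, so they can be cleaned up instead of silently producing incomplete sentences.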

The technology is based on structured data in machine-readable form. Storytelling, as well as writing blog posts or social media posts, therefore remains reserved for humans, since such content cannot be meaningfully generated by data-to-text software.

Thus, the generated content can only express what appears in the data or what can be derived from it.

Additionally, a certain amount of time often has to be invested in obtaining and cleaning the data that the AI needs as a basis. Together with the configuration of the software's rules, this can involve a greater or lesser amount of time and effort.

Conclusion

Both GPT-3 and data-to-text have their place as AI-powered content generation technologies. Each helps in a very specific way and under different conditions in the creation of different types of content, e.g. the writing of entire continuous texts or the creation of product descriptions.

One thing is clear: AI can in no way replace humans in their role as thinking beings. Instead, it acts as a support and thus makes things easier for the user. With its help, copywriters, editors and content managers are relieved of part of their workload and can concentrate on other tasks.

Therefore, and because the demand for written content is steadily increasing, both language technologies will become more and more important in the future.
