
ChatGPT a year on: time to talk about journalistic AI

Malmö, November 30, 2023 by Cecilia Campbell

On November 30, 2022, OpenAI let ChatGPT loose on the world. In the year that has passed since then, the news industry has been consumed by discussions around the implications and transformative power of generative AI. Twelve months and a CEO crisis at OpenAI later, I believe it’s time to focus on the fact that not all AI is created equal. And that some, unlike generative AI, is actually built for journalism.

No doubt there will be dozens of columns written in the media industry on the occasion of the ChatGPT launch anniversary, so I will try not to make the most obvious points. I’ve got a somewhat unusual point of view, in that I work for a company, United Robots, which had been producing and delivering automated, AI-generated content to newsrooms for years by the time ChatGPT came along – though not using generative AI. Instead we develop what media analyst Thomas Baekdal calls journalistic AI, the key feature of which is verifiable facts rather than elaborate text generation. More about this in a bit.

For United Robots, the year since ChatGPT burst onto the scene (and into newsrooms) has been a bit of a mixed bag.

Suddenly, everyone in the news industry was talking about AI. Having spent years trying to get our message through about the benefits of AI and automation, we saw the topic go from niche to mainstream virtually overnight. Case in point: When AP launched its Local News AI initiative in August 2021, they struggled to get newsrooms to even answer a basic survey. “Two years on, we heard back from at least six newsrooms, who demurred to take our first survey, asking how they can get involved with our work,” said AP AI Product Manager Ernest Kung at a project presentation recently.

Bemusement at the early frenzy. The discussion was pretty black and white early in the year: “Will AI make or break the news industry? Is it saviour or enemy?” We had heard much of it before. In a column here in January, I felt obliged to point out that it’s neither: it’s a tool. And that publishers are in the driver’s seat.

Not all AI is created equal

For United Robots, the most challenging aspect of 2023 has been working out where we stand in this new world of ubiquitous generative AI. Our products are not built on generative AI, but on the more established rules-based AI (a.k.a. expert systems) – or journalistic AI, if you like. The latter has, relatively uncontroversially, been used safely and reliably for various newsroom tasks for years, including by early movers in automated content such as AP. But today, while the two types differ in significant ways, there is generally no distinction made between them when news publishers, or even industry experts, discuss AI. In fact, the discussion now tends to centre on generative AI by default, without this being specified.

United Robots builds text robots using rules-based AI, and we sell the automated content they produce. The raw material is structured, verified data – meaning only facts available in the data set end up in the text. The downside of building text robots, compared to using ChatGPT, is that it requires expertise and is a complex process involving programmers, writers (the robots don’t actually create the text segments, people do), data experts and linguists. The trade-off for an output of safe-to-use content is that it takes a lot of time and work. And of course the language is limited to what our journalists have written: stringent, with few embellishments. Its purpose is to communicate the facts from the data and the analysis done by the robot, nothing else.
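
To make the mechanics concrete, here is a minimal sketch – not United Robots’ actual platform, just a hypothetical illustration of the principle – of how rules-based text generation works: human-written text segments are selected by explicit rules and filled only with values from a structured, verified data record.

```python
# Minimal sketch of rules-based ("journalistic") text generation.
# Hypothetical illustration only: human-written templates + explicit rules
# + structured, verified data. Team names and fields are made up.

match = {
    "home_team": "Malmö FF",
    "away_team": "AIK",
    "home_goals": 3,
    "away_goals": 1,
    "venue": "Eleda Stadion",
}

def describe_result(home_goals: int, away_goals: int) -> str:
    """Rule picks a human-authored phrase based on the score in the data."""
    if home_goals > away_goals:
        return "won at home against"
    if home_goals < away_goals:
        return "lost at home to"
    return "drew with"

def write_match_report(data: dict) -> str:
    """Every part of the output is either template text written by a person
    or a value taken directly from the verified data record."""
    phrase = describe_result(data["home_goals"], data["away_goals"])
    return (
        f"{data['home_team']} {phrase} {data['away_team']} "
        f"{data['home_goals']}-{data['away_goals']} at {data['venue']}."
    )

print(write_match_report(match))
# Malmö FF won at home against AIK 3-1 at Eleda Stadion.
```

Because the output can only combine human-authored phrases with values present in the record, any factual error in the text is traceable to an error in the data.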

In journalistic AI, hallucinations cannot happen because, as Thomas Baekdal puts it, the type of AI we build is journalistically limited by design. (More about our tech platform here.) In contrast, generative AI based on current Large Language Models, like those behind ChatGPT, simply looks for language patterns to create its texts and is inherently unable to distinguish between fact and fiction.

I recently chatted to one of our earliest clients in Norway, who admitted that when ChatGPT came along, he’d worried that automated content as produced by United Robots would soon be “yesterday’s news” – but that a year on, he’s changed his view. “I’ve spent a year testing and trying to prompt my way to reliable texts in ChatGPT, but at this point it just can’t be done. Whereas I know we can trust texts produced with rules based AI, as we’ve done for several years already.”

Newsroom guidelines: generative AI not to be used for text generation. There are some basic benefits common to any type of AI use in newsrooms, such as freeing up journalists’ time, creating efficiencies and producing more and better journalism.

However, when it comes to text generation, publishers need to tread carefully. Indeed, many of the guidelines for the use of generative AI that newsrooms have drawn up in the past year state specifically that text generation must not be done with generative AI.

So how is rules-based AI different? Two main things:

  • Accuracy. As mentioned above, texts are created from sets of structured, verified data. The design of the tech means nothing outside the data set can end up as “fact” in the story.
  • Transparency. Because the texts are based on specific data sets, the sourcing is transparent. Consequently, if there’s a mistake or a question, it’s possible to trace it back to the source. Large Language Models, on the other hand, use deep learning to create text based on existing text – in the case of GPT-3, a model with 175 billion parameters trained on human language drawn from the internet. It’s impossible to know which sources are used, or whether they are even real.

Thanks to the reliability and predictability of rules-based AI, the newsrooms we work with generally publish the texts automatically – something that should never be done with texts produced by generative AI.
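
As a hypothetical illustration of these two points – again a sketch, not a description of United Robots’ or anyone else’s actual system – an automated story can carry an explicit reference to the exact data it was generated from, so that any questioned figure can be traced back to a specific source record:

```python
# Hypothetical sketch: each automated text is stored together with a
# reference to the data it was built from, plus the AI-generated label.

from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class AutomatedStory:
    text: str
    data_source: str          # e.g. the feed or database the facts came from
    record_id: str            # the specific record the text was built from
    generated_at: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc)
    )
    label: str = "This text was generated automatically from verified data."

story = AutomatedStory(
    text="Malmö FF won at home against AIK 3-1 at Eleda Stadion.",
    data_source="league-results-feed",   # hypothetical source name
    record_id="2023-11-26-MFF-AIK",      # hypothetical record identifier
)

# If a reader or editor questions a figure, the record_id points straight
# to the data row that produced it – something an LLM cannot offer.
print(story.label)
print(f"Source: {story.data_source}, record {story.record_id}")
```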

Transparency – the elephant in the room

At United Robots, transparency in our work has always been absolutely critical – to us, to our clients and to their readers. Transparency in terms of the verified data source(s), as described above, and transparency around publication: all our publisher partners clearly label the automated texts as AI-generated.

Journalism is a human endeavour in which the only currency is trust, and by extension, transparency. In the context of journalism, I believe the most fundamental issue with generative AI, the way Large Language Models work and how they are governed, is the inherent lack of transparency throughout the process:

  • Invisible sources and unverifiable facts. Large Language Models are probabilistic and draw on all information accessible to them. This means they can include any information, more or less at random, in a text. Words are just words, whether presented as “fact” or used for embellishment – the model is unable to work out the difference. LLMs have also been known to cite non-existent sources, even articles that were never actually written, as documented by The Guardian earlier this year.
  • Inherent biases. At a media conference recently, one speaker encouraged journalists not to be overly concerned about using generative AI as “it’s only maths and labelling”. While I get what he was trying to say – ChatGPT is just a tool and can be used responsibly – labelling in itself is problematic. The datasets, and the classification of language and images, on which AI models are built carry with them inherent biases, which are reproduced in their output. This is an issue that has its roots in the power structures of our society. It’s a complex and important topic, and I highly recommend Kate Crawford’s book Atlas of AI to anyone who wants to thoroughly understand its implications.
  • Governance. Much has been written about the recent sacking and reinstatement of OpenAI CEO Sam Altman. Even so, it seems we may remain in the dark – at least for the time being – about what really went down and what lies ahead for the company and its tech. We can but hope that the new OpenAI board will include someone from the media industry, bringing copyright expertise to the table.

There is no question that generative AI and Large Language Models can do a lot of heavy lifting in the internal processes within newsrooms – creating summaries, headline and intro options, writing draft text from press releases and more. We should absolutely leverage AI to free up valuable reporter time. But as journalists trading in trust, we need to understand the greater implications of the tech. This year, the AI discussion in the news industry has had a predominantly inside-out perspective. My hope for 2024 is that we will focus more on the impact of AI in a context beyond the newsroom and beyond how the tech itself works.
