
How we built it: Automated content without the heavy lifting

During INMA's Product and Data for Media Summit I gave a short presentation on how United Robots build automated content without any heavy lifting on the part of the publisher. I thought I'd take this opportunity to follow up with a closer look at the process of getting automated content off the ground. "This is of course a brand new process for most of our clients, but it's what we do every day," says United Robots Head of Delivery Gunnar Södergren. "Our job is to make it as easy as possible."

At the recent NYC Medialab AI & Local News event, moderator Matt MacVey asked the panel what tech expertise a publisher needs in house to start using content automation. Cynthia DuBose is Managing Editor, Audience Engagement at McClatchy, who work with United Robots on automated real estate content and with Lede AI on automated high school sports content. She said that knowing where to find the right data is key to building out the content you want, and emphasized that automation is absolutely doable even in a small newsroom. "Don't feel discouraged. Find the right vendor or partner and start experimenting."

While content automation is becoming increasingly commonplace in the news industry, almost everyone we work with is doing it for the first time. Letting robots produce some of the content published by the newsroom is a big step for many, both for editorial reasons and, significantly, out of concern for the technical implications it might carry. Indeed, the very first comment we had from one of our US publisher partners was: "Before we decide to start, we need to understand how much heavy lifting is involved."

It's the job of our Head of Delivery, Gunnar Södergren, to take publishers through the process of getting the automated content live on sites and in apps. "Our language team and development team are involved as well, but I'm the go-to guy for the publisher." There are three processes that happen between signing and launch, sometimes in sequence but more often with some overlap.

1. The robot is set up according to the publisher's product and specifications, such as which zip codes of real estate data to include, which sports, which roads for traffic updates etc.
2. The language is iterated in a joint process between the newsroom and our language team. Having dedicated journalists or editors in the newsroom is key for this to be an efficient process.
3. The content feed is integrated with the publisher's CMS.

Below we go through the key tech aspects, success factors and possible blockers that Gunnar has identified through his experience with client projects.

Know your CMS. "The CMS integration is potentially the heaviest lift the publisher has to do. It used to be the final step we did before launch, but we've come to realise it's better to get integration started early on. If a publisher is used to receiving external feeds into the CMS, it's very easy, but if this is a new process, it can take some time. From my point of view, it's key to have a contact person who knows the CMS," says Gunnar.

There are a number of ways the content can be delivered. By far the most common is for the publisher to receive JSON files at a specific endpoint (URL) in the CMS, which triggers an automatic workflow that converts each file into an article and publishes it. Some publishers require other formats, e.g. XML. If a CMS doesn't support this type of receiving endpoint, files can be uploaded to an FTP server or sent by email. Another option is to set up an RSS feed. "The goal from our point of view is that it should be possible to send the content straight to readers on sites and apps, though some publishers do the publish step manually."
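To make the JSON route a little more concrete, here is a minimal Python sketch of what the receiving side might do with one delivered file. It is an illustration only, not United Robots' actual feed format or any particular CMS's API: the field names (`headline`, `body`, `tags`), the `Article` record and the `json_to_article` function are all assumptions, and the sample text is made up.

```python
import json
from dataclasses import dataclass, field


@dataclass
class Article:
    """Hypothetical CMS article record (not a real CMS schema)."""
    headline: str
    body: str
    tags: list = field(default_factory=list)
    status: str = "draft"


def json_to_article(payload: str, auto_publish: bool = True) -> Article:
    """Convert one delivered JSON document into an article record.

    The field names are assumptions for illustration; a real feed
    would follow whatever was agreed during integration.
    """
    data = json.loads(payload)
    missing = [key for key in ("headline", "body") if not data.get(key)]
    if missing:
        # Reject incomplete files rather than publishing a broken article.
        raise ValueError(f"payload is missing required fields: {missing}")
    return Article(
        headline=data["headline"].strip(),
        body=data["body"].strip(),
        tags=list(data.get("tags", [])),
        # Some publishers publish straight away, others review manually.
        status="published" if auto_publish else "draft",
    )


# Made-up sample payload, standing in for a file sent to the endpoint.
sample = json.dumps({
    "headline": " Home sold on Elm Street ",
    "body": "A house changed hands this week.",
    "tags": ["real-estate"],
})
article = json_to_article(sample)
print(article.status, "|", article.headline)
```

The `auto_publish` flag mirrors the two workflows described above: content going straight to readers, or landing as a draft for a manual publish step.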

Decide on schedule and distribution. To get the most out of the automated content, it's key that it reaches the right reader at the right time. For topics like sports and traffic, the sooner the content is published the better, as a rule. But for things like real estate, United Robots tend to receive the data in batches, once a week for example. In these instances, Gunnar's team can deploy a delivery schedule, so that instead of 100 texts once a week, the publisher receives around 14 a day. The schedules can also specify hours of the day (in a given time zone), or weight the number of texts towards times or days when there's less other content. In terms of geography, content can include any metadata tags, which is how publishers manage where it ends up, whether on local subsites, in newsletters, or pushed to individual readers in specific locations.
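The arithmetic behind such a schedule, and the idea of routing by metadata tags, can be sketched in a few lines of Python. This is a simplified illustration under our own assumptions, not the scheduler Gunnar's team uses: the function names `spread_batch` and `route_by_tags`, the destination names and the tag values are all hypothetical.

```python
def spread_batch(total: int, days: int = 7) -> list:
    """Split a batch of texts as evenly as possible across a number of days.

    Any remainder is handed out one text at a time to the earliest days.
    """
    base, extra = divmod(total, days)
    return [base + 1 if day < extra else base for day in range(days)]


def route_by_tags(text_tags, destinations):
    """Return the destinations whose wanted tags overlap the text's tags.

    `destinations` maps a destination name to the set of tags it accepts.
    """
    tags = set(text_tags)
    return [name for name, wanted in destinations.items() if wanted & tags]


# A weekly batch of 100 real estate texts spread over seven days.
print(spread_batch(100))  # [15, 15, 14, 14, 14, 14, 14]

# Hypothetical destinations and tags, for illustration only.
destinations = {
    "north-subsite": {"zip-north"},
    "south-subsite": {"zip-south"},
    "property-newsletter": {"real-estate"},
}
print(route_by_tags(["real-estate", "zip-north"], destinations))
# ['north-subsite', 'property-newsletter']
```

A real schedule would also account for publishing hours and for weighting towards quieter days, which this sketch leaves out.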

Give someone ownership. While content automation might not be considered core business for media companies, having a clear project or product owner generally means a publisher gets progress and value out of the effort much more efficiently. Current product owners of this kind include Jan Stian Vold at Bergens Tidende in Norway and Ard Boer at NDC in the Netherlands.

Content automation does not have to involve a lot of heavy lifting for publishers. As Cynthia DuBose says, find the right vendor or partner and start experimenting.

