Data analysis + automated news creation directly to audiences at lightning speed
Our robots analyse structured data – yours or ours – find connections and create quality content at high volume and speed. We then use our smart distribution platform to deliver the right content to the right audience at the right time.
Artificial Intelligence (AI) and Natural Language Generation (NLG)
Our robots use a combination of Artificial Intelligence (AI) and Natural Language Generation (NLG) to produce texts from structured data sets. AI is used to choose the angle of the story, and an NLG application then writes the story, possibly with different angles depending on the target audience. In addition, when a new language version is created, we work with professional translators to get the idiomatic expressions right, e.g. for sports.
By Lars Widmark
23 Oct 2019
Why & how what we do is unique
At United Robots, we see ourselves as a service provider within news automation. We design, build and support the whole chain of actions required to get it up and running continuously within an organisation.
Setting up an automated news generation process consists of several steps, such as data integration, data analysis, textual design, template creation, content generation, content analysis, content integration with the client’s receiving CMS, continuous improvements to tools, support, monitoring of the client’s receiving and sending systems, and more.
There are some self-service tools on the market that let anyone automate content based on structured data. Writing templates that take data from a spreadsheet file and transform it into text snippets is relatively straightforward for a person with some basic technical know-how. Building more advanced narratives, however, quickly becomes overwhelming and feels more like writing program code than writing text. You will be forced to build a lot of tools and systems on top of such text generators to get the results you want.
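As a minimal sketch of the simple case described above, the Python snippet below fills a text template from rows of spreadsheet-style data. The column names, the sample rows and the template wording are invented for illustration and do not represent any particular self-service tool.

```python
import csv
import io
from string import Template

# Hypothetical spreadsheet export; the columns are assumptions for this example.
SHEET = """home,away,home_goals,away_goals
Lions,Bears,3,1
Hawks,Owls,0,0
"""

# One fixed sentence pattern with placeholders for the column values.
TEMPLATE = Template("$home played $away and the result was $home_goals-$away_goals.")

def render_rows(sheet_text):
    """Turn each data row into one text snippet."""
    rows = csv.DictReader(io.StringIO(sheet_text))
    return [TEMPLATE.substitute(row) for row in rows]

snippets = render_rows(SHEET)
for snippet in snippets:
    print(snippet)
```

Even in this toy case, the 0-0 draw already calls for different wording than the 3-1 win, which is where conditional logic starts creeping into the templates.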
United Robots’ solution includes a tool similar to these for designing and managing text templates, but there are many more aspects to building news automation solutions than the writer tool. It is only one link in the chain. We have summarised some of them below.
- Data analysis
- Text template design
- Media company features
- Scalable infrastructure
- We sell a service, not a tool
1. Data analysis.
United Robots are experts in analysing data to produce newsworthy content. We work with multiple sources to build our insights. One example is our real estate transaction articles. There we take the raw data from the real estate authority in Sweden and combine it with two different geographical databases to build insights like "Two weeks ago another property on the same street was sold for half the price" or "These are the top 10 most expensive properties sold this year on these three different beach strips". In addition, we use Google Street View combined with image analysis to automatically attach a relevant picture to each real estate article. If we only used the default image Google Street View gives us for a certain address, we might get a picture that is 90% trees with only a snippet of a house. Instead we "move" back and forth along the property in Google Street View to find a clearer shot of the actual property. We are currently also working on taking the elevation of a property into account to get the best picture possible.
The data we analyse from the real estate authority, the geodata and the picture are then all used as input for the text robot. Building these insights is often a more challenging task than formulating some templates in the "robot writer tool". The better the insights you can build, the more newsworthy the articles you can generate.
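To illustrate how an insight such as "another property on the same street was sold for half the price" could be derived, here is a minimal Python sketch. The record fields, the sample sales and the thresholds are assumptions made for this example; it does not describe United Robots' actual data model or pipeline.

```python
from datetime import date

# Hypothetical, already geocoded sales records; all fields and values are invented.
SALES = [
    {"street": "Strandgatan", "price": 2_000_000, "sold": date(2019, 10, 1)},
    {"street": "Strandgatan", "price": 4_000_000, "sold": date(2019, 10, 15)},
    {"street": "Storgatan", "price": 1_500_000, "sold": date(2019, 10, 14)},
]

def same_street_insight(sale, history, max_days=30, ratio=0.5):
    """Return a sentence if an earlier, recent sale on the same street
    went for at most `ratio` of this sale's price, otherwise None."""
    for other in history:
        if other is sale or other["street"] != sale["street"]:
            continue
        days = (sale["sold"] - other["sold"]).days
        if 0 < days <= max_days and other["price"] <= ratio * sale["price"]:
            weeks = max(1, round(days / 7))
            share = other["price"] / sale["price"]
            return (f"{weeks} week(s) ago another property on {sale['street']} "
                    f"was sold for {share:.0%} of the price.")
    return None

insight = same_street_insight(SALES[1], SALES)
print(insight)
```

The sentence itself is trivial to write; the work lies in having clean, joined data (street, date, price) so that the comparison can be made at all.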
2. Text template design.
We believe we have the most feature-rich template design tool on the market. When writing a text we can, for example, take into account whether something has already been mentioned in a previous text segment. If we have written that John Doe scored 10 goals in the game, we make sure that we don't repeat this in the following segments.
Another example is when we write about a team's form. In our text corpus we have defined that we want a text segment about home team A's form, which keeps adding interesting things not already mentioned until there is nothing left to mention. When we then write about away team B's form, we make sure that we don't mention things already covered in team A's form segment. For example, if team A have a 3-game winning streak against team B, we should not also write that team B have a 3-game losing streak against team A.
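One way to picture this kind of cross-segment memory is a shared set of facts that have already been mentioned, with each fact stored in a canonical form so that two phrasings of the same fact collide. The Python sketch below is only an illustration under that assumption; the function names and data layout are hypothetical.

```python
def streak_key(team, opponent, kind, length):
    """Canonical key: a win streak for one team is the same fact
    as a loss streak of equal length for the other team."""
    winner, loser = (team, opponent) if kind == "win" else (opponent, team)
    return ("streak", winner, loser, length)

def write_form_segment(team, facts, mentioned):
    """Write a sentence for every fact not yet mentioned and record it."""
    sentences = []
    for fact in facts:
        key = streak_key(team, fact["opponent"], fact["kind"], fact["length"])
        if key in mentioned:
            continue  # already covered in an earlier segment
        mentioned.add(key)
        noun = "winning" if fact["kind"] == "win" else "losing"
        sentences.append(
            f"{team} are on a {fact['length']}-game {noun} streak "
            f"against {fact['opponent']}."
        )
    return sentences

mentioned = set()  # shared across all segments of one article
home = write_form_segment(
    "Team A", [{"opponent": "Team B", "kind": "win", "length": 3}], mentioned)
away = write_form_segment(
    "Team B", [{"opponent": "Team A", "kind": "loss", "length": 3}], mentioned)
print(home)
print(away)
```

Here the home segment produces one sentence and the away segment stays empty, because the losing streak maps to the same key as the winning streak already written.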
The writer tool is not where we are trying to create an edge compared to, for example, Wordsmith. We currently do not market a fully DIY tool, although we could. We work closely with media houses to design and integrate the automated news into their content management systems and processes. When creating a new kind of text, for example an export trade report, our engineers typically work with journalists from the media house to set everything up. The journalists create a couple of example texts, and the engineers then build the data analysis and text templates to support them. All text segments that the engineers have designed in the writer tool can be viewed and changed by the journalists. Setting everything up from scratch can be really challenging and is better suited to a software development mindset, but once it is up and running it is relatively straightforward for the journalists to make basic changes.
3. Media company features.
United Robots has extensive experience in delivering automated news to media companies. Besides analysing data and designing robot text templates, we have a set of features specifically targeted at media companies. For example, when writing about sports, we produce one version from the home team's perspective, one from the away team’s perspective and one with a neutral perspective. Each newspaper configures which teams it focuses on in each league it covers, and gets the appropriate version sent to it. Another example is that for youth sports we don’t use "hard" language as we would for professional sports. For example, if in a girls' soccer game for 12-year-olds one team beats the other 30-0, we don’t write that team A completely crushed team B but rather something more generic like “a game was played and the result was 30-0 and they all had a good time”.
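The perspective and tone rules above can be sketched as a small function that picks wording from the reader's perspective and from whether the game is a youth game. This is an illustrative Python sketch only; the function name, the wording and the five-goal threshold for "crushed" are assumptions, not the production rules.

```python
def headline(home, away, home_goals, away_goals, perspective="neutral", youth=False):
    """Pick wording for a result based on perspective and age group."""
    if youth:
        # Youth games: state the result without judgemental language.
        return f"{home} and {away} played a game that ended {home_goals}-{away_goals}."
    margin = home_goals - away_goals
    if margin == 0:
        return f"{home} and {away} drew {home_goals}-{away_goals}."
    winner, loser = (home, away) if margin > 0 else (away, home)
    high, low = max(home_goals, away_goals), min(home_goals, away_goals)
    # The team the receiving newspaper follows, or None for a neutral version.
    own_team = {"home": home, "away": away}.get(perspective)
    if own_team == winner:
        verb = "crushed" if abs(margin) >= 5 else "beat"
        return f"{winner} {verb} {loser} {high}-{low}."
    if own_team == loser:
        return f"{loser} lost {low}-{high} to {winner}."
    return f"{winner} won {high}-{low} against {loser}."

for view in ("home", "away", "neutral"):
    print(headline("Team A", "Team B", 6, 0, perspective=view))
print(headline("Team A", "Team B", 30, 0, youth=True))
```

A newspaper's team configuration would then decide which of the three versions it receives.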
We are very flexible when integrating with the content management system of the receiver, to follow the tagging and categorisation system needed to put our texts in the right bucket. We also monitor the receiving system and can alert your technicians in case you seem to be having problems reading the content we send.
We can alert the news desk to the most interesting content from a statistical perspective in many different ways, for example via Slack, text message, email or a web page. These alerts could be, e.g., that an all-time-high real estate sale was recorded, that team A will win the league if they win tonight's game, or that person X scored 15 goals in a lower-league game. We have many examples where journalists have written very widely read articles based on information from these alerts, on stories that would otherwise have been missed by the news desk.
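As a rough sketch of how such alerts might be produced, the Python example below checks incoming events against a couple of simple rules and hands any resulting messages to a list of channels. The event format, the rules and the ten-goal threshold are hypothetical, and the channels are plain callables standing in for real Slack, SMS or email integrations.

```python
def detect_alerts(event, history):
    """Return alert messages for statistically unusual events (illustrative rules)."""
    alerts = []
    if event["type"] == "property_sale":
        # All-time high: more expensive than every sale seen so far.
        if all(event["price"] > past for past in history.get("sale_prices", [])):
            alerts.append(
                f"All-time-high sale: {event['address']} sold for {event['price']:,}."
            )
    elif event["type"] == "match":
        for player, goals in event["goals"].items():
            if goals >= 10:  # assumed threshold for an unusual tally
                alerts.append(f"{player} scored {goals} goals in a single game.")
    return alerts

def dispatch(alerts, channels):
    """Hand each alert to every configured channel."""
    for alert in alerts:
        for send in channels:
            send(alert)

inbox = []  # stands in for a news desk channel
history = {"sale_prices": [1_200_000, 3_400_000]}
sale = {"type": "property_sale", "address": "Strandgatan 1", "price": 5_000_000}
match = {"type": "match", "goals": {"Person X": 15, "Person Y": 1}}
dispatch(detect_alerts(sale, history) + detect_alerts(match, history),
         [inbox.append, print])
```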
We can produce, for example, InDesign export files with the markup of your choice to make the layout of content for the print edition more efficient.
We can manage different publishing times for different kinds of content, and we can update the actual texts as new data becomes available. We permanently remember how each text was written, and as long as the new data does not change the main angle of the previously generated article, we only add or change the affected text in the current version. For example, you typically don't know the table standings until all games in a round have been played.
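The update rule described above can be sketched as follows: store the data and the main angle alongside the generated segments, and when new data arrives, touch only the segments that changed unless the angle itself has changed. This Python sketch is an assumption about how such logic could look; the segment names and the definition of "main angle" are invented for the example.

```python
def main_angle(data):
    """Reduce match data to the story's main angle (here simply the outcome)."""
    diff = data["home_goals"] - data["away_goals"]
    return "home_win" if diff > 0 else "away_win" if diff < 0 else "draw"

def build_segments(data):
    """One text segment per topic; standings data typically arrives later."""
    segments = {
        "result": f"{data['home']} {data['home_goals']}-{data['away_goals']} {data['away']}."
    }
    if "home_position" in data:
        segments["standings"] = (
            f"{data['home']} are now number {data['home_position']} in the table."
        )
    return segments

def update_article(stored, new_data):
    """Merge new data and return the article plus the names of changed segments."""
    data = {**stored["data"], **new_data}
    fresh = build_segments(data)
    if main_angle(data) != stored["angle"]:
        # The angle changed: rewrite the whole article.
        return {"data": data, "angle": main_angle(data), "segments": fresh}, list(fresh)
    segments = dict(stored["segments"])
    changed = [name for name, text in fresh.items() if segments.get(name) != text]
    for name in changed:
        segments[name] = fresh[name]  # add or replace only what changed
    return {"data": data, "angle": stored["angle"], "segments": segments}, changed

first = {"home": "Team A", "away": "Team B", "home_goals": 2, "away_goals": 1}
stored = {"data": first, "angle": main_angle(first), "segments": build_segments(first)}
stored, changed = update_article(stored, {"home_position": 1})
print(changed)
```

When the standings arrive after the round is complete, only the standings segment is added and the existing result text is left as it was.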
4. Scalable infrastructure.
The technical infrastructure is based on a serverless and completely cloud-based architecture, allowing us unlimited parallelism and a very cost-efficient throughput. A typical text takes around one second to generate, and this scales linearly: if we generate 100 texts at the same time, they all complete within that same second. Also, since the actual text generation is so cost-efficient, we can basically regenerate texts in real time. Our partners can regenerate their texts as often as they like. One of the key points is that it should be possible to generate texts that are unique to each reader in a cost-effective way. To truly personalise your content, you need to automate the production. For this, it's important that the system remains responsive in speed and can scale in volume.
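The idea that a batch finishes in roughly the time of a single text can be demonstrated locally with a thread pool, where each worker stands in for one independent serverless invocation. This is only a local analogy under that assumption: the sleep simulates generation work, and the function names are hypothetical.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def generate_text(match_id, delay=0.05):
    """Stand-in for one text generation; the sleep simulates the work."""
    time.sleep(delay)
    return f"Text for match {match_id}"

def generate_batch(match_ids):
    """Run every generation concurrently, one worker per text."""
    with ThreadPoolExecutor(max_workers=len(match_ids)) as pool:
        return list(pool.map(generate_text, match_ids))

start = time.perf_counter()
texts = generate_batch(range(100))
elapsed = time.perf_counter() - start
print(f"{len(texts)} texts in {elapsed:.2f}s")
```

Run sequentially, the same 100 simulated texts would take about 100 times the single-text delay; run concurrently, the elapsed time stays close to that of one text.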
5. We sell a service, not a tool.
We provide a service where we take responsibility for working with our partners to continuously improve the automated news feeds that have been built. We analyse the generated texts in retrospect for correctness and variability, and use feedback from our partners to prioritise which areas should be the focus of the next evolutionary step.