Editorial Algorithms

Analysing and curating web content

Published: 1 January 2014

This project looks at ways to automatically extract editorial metadata (such as tone, language and topics) about web content, making it easier to find the right content for the right audience.

Project from 2014 - 2017

What we've done

Given the abundance of content online, both published by the BBC and present elsewhere on the web, how can we curate the best content for the right audience?

Editors at the BBC - whether they make radio programmes or TV shows, or edit websites - make decisions on a daily basis about the kinds of content they want to include. Those decisions take a lot of things into account: the audience first and foremost, the timeliness (why this story? why now?), the quality of the idea or content, and the BBC's own guidelines.

We set out to discover if we could automate those editorial decisions, or at least make it easier for BBC editors to find ideas from the vast wealth of the web and the archives of BBC content.


Why it matters

Curation of content, the craft of selecting a small set from the overwhelming abundance of choice, is increasingly becoming a major mode of discovery, alongside searching and browsing. The BBC has a long history of curating content (DJs selecting the music played through a radio programme, for example), but we wanted to explore how to better curate online content.

Through the BBC home page team, for instance, we are already looking at how we can guide our audience across the hundreds of thousands of stories, guides, media and informational content published on our web platform. We in BBC R&D were especially interested in looking at how to help our online editors find and link to great content published by others, be it a local news story, a great post by a gifted blogger or a gem of a page on the web site of a major museum.

In order to answer this question, we have been looking at what would be needed to efficiently find and curate content to inform, educate and entertain. By prototyping and testing curation tools and methods to package and deliver curated sets of content, we were able to better understand how to help our researchers, editors and curators find the right content, whatever its origin, for the right audience.

We then developed an experimental system which allows us to index, analyse and query web content at scale. This is similar to how search engines work, but whereas they typically provide results based on search terms, our tool set is able to interrogate content according to BBC editorial considerations such as its tone (whether it is serious or funny), sentiment, publisher, length and how easy it is to read.
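
For illustration, a single record of this editorial metadata might look something like the sketch below; the field names and values are our own assumptions, not the system's actual schema.

    # Hypothetical editorial metadata for one indexed page; field names
    # and values are illustrative only, not the project's actual schema.
    page_record = {
        "url": "https://example.org/story",
        "publisher": "example.org",
        "topics": ["australian politics"],
        "tone": "serious",        # serious vs. funny
        "sentiment": -0.2,        # negative to positive
        "word_count": 2400,
        "minutes_to_read": 12,
        "reading_ease": 38.5,     # lower scores are harder to read
    }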


Our goals

Some of the long-term goals of this project include:

  • Creating new tools and technology to help our editors and researchers find and curate the right web content for the right audience - whether it is a definitive selection on any given topic from the hundreds of thousands of web pages and articles published on the BBC web site through the years, or great relevant content from elsewhere on the web
  • Creating novel technology, or adopting state-of-the-art algorithms, to create a unique layer of understanding about the wealth of textual content on the web - as a complement to the technology for audio/visual analysis developed elsewhere in BBC R&D

How it works

At the heart of the experimental system created through this project is a scalable pipeline for the ingestion and analysis of online textual content.

Once a source is added to the system (via a syndication feed such as RSS or Atom, or an API), our system indexes the content from that feed according to various parameters, including tone, sentiment, time to read, readability and timeliness, as well as knowledge about people, places and topics mentioned in the text.
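
As a rough illustration of how some of these parameters might be derived from the plain text of an ingested item, the sketch below estimates reading time and a Flesch reading-ease score. It is not the project's actual code; the 200 words-per-minute figure and the naive syllable counter are our own assumptions.

    import re

    WORDS_PER_MINUTE = 200  # assumed average reading speed

    def count_syllables(word):
        """Very rough syllable estimate: count groups of vowels."""
        return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

    def editorial_metadata(text):
        """Compute simple editorial parameters for one piece of content."""
        words = re.findall(r"[A-Za-z']+", text)
        sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
        syllables = sum(count_syllables(w) for w in words)

        # Flesch reading ease: higher scores mean easier to read.
        reading_ease = (206.835
                        - 1.015 * (len(words) / max(1, len(sentences)))
                        - 84.6 * (syllables / max(1, len(words))))

        return {
            "word_count": len(words),
            "minutes_to_read": round(len(words) / WORDS_PER_MINUTE, 1),
            "reading_ease": round(reading_ease, 1),
        }

    print(editorial_metadata("The cat sat on the mat. It was a sunny day."))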

Some of these editorial parameters are extracted using standard algorithms, others use our in-house technology, and others still are built on experimental machine learning algorithms.

This metadata can then be interrogated by queries such as “give me results about this subject that are long and difficult to read”, “give me an article that is lighter in tone than this one” or “give me all the recent content you can find about Australian politics”.
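
To make that querying step concrete, a minimal sketch of the first example query, expressed against an in-memory collection of metadata records, might look like this; the field names and thresholds are illustrative assumptions rather than the system's actual query language.

    # Hypothetical in-memory index: one metadata record per analysed page.
    index = [
        {"url": "https://example.org/long-analysis", "topics": ["australian politics"],
         "minutes_to_read": 12, "reading_ease": 35.0, "tone": "serious"},
        {"url": "https://example.org/quick-explainer", "topics": ["australian politics"],
         "minutes_to_read": 3, "reading_ease": 72.0, "tone": "light"},
    ]

    def long_and_difficult(records, topic):
        """'Give me results about this subject that are long and difficult to read.'"""
        return [r for r in records
                if topic in r["topics"]
                and r["minutes_to_read"] >= 10   # "long": illustrative threshold
                and r["reading_ease"] < 50]      # "difficult": low reading-ease score

    print(long_and_difficult(index, "australian politics"))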


Outcomes

The project officially ended at the end of March 2017. It produced many outcomes, including talks, presentations, numerous prototypes and a wealth of user research, but its main output is an internal platform for the aggregation and analysis of web content, and a series of tools built around it.

We also published a series of blog posts about the genesis, development and outcomes of this project:

  1. The first post is a recollection of how our R&D team observed and learned from editorial expertise around the BBC and started wondering whether some of the knowledge needed for curation of content could be translated into automation and technology.
  2. The second post follows up with an exploration of the kind of information about web content which can be gleaned, either directly or indirectly. We ask the question: assuming we want to curate “the best of the internet” on a daily basis, how many of these features were already available for us to use in web metadata, and how many would we have to compute ourselves?
  3. The third post gets more technical and looks at our attempts at teaching algorithms (including machine learning algorithms) to understand not only what web content is about, but also more complex features such as tone, subjectivity and even humour.

Project Team

  • Olivier Thereaux

    Executive Producer
  • Katie Hudson

    Project Manager
  • Thomas Parisot

    Senior Web Engineer
  • David Man

    Creative Director
  • Georgios Ampatzis

    Data Scientist
  • Manish Lad

    Senior Software Developer / DevOps
  • Tim Broom

    Senior UX Designer
  • Jiri Jerabek

    Interaction Designer
  • Henry Cooke

    Senior Producer & Creative Technologist
  • Tim Cowlishaw

    Senior Software Engineer
  • Frankie Roberto

    Creative Technologist
  • Gareth Adams

    Software Engineer
  • Richard England

    Creative Technologist
  • Fionntán O'Donnell

    Senior Software Developer / Data Hacker
  • Gareth Williams

    BBC Online Central Editorial
  • Ryan Norton

    BBC Digital Marketing & Audiences
  • Chris Newell

    Lead R&D Engineer
  • Kat Sommers

    Development Producer
  • Kate Towsey

    User Researcher
  • Edwina Pitman

    Editorial Consultant
  • Nickie Latham

    Editorial Consultant
  • Matthew McGuire

    Software Developer
  • Internet Research and Future Services section

    The Internet Research and Future Services section is an interdisciplinary team of researchers, technologists, designers, and data scientists who carry out original research to solve problems for the BBC. Our work focuses on the intersection of audience needs and public service values, with digital media and machine learning. We develop research insights, prototypes and systems using experimental approaches and emerging technologies.
