BBC Research & Development

Posted by Chris Newell

Citron is an experimental quote extraction and attribution system developed by the Natural Language Processing team at BBC Research & Development. It was developed to explore the use of artificial intelligence in journalism, and we are releasing it today to encourage further work in this area.

Citron can be used to extract quotes from text documents, attributing them to the appropriate speaker and resolving pronouns where necessary. It supports direct quotes (with quotation marks), indirect quotes (without quotation marks) and mixed quotes (which have both direct and indirect parts). It was developed to explore whether we could use artificial intelligence to extract structured data from unstructured text sources, such as archives and news feeds, to make the information more accessible to journalists. The goal was to allow journalists to easily find everything a person has said about a specific subject. We could use similar techniques in the future to extract other information buried in text, such as factual claims and statistics.
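To make the three quote types concrete, here is a minimal Python sketch of a record that holds one attributed quote, together with a toy rule that labels a quote by looking only at its quotation marks. This is an illustration of the definitions above, not Citron's method: the `Quote` class, `classify_quote` and `make_quote` are hypothetical names, and Citron itself relies on trained models rather than a punctuation rule.

```python
from dataclasses import dataclass

# Straight and curly double quotation marks (an assumption of this sketch).
QUOTE_MARKS = '"\u201c\u201d'


def classify_quote(content: str) -> str:
    """Label a quote span as 'direct', 'indirect' or 'mixed' (toy rule)."""
    text = content.strip()
    # No quotation marks anywhere: treat the span as an indirect quote.
    if not any(ch in QUOTE_MARKS for ch in text):
        return "indirect"
    # Wholly enclosed in a single pair of marks: treat it as a direct quote.
    if (
        len(text) >= 2
        and text[0] in QUOTE_MARKS
        and text[-1] in QUOTE_MARKS
        and not any(ch in QUOTE_MARKS for ch in text[1:-1])
    ):
        return "direct"
    # Otherwise it has both quoted and unquoted parts.
    return "mixed"


@dataclass
class Quote:
    """Hypothetical record for one extracted and attributed quote."""
    speaker: str      # full name, after any pronoun has been resolved
    content: str      # the text of the quote
    quote_type: str   # 'direct', 'indirect' or 'mixed'


def make_quote(speaker: str, content: str) -> Quote:
    """Build a Quote, filling in its type with the toy rule above."""
    return Quote(speaker, content, classify_quote(content))
```

For example, `make_quote("Speaker A", 'that the plan was "a real step forward"')` would be labelled as a mixed quote, because only part of the span sits inside quotation marks.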

The work was encouraged by earlier work from the School of Informatics at the University of Edinburgh. We extended their approach and combined it with a coreference resolver to determine the full names associated with pronouns when these were the source of a quote. Each element of the system was trained and optimised for best performance using the PARC dataset and data we created internally.

The implementation we’re releasing provides a command line application and a web server supporting a REST API. The server allows Citron to be deployed easily to provide a web-based service for extracting quotes. The screenshot below shows an example of the results from Citron.

Quotes extracted by Citron from a single BBC news article

We previously reported on this work at KDD 2018. Initial feedback from users had established that simply highlighting the quotes in a document was a useful and time-saving feature. However, journalists also wanted to search more widely for quotes on specific topics. We therefore applied Citron to a large archive of BBC News articles and constructed a searchable quote database. This allowed users to search for quotes by particular speakers and on specific topics as shown below.

Example search results from the quote database
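As a rough illustration of the idea of a searchable quote database, the Python sketch below stores extracted quotes in an in-memory SQLite table and filters them by speaker and by a topic keyword. It is a sketch under stated assumptions, not a description of the database we built: the table layout and the `build_db` and `search` helpers are hypothetical, and a simple substring match stands in for whatever topic search a real system would use.

```python
import sqlite3


def build_db(quotes):
    """Create an in-memory database from (speaker, content) pairs."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE quotes (speaker TEXT, content TEXT)")
    conn.executemany("INSERT INTO quotes VALUES (?, ?)", quotes)
    return conn


def search(conn, speaker=None, topic=None):
    """Return (speaker, content) rows matching the optional filters."""
    sql = "SELECT speaker, content FROM quotes WHERE 1=1"
    params = []
    if speaker is not None:
        # Exact match on the attributed speaker's name.
        sql += " AND speaker = ?"
        params.append(speaker)
    if topic is not None:
        # Substring match on the quote text (hypothetical stand-in for topic search).
        sql += " AND content LIKE ?"
        params.append(f"%{topic}%")
    sql += " ORDER BY rowid"
    return conn.execute(sql, params).fetchall()


# Placeholder rows, invented purely for illustration.
conn = build_db([
    ("Speaker A", "the budget will rise next year"),
    ("Speaker B", "the budget is already too high"),
    ("Speaker A", "the trains are running late"),
])
results = search(conn, speaker="Speaker A", topic="budget")
```

Here `results` would hold only Speaker A's quote about the budget. Passing just `speaker` or just `topic` broadens the search to everything a person has said, or to everything said on a subject.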


This is just one potential application of Citron, and we would be very interested to hear from anyone who finds other applications or can contribute to the project, where you can find my contact details.
