
IRFS Weeknotes #284

Latest sprint notes from the IRFS team: Audio AR, Recommender Systems and Speech-to-Text

Published: 20 March 2019
  • Chris Newell, Lead R&D Engineer

This sprint the Discovery Team has been revisiting its earlier work on Nearest Neighbour recommendations, the Data Team has been optimising speaker diarization and identification, and the Experiences Team has been expanding its work on Audio Augmented Reality.

Several years ago the Discovery Team built a client-side recommender system called Sibyl, in which the recommender engine is implemented in JavaScript and runs in the user's browser. The work was intended to explore how recommendations could be provided to anonymous users without any user history. The site is still running, but it is frozen in time as its original data sources have been deprecated.

The engine used an approach called k-Nearest Neighbors (KNN) where the recommendations are based on a correlation matrix which describes the similarity between pairs of items. The similarity can be measured in terms of the item attributes (metadata-based filtering) or in terms of the users who have consumed them (collaborative filtering). The idea is that if you consume an item then the engine will recommend the programmes with the highest similarity, which are called the Nearest Neighbours. If you consume multiple items then the engine aggregates the scores of the Nearest Neighbours and returns the items with the highest scores. The advantage of the KNN algorithm over some other approaches is that the recommender model is transparent and easily understood - you can browse the Nearest Neighbour model and see the similarity values.
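The aggregation step described above can be sketched in a few lines of JavaScript. This is an illustrative toy, not Sibyl's actual code: the item IDs and similarity values are made up, and a real model would hold only the top-k neighbours per item rather than a full matrix.

```javascript
// k-Nearest Neighbour recommendation from a precomputed similarity matrix.
// For each item the user has consumed, look up its nearest neighbours and
// accumulate their similarity scores, then return the highest-scoring
// items the user has not already seen.

// Hypothetical nearest-neighbour model: item -> { neighbour: similarity }.
const similarity = {
  prog1: { prog2: 0.9, prog3: 0.4 },
  prog2: { prog1: 0.9, prog4: 0.7 },
  prog3: { prog1: 0.4, prog4: 0.2 },
};

function recommend(consumed, k = 3) {
  const scores = {};
  for (const item of consumed) {
    for (const [neighbour, sim] of Object.entries(similarity[item] || {})) {
      if (consumed.includes(neighbour)) continue; // skip items already seen
      scores[neighbour] = (scores[neighbour] || 0) + sim;
    }
  }
  return Object.entries(scores)
    .sort((a, b) => b[1] - a[1]) // rank by aggregated similarity
    .slice(0, k)
    .map(([item]) => item);
}

console.log(recommend(['prog1', 'prog2'])); // [ 'prog4', 'prog3' ]
```

Because the model is just a similarity table, you can inspect any recommendation by reading off the neighbour scores that produced it, which is the transparency advantage mentioned above.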

This sprint we have been exploring whether we can adapt the Sibyl recommender engine to run on a server, rather than a client, using Node.js, which runs server-side JavaScript. The results have been impressive, with the aggregation and ranking tasks being surprisingly fast. As with many machine learning problems, the most difficult and time-consuming part is acquiring all the required item and user metadata.

Data Team

In the Data Team, Ben has been evaluating different Speaker Diarization methods (which partition an audio stream into segments according to the speaker) to see if he can improve our Speaker Identification and Speech-to-Text (STT) tools. He has looked at LIUM and Kaldi x-vectors.

Meanwhile, Matt and Misa have retrained the voice activity detection module for our STT and Speaker ID systems by adding examples from ±«Óãtv content. Unexpectedly, this made very little difference to the overall performance. They're now examining how the Voice Activity Detection affects the STT and Speaker ID systems, and trying to identify where improvements could be made.

Experiences Team

This week the Tellybox project received press coverage in Broadcast and the Star, following on from earlier coverage. Libby and Alicia were able to demonstrate Tellybox to the ±«Óãtv's Chief Technology and Product Officer, Matthew Postgate, when he visited our new building. Libby has also made a demonstration version of Tellybox that runs on a small device, to show how it might work as a set-top box.

Nicky and Henry ran an excellent Audio Augmented Reality workshop with sound designers Ben and Max Ringham, where they kicked off their work on prototyping Audio AR experiences for Bose Frames. As we start to investigate Audio AR in depth, we have published a series of blog posts on the subject.

Out and About

Tristan gave a seminar and also talked to a visiting group of Norwegian journalists.

Nicky and George were on ±«Óãtv Radio 4's Feedback programme, talking about voice assistants and the ±«Óãtv.

Henry gave a talk on the histories and myths of smart speakers.
