
Research & Development

Abstract

This paper describes an experimental system that can create good-quality subtitle files for video clips derived from broadcast content. The system is designed to run automatically, without the need for human verification. The approach combines existing metadata sources, an off-air broadcast archive and an archive of original subtitle files with audio fingerprinting and speech-to-text technology to identify the source programme. It then locates the position of the video clip within that programme, verifies the match between the video clip and the subtitles, and creates a new subtitle file.

This paper also reports results from a large corpus of over 7,000 video clips and from further, smaller sets of clips drawn from different television genres, and explores where improvements might be made. It also examines the limitations of the current approach, discussing alternative methods for providing subtitles for video clips.

This document was originally published at IBC 2016.
