This sprint in R&D's IRFS team we worked on analysing Casualty, sentiment analysis, the web and TV, immersive video and atomising stories.
Most popular 5-word phrases
648 going to be all right
623 do you want me to
514 what are you doing here
465 can i have a word
367 i'm going to have to
Most popular 4-word phrases
1872 what are you doing
1809 i don't want to
1298 do you want to
1218 can you hear me
1216 are you all right
They reminded us of and .
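Phrase counts like the ones above can be produced with a sliding window over the words of each line. Here is a minimal Python sketch, assuming the dialogue is available as plain-text lines. The function name `top_phrases` and the sample lines are made up for illustration, and this is not necessarily how the figures above were calculated.

```python
import re
from collections import Counter


def top_phrases(lines, n, k=5):
    """Return the k most common n-word phrases across lines of dialogue."""
    counts = Counter()
    for line in lines:
        # Lowercase and keep apostrophes so "don't" stays one word.
        words = re.findall(r"[a-z']+", line.lower())
        # Slide a window of n words along the line.
        for i in range(len(words) - n + 1):
            counts[" ".join(words[i:i + n])] += 1
    return counts.most_common(k)


# Hypothetical sample lines, not real script data.
sample = [
    "What are you doing here?",
    "What are you doing?",
    "I don't want to go.",
]
print(top_phrases(sample, 4))
```

Counting within a line rather than across the whole text avoids phrases that straddle two different speakers.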
Update: Andrew has extrapolated these. Using a simple n-gram "letter" model (i.e. just looking at the probability of one letter given the n letters before it), he can generate random Casualty-ness, like this...
Happy birthday! Oh, Charlie!Who is it? It's Duffy!
CAN we win the Night Shift. Let's shock him. As the most senior person on duty, Gareth Davies will be supervising all the doctors in the department? That depends on us. Is he in his office? I don't know, Kelly. I've given up asking.
Hi, I'm Helen. I want your complexion. It's nothing wrong. I think I would have been going on? Put her over here, quick! Hurry up! ..What are we goin'? We'll see what they're doing him a favour. OK. Sats 98%, Pulse 100, BP 140/85, resps 25, sats 96, resps are 30. Pulse irregular. Like my barber.
Ad infinitum.
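A letter-level n-gram generator of the kind described above fits in a few lines of Python. This is a rough sketch rather than Andrew's actual code: the function names, the context length of 4 and the tiny corpus are all assumptions for illustration.

```python
import random
from collections import Counter, defaultdict


def train(text, n):
    """Count which letter follows each run of n letters in the text."""
    model = defaultdict(Counter)
    for i in range(len(text) - n):
        model[text[i:i + n]][text[i + n]] += 1
    return model


def generate(model, seed, length, rng=random):
    """Extend seed one letter at a time, sampling by observed frequency.

    The seed must be as long as the n used in training.
    """
    n = len(seed)
    out = seed
    while len(out) < length:
        followers = model.get(out[-n:])
        if not followers:
            break  # this context never appeared in the training text
        letters, weights = zip(*followers.items())
        out += rng.choices(letters, weights=weights)[0]
    return out


# Hypothetical toy corpus; a real run would use far more text.
corpus = "Are you all right? What are you doing here? Can you hear me? "
model = train(corpus, 4)
print(generate(model, corpus[:4], 80, random.Random(0)))
```

A longer context makes the output read more like the source text but also copies it more closely; a shorter one drifts into nonsense sooner.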
Analysing the web
This sprint the Discovery team has been looking at ways to analyse the sentiment of relatively long articles, as basic methods better suited to short sentences or social media posts tend to yield useless sentiment values. Our initial experiments with splitting articles into sentences and assessing the distribution of non-neutral sentences are proving promising, and we are reviewing similar approaches published in the past, such as (PDF). We also updated our seriousness analyser, sorted out some infrastructural stuff and sketched some new tools.
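To make the sentence-splitting idea concrete, here is a small Python sketch. It is not the Discovery team's analyser: the word lists are a hypothetical toy lexicon standing in for a real per-sentence classifier, and the sentence splitter is deliberately naive.

```python
import re

# Hypothetical toy lexicon, for illustration only. A real system would use
# a trained classifier or an established sentiment lexicon.
POSITIVE = {"good", "great", "promising", "improvement"}
NEGATIVE = {"bad", "useless", "error", "fail"}


def sentence_label(sentence):
    """Label one sentence as positive, negative or neutral."""
    words = re.findall(r"[a-z']+", sentence.lower())
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def sentiment_distribution(article):
    """Summarise an article by the spread of its non-neutral sentences."""
    # Naive split: break after ., ! or ? when followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", article.strip()) if s]
    labels = [sentence_label(s) for s in sentences]
    non_neutral = [label for label in labels if label != "neutral"]
    total = len(non_neutral)
    return {
        "sentences": len(labels),
        "non_neutral": total,
        "positive_share": non_neutral.count("positive") / total if total else 0.0,
    }


text = "The results look promising. The old method was useless. We met on Tuesday."
print(sentiment_distribution(text))
```

Ignoring neutral sentences means a long, mostly factual article is judged on the few sentences that actually carry an opinion instead of being averaged towards zero.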
We have been working on ways to extract text from web articles using the DOM rendered by a headless browser (the open-source PhantomJS) rather than the HTML source, with some success - and the added benefit of being able to generate screenshots at various resolutions. We still, however, face issues with questionable JavaScript-based redirects. Then Tim pointed us to a possible solution from .
"All curation grows until it requires search. All search grows until it requires curation." ()
Analysing media
Jana's speaker identification work has given some very good results, with significant improvements over the previous attempts. And Matt has been debugging an error we found in our Kaldi training on the new GPU machine. We've now managed to complete training of a new model with 3 times as much training data and it yields a measurable improvement in the system's performance.
Connecting TVs and radios
Chris has been documenting the MediaScape project as it wraps up - all the work done on device discovery, pairing, and authentication, as well as the overall architecture. And he joined a W3C Web and TV Interest Group conference call to kick off the "Cloud Browser" Task Force. "The Cloud Browser Task Force is a subset of the Web and TV Interest Group, whose goal is to discuss support for web browser technology within devices such as HDMI dongles and lightweight STBs (set-top boxes)." Libby is currently going around boring everyone she knows with set top box FACTS.
VR and 360 video
We've been improving the HTML5/VR music visualiser - working on procedural terrain generation, investigating new types of visualisation and adding a new scene for performance comparison. And Andrew helped facilitate some user testing in the North Lab with Middlesex University for a study that is looking at the experience of viewing 360 films on three different devices: laptop, phone & headset. And Zillah has been very busy running several VR pilots.
Atomising stories
Chrissy has been setting up things for the atomised news trial and working with Lei of BBC News Labs to investigate what data we can get out of BBC systems, while Lara has been tweaking the front-end of the prototype. Thomas joined the UX team and has been refreshing the design.
Andrew has been refining the design templates for our TV Story Explorer prototype and getting assets for other dramas. He has also been thinking about an -type service that could incorporate storylines and key moments. Alan also joined the team and is starting to think about how to parse scripts to extract story data.
Also
Tristan and Libby were presenting things at the BBC Data Day. The BBC College of Journalism .
We're to run our software engineering team (1 year contract, based in central London).
Links
We've been discussing a few nice local (to us) exhibitions about data and the interwebs, such as (with in particular), and .
Building an automated "sarcasm detector" remains one of our somewhat-jokey goals, and it looks as though
A for Python and R.
Two nice posts on and application design (via Ian Forrester)