Monday, August 30, 2010

YouTube Machine Transcription or 'What was that?!'

Colnect has recently published new videos on YouTube. These are simple videos explaining our website for collectors (stamps, coins, banknotes and more) from all around the world. As Colnect is available in 50 languages, it's important to upload subtitles for the different languages. Although we had already created the English subtitles, curiosity made us look at what YouTube's automated "Machine Transcription" had produced for our videos. The "Machine Transcription" attempts the old yet daunting task of speech-to-text conversion.

The results were both surprising and hilarious. Many of the transcribed words had little phonetic resemblance to the audio. Clearly, the "Machine Transcription" is sometimes much worse than having no subtitles at all. Here follows an example.

First the video:



And now a screenshot of some of the funniest so-called "Machine Transcriptions":


Enjoy highlights such as "Number of Clinton's each", which seems to correspond to "See the number of items each (collector has for every category they are active in)". Can you guess the original for "what the offer free trade"? That one is easier: it was "What they offer for trade". The last example is "it's what the up for and what they want". We're still unsure, but it probably corresponds to "(View a member's profile with a click) and see what they offer and what they want". Quite similar, isn't it? :)

So, to conclude, be extremely wary of using the machine transcription for anything other than a good laugh. Oh, and you're welcome to joke about the accent in the video ;)
