Below are two screen captures of (allegedly) autogenerated closed captioning for a short US Department of Defense documentary on the effects of a nuclear blast on aircraft flying near the explosion.
For some reason, Italian was the only default language for autogenerating the closed captions for the video.
The following is a poor autotranslation of the above autogenerated Italian captions into English:
Clearly, LOL wasn't an actual term or acronym when this film was shot.
Clearly, the narrator is speaking clearly enough and in grammatically correct (for the most part) English. There are some audio distortion and artifacts, likely from the film-to-video conversion, but not enough to make the audio unintelligible.
Is this a failure on the part of a lazy, incompetent, or trolling translation-software programmer, or something else?
I hold out hope that it's a stop along the way and that speech recognition software will get there. But currently automatic captioning sucks, and the fact that it's just OK enough to not always be obviously sucky also sucks because (hearing) people (who don't actually rely upon it) think it's adequate and it's not BECAUSE IT SUCKS.
I've seen other autogenerated captioned videos and they're much better than this (keeping the bar too low). Three out of five (optimistic) were somewhat correct. At least the gist was somewhat intelligible. The failure noted above bordered on word salad/schizophrenic nonsense. For people who depend on it? I am truly sorry this disservice is so horrendously bad. We need more people volunteering their time to caption videos like these rather than letting autogenerating software do such a poor job.
Why I get so het up about it is that I run into it all the time professionally (a college student is told that all the videos for a given class are captioned, that turns out to be automatic captions, and they're essentially useless) and probably 90% of the videos I run across in everyday life (shared on Facebook, etc.) that are captioned at all are automatic captions, and again they're essentially useless.
A big issue is that they tie in just closely enough that someone listening to it while looking at the captions may think they're really quite good. But if the captions are your only source of info, the crucial errors and lack of punctuation too often render the whole thing incomprehensible.
Mon 6 Jun, 2016 01:32 pm
And bless the people who do that. Vlogbrothers and Tyler Oakley are two examples of YouTubers who have an excellent volunteer crew. Timeliness is sometimes an issue but their actual captioning is lovely.
Automated captioning is a close cousin of what I work on in my job (I work in speech recognition). I will be the first to tell you that autogenerated captioning sucks.
This is actually a very difficult technical problem. To get reasonably accurate speech recognition we make assumptions about the speaker, the type of topic being discussed (the vocabulary used in a voicemail is different than in a doctor's note) and the background noise.
Videos make it very difficult to make assumptions about any of those. The speaker, the background noise, and even the topic change quite often in these documentaries.
Mon 6 Jun, 2016 04:36 pm
If you mean like CapTel, that's a little different. There is an actual human doing real-time captioning, not software. I haven't used them and they aren't real popular but I think they work fairly well.
Max, yep. People are so conditioned to think "technology will fix that," and in my job we are frequently fielding questions from people who think that speech recognition will solve all communication issues when dealing with deaf or hard of hearing people. Dragon NaturallySpeaking and such can get really quite good when trained to a specific person's voice, in a quiet environment, when the person is dictating rather than speaking normally. But as you say, that is SO much different from most conversational (or video, or classroom) situations.