Reply Mon 20 Sep, 2010 11:25 am
http://www.nuance.com/talk/

I can see the convenience of this product and the linguist in me is in love with it. The educator in me is skeptical. The father of an autistic boy who is being mainstreamed in me is in love with it. The traditionalist in me is skeptical. However, if the future of writing is in speech recognition and not keyboarding, this may be very cool. The sci-fi geek in me is sort of disappointed that it may eliminate the need to create cyborg hands with 27 attachable fingers to type faster.
 
Arjuna
 
  1  
Reply Mon 20 Sep, 2010 11:39 am
@GoshisDead,
I read a book about speech-recognition software about... um a looong time ago. I'm curious how they've overcome problems like the wide variation in vowel pronunciation in English. I'd assumed the technology would change the way we speak...

I actually spoke to a computer this morning.
sozobe
 
  2  
Reply Mon 20 Sep, 2010 11:44 am
@GoshisDead,
What I'm waiting for... and can't be far away... is the iPhone/ iPod app that allows you to hold the device towards a person and whatever they say instantly appears as text. I always have a moment after I've watched a captioned movie where I am very annoyed that the world isn't captioned. (I'm deaf, I read lips well but it's definitely an effort.)

The other side of that though is that I love how text-based the world has become, and worry about everything becoming voice-based instead. I've been worried for a while, and things have stayed text-based, so fingers crossed.
GoshisDead
 
  1  
Reply Mon 20 Sep, 2010 11:51 am
@sozobe,
Geesh, never thought of the deaf response to this type of software. That could get scary or very convenient depending on which way the trend goes.
0 Replies
 
Izzie
 
  1  
Reply Mon 20 Sep, 2010 11:54 am
@GoshisDead,
it's good software ... very useful for different reasons

http://able2know.org/topic/126720-1
0 Replies
 
OCCOM BILL
 
  1  
Reply Mon 20 Sep, 2010 04:00 pm
@sozobe,
sozobe wrote:

What I'm waiting for... and can't be far away... is the iPhone/ iPod app that allows you to hold the device towards a person and whatever they say instantly appears as text. I always have a moment after I've watched a captioned movie where I am very annoyed that the world isn't captioned. (I'm deaf, I read lips well but it's definitely an effort.)

The other side of that though is that I love how text-based the world has become, and worry about everything becoming voice-based instead. I've been worried for a while, and things have stayed text-based, so fingers crossed.
Your wait should be over shortly. Here is a version that works with the Blackberry for email (which may as well be a giant blackboard if you don't hit send.) These programs do like to fine-tune their recognition to a particular voice, however, so I wonder how accurate an "untrained" dragon would be. Couldn't know, but I'm guessing the dragon's "ears" would still be as accurate as reading lips. (I'm assuming you have to fill in blanks with probabilities when you're doing that?)

Anyway, anything a Blackberry can do today, most every smartphone on the market should do tomorrow.

Cheers.
Izzie
 
  1  
Reply Mon 20 Sep, 2010 04:20 pm
@OCCOM BILL,
WOW - what an amazing breakthrough... thanks for the info ((Bill))


SOZ - do let us know how you get on if you're able to get your hands on one of these - surprised that this has not been thought of sooner - really really hope the technology can make this happen. x
OCCOM BILL
 
  1  
Reply Mon 20 Sep, 2010 04:32 pm
@Izzie,
Boy am I a dope. I didn't spot that the article says it's already released an iphone/ipad version. And since it's supposedly "free", that seems like a pretty fair deal. Wink
0 Replies
 
parados
 
  1  
Reply Mon 20 Sep, 2010 04:36 pm
@Arjuna,
Quote:
I'm curious how they've overcome problems like the wide variation in vowel pronunciation in English.

Dragon a few versions ago required you to spend some time reading text to the system to train it. The more time you were willing to spend training it, the fewer errors it would supposedly make.
0 Replies
 
OCCOM BILL
 
  1  
Reply Mon 20 Sep, 2010 04:45 pm
@Izzie,
And how did I miss the hug? Always good to see you Iz!
((((( Izzie )))))
0 Replies
 
OCCOM BILL
 
  1  
Reply Mon 20 Sep, 2010 05:33 pm
@sozobe,
I tested out the free Blackberry version, and this is the result:
Italicized are its errors (in parentheses are my corrections).
Quote:
Okay so I downloaded the software and now I'm testing this stuff out I wonder how good it really works. It would appear I still have to provide punctuation if I interested, but I doubt that'll matter much to Sozobe (corrected in process). For the record it's a little slow but it does work. The way(Though) did not handle so so be (he, he) it doesn't really understand words only words that it already knows. I'm thinking it's probably not quite ready for Showtime for shows of you dude is a ( sozobe due to) speed issues but its recognition is remarkable.

Towards the end I sped up to "naturally speaking", and you can see it suffered.

My Blackberry Storm is getting old and is scheduled for my "new every two" in a few months and I do beat the hell out of it as phone, database (600+entries), MP3, Pandora, Calendar, Navigator, etc. A newer, or cleaner device will probably work MUCH better. I should also mention that I talk a little funny; I know this because I NEVER have to tell anyone who’s spoken to me a few times who I am on the phone.
Good luck!
sozobe
 
  1  
Reply Mon 20 Sep, 2010 06:14 pm
@OCCOM BILL,
Hey cool!

I have a Blackberry Curve on Sprint -- it says only Blackberry Curve on AT & T and T-Mobile networks.

I have an iPod touch, which they don't list but usually if it works on an iPhone it works on an iPod touch (but would need to buy a microphone attachment for it). (Which I could totally do.)

Yes, I imagine that lag time and recognition of widely varying voices will be the major snags (since I wouldn't be training it to understand my voice, but would want it to understand a near-infinite variety of voices).

While I definitely do a lot of guesswork with lipreading it's predictable, second-nature guesswork. If I used this I'd be looking at the screen rather than the person, not sure how that would work. (As in, I'd still be guessing, but in a different way and with different info.)

But if the technology is already this far along, I'd bet it gets the rest of the way there before too long.

Thanks for the info and product testing! Smile
OCCOM BILL
 
  1  
Reply Tue 21 Sep, 2010 09:27 am
@sozobe,
My pleasure. My curiosity was piqued too... and the price was right. Wink I’m picturing it being particularly useful for you if you run into a situation where you frequently have to talk to the same person, and said person is particularly tedious and/or hard to read. Cheers.

More info: I use Verizon and the download was very quick and smooth. During set up, it has you abandon one of your convenience buttons; and from that point forward anytime you're in an email program you can hold that button down to talk instead of type.... which means there's no effort to toggle back and forth. It definitely gets an A for ease of use.
0 Replies
 
maxdancona
 
  2  
Reply Tue 21 Sep, 2010 09:46 am
@OCCOM BILL,
Not so quick Bill, these speech recognition programs for voicemail to text are using tricks that you might not suspect (see #3 below).

I have worked as an engineer on a Speech Recognition team. Speech is still a hard problem. It is not very accurate in these situations, and we have a very difficult time dealing with background noise.

There are a couple of tricks we use to make the performance of Speech Recognition acceptable.

1. Limit the number of words that the speech recognition has to recognize. This works very well for banks etc., where it has to know the numbers and maybe "yes", "no" and "operator", but not very much else. This is called a small vocabulary problem and for obvious reasons is much easier to get right.

2. Get a specific speech model tuned to a specific person. Dragon on your desktop has you read a bunch of text that it knows, then it trains itself to your voice.

3. Use humans to help. Yes, the Nuance voicemail to text (and similar programs I suspect) have human operators in India that listen to voicemails and type what they hear.

The way it works is that the software does its best job. The good news is that the software can generate a "confidence" metric along with its transcription. It has a pretty good idea about whether it got it right or not. If it has very high confidence (this is not that big a percentage of voicemails), it will just send you this text. If not, it will send the audio recording, along with the results from the software, to a human who can either push the "yep that's right" button or make corrections. There is a percentage of voicemails for which it doesn't even bother trying to make a transcription. In this case the operator just types what he hears.

This works with voicemail because the messages are short, a couple of hundred operators can handle them, and voicemail doesn't need instant transcription (a few minutes of delay is fine).
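The routing described above can be sketched in a few lines. This is only an illustration, not Nuance's actual pipeline: the threshold values, the `Recognition` type, and the `route` function are all made-up names and numbers, since the post gives no specifics.

```python
from dataclasses import dataclass

# Hypothetical cut-offs; the post does not say what values are really used.
AUTO_SEND_THRESHOLD = 0.90   # at or above this, trust the software
SKIP_THRESHOLD = 0.30        # below this, don't bother with the software's text


@dataclass
class Recognition:
    text: str          # the software's best transcription
    confidence: float  # 0.0 to 1.0, reported alongside the transcription


def route(result: Recognition) -> str:
    """Decide who handles a voicemail based on recognizer confidence."""
    if result.confidence >= AUTO_SEND_THRESHOLD:
        # High confidence: send the text straight to the user.
        return "send_to_user"
    if result.confidence >= SKIP_THRESHOLD:
        # Middling confidence: a human confirms or corrects the draft.
        return "human_review"
    # Very low confidence: a human types it from the audio.
    return "human_transcribe"
```

Because a few minutes of delay is acceptable for voicemail, the two human branches can sit in a queue; a live captioning app would have no time for that step.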

I am very skeptical that the "point an iPhone at a person and get text as they speak" idea is going to be possible at any time in the near future.
GoshisDead
 
  1  
Reply Tue 21 Sep, 2010 09:57 am
@maxdancona,
That is full on interesting, thx for that
0 Replies
 
engineer
 
  2  
Reply Tue 21 Sep, 2010 10:01 am
More to Soz's fears, I think the next revolution will be skipping text completely. It's pretty easy to send voice mail instead of text mail. Put a mic on your desk, click a create mail button, record and send. Fast, direct, not bad on memory, not good for the deaf. It's just a small extension from Internet based voice mail.
maxdancona
 
  2  
Reply Tue 21 Sep, 2010 10:06 am
@engineer,
I don't think so engineer.

There are many ways that text is superior to audio. I can read faster than I can listen, and the biggest advantage of reading is that I can easily skip over the boring parts. Voicemail to text services are a big hit for exactly this reason. Can you imagine wading through Able2Know if you had to listen to each post?

Combine this with the importance of text search, filtering and processing to today's technology, and I think text will be the primary mode of communication at least until we get cerebral implant technology.
0 Replies
 
ehBeth
 
  2  
Reply Tue 21 Sep, 2010 10:14 am
@engineer,
engineer wrote:
It's pretty easy to send voice mail instead of text mail.


I've been getting quite a bit of voice mail as email lately. It's a bit of a pain in an office as it increases the noise clutter for everyone.
0 Replies
 
OCCOM BILL
 
  1  
Reply Tue 21 Sep, 2010 12:37 pm
@maxdancona,
maxdancona wrote:
I am very skeptical that the "point an iPhone at a person and get text as they speak" idea is going to be possible at any time in the near future.
You are simply wrong. I downloaded it and demonstrated it above. It's not perfect, but it works already.

Aside: I send/receive a LOT of electronic communication throughout the day and I'm already defaulting to pushing the dragon button and talking to populate the "To:" field. In my business, most every communication gets carboned to others, and saying names is a lot faster than typing or scrolling. (Push the button, say a name. Push the button, say another.) The jury's still out; but I'll be surprised if I don't start speaking my messages rather than typing, because it is extremely accurate if I consciously go to the trouble of enunciating each word properly (which really isn’t much trouble, and I assume will become an effortless habit in short order when talking to the device.)

Perhaps my funny voice is particularly easy for the software to get? I am male, and from the Midwest (where our accent is mostly none) so maybe the software works better for me than average.
maxdancona
 
  2  
Reply Tue 21 Sep, 2010 01:31 pm
@OCCOM BILL,
Bill,

I have worked on Speech Recognition as an engineer. This is the reason for my healthy skepticism.

Getting the dragon button to fill in the To: field reliably is not that difficult. This is an example of the small vocabulary I mentioned above. There are a limited number of contacts to pick from, so it can safely assume that you are going to pick a name from your contact list. Even if this is not the case, it can also get a list of expected names (since it knows it will be a name). It knows "Chris" is more likely than "kiss".
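To make the small-vocabulary point concrete, here is a rough sketch that snaps whatever the recognizer "heard" to the closest name in a contact list. It is a stand-in only: a real recognizer compares acoustic scores, not spellings, and the `best_contact` function, the `min_score` cut-off, and the contact list are assumptions made up for this example.

```python
from difflib import SequenceMatcher
from typing import Optional


def best_contact(heard: str, contacts: list[str],
                 min_score: float = 0.3) -> Optional[str]:
    """Return the contact most similar to the raw hypothesis, or None.

    String similarity is used here as a crude proxy for acoustic
    similarity; min_score is an arbitrary illustrative cut-off.
    """
    best_name = None
    best_score = 0.0
    for name in contacts:
        score = SequenceMatcher(None, heard.lower(), name.lower()).ratio()
        if score > best_score:
            best_name, best_score = name, score
    return best_name if best_score >= min_score else None


# Hypothetical contact list for the To: field.
contacts = ["Chris", "Bill", "Izzie", "Sozobe"]
print(best_contact("kiss", contacts))
```

With only a handful of candidates, even a badly misheard word lands on the right name. The same trick does not help free dictation, where any word in the language is a candidate.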

I believe that if you are in a quiet room with a typical mid-western accent you can get decent, but not excellent, results. Let me repeat: there is a big difference between large vocabulary (the system needs to handle any word you might say) and small vocabulary (the system assumes what you say will be from a limited number of possible words/phrases). I believe that for the large vocabulary case, unless you are very lucky, it will take training to make the pain of fixing its mistakes really worth the time savings of talking over typing. There is a lot of research to back this up, and every commercially viable large vocabulary product, including Dragon, involves training the recognizer for a specific speaker.

But the biggest problem is background noise. I would be very skeptical that you could point the phone at someone else in anything but an extremely quiet room and get anything resembling respectable results.
 
