Dragon Speak

@maxdancona,

Dragon wrote:

I'm not questioning your resume. I'm conveying my actual results.
Sent from my Verizon Wireless BlackBerry

@OCCOM BILL,

Impressive Bill, and I believe it under the best of conditions. I have done the same.

I assume you are in a quiet place speaking clearly with your mouth very close to the phone microphone. I also assume that the thing messes up about 15-20% of the words that you need to correct somehow. (I also assume that they aren't zipping your dictations to a human being for fixing before they send it back to you, and it is also possible that your provider is making a model specifically for you based on the corrections you make.)

A 15-20% error rate (i.e. getting 80-85% of the words correct) is cool. This is the error rate on the best of the best systems under ideal conditions for speaker-independent models (i.e. those that aren't trained for a specific speaker). However, it is not good enough to really make you more productive in most commercial settings.

Take the phone to a busy place and hold it toward a person who is speaking to you. Tell me what results you get then.

1 Reply

@maxdancona,

maxdancona wrote:

Impressive Bill, and I believe it under the best of conditions. I have done the same.

I assume you are in a quiet place speaking clearly with your mouth very close to the phone microphone. I also assume that the thing messes up about 15-20% of the words that you need to correct somehow. (I also assume that they aren't zipping your dictations to a human being for fixing before they send it back to you, and it is also possible that your provider is making a model specifically for you based on the corrections you make.)

A 15-20% error rate (i.e. getting 80-85% of the words correct) is cool. However, it is not good enough to really make you more productive in most commercial settings.

Take the phone to a busy place and hold it toward a person who is speaking to you. Tell me what results you get then.

Dude, read up. I downloaded the software to the device just yesterday. Zero training. The words appear seconds after I say them (it's not connected to anyone to correct them) and you can see the accuracy for yourself. I am in my office, which is relatively quiet, but I do have music playing softly in the background. I'm pretty sure the device has good noise cancelation technology, because I am frequently asked if I'm at the office when driving down the road with the top down.

Proper enunciation and a measured delivery are all that's necessary to make it function remarkably well... and as I noted above, I would expect a newer "cleaner" device to be able to process the words even more efficiently. There seems to be some built-in programming that sacrifices quality for quantity if you speak too quickly, but these are parameters I can train myself to stay within (and practically already have.)

The limitation seems to be more about processing power than software capabilities. I don't know if it's learning as it goes automatically, or if my measured pace and proper enunciation are 100% responsible for the improved performance; but I've used it relatively few times and with its improved functionality it is quickly becoming a valued tool already. The minor errors it does make are no worse than my fat fingers on the touch screen. You are correct that in the limited setting of 600+ contacts, it is dead on (haven't seen it err yet.) For this reason alone, it has permanently replaced the "flashlight/video" feature that previously occupied that convenience button. (Will look for a device with more convenience buttons at replacement time.)
Cheers

1 Reply

@maxdancona,

As a heavy user of the PC version of Dragon NaturallySpeaking (DNS) for my transcription work, I'd have to agree with Max. DNS doesn't do well with background noise such as the audio from a television or radio in the same room. It also absolutely does not like any kind of music in the recording it is converting to text. It tries to convert all those sounds to text. For instance, if there is a lot of commotion in the room, it will translate that as a lot of da da da da das. You can raise or lower the quality-to-speed conversion ratio to match your needs. The higher the quality requirement, the slower the conversion time.

You can train it to multiple voices. I do this for my repetitive transcription clients. I set up a user definition for each of them and use that definition each time I have it convert the audio tapes to text for transcription. I then use it to go back and correct the major mistakes it makes so it is constantly learning and improving for each of those various user voices. They each have completely different sounding voices and accents so it requires a unique user set up for each of them. The only thing that throws it off is all the extra musical interludes people add to video and audio tapes these days. I still have to manually transcribe anything that isn't just strictly voice audio.

1 Reply

@Butrflynet,

By the way, you can also expand and improve on the DNS vocabulary by having it scan through all your documents in your My Documents directory. I also use this feature as added tutoring for my various DNS user definitions. Periodically, I have it scan the text files of transcriptions it has previously done for that user so it learns the speech pattern and technical terms that are frequently used by that person.

2 Replies

@Butrflynet,

Butrflynet wrote:

By the way, you can also expand and improve on the DNS vocabulary by having it scan through all your documents in your My Documents directory. I also use this feature as added tutoring for my various DNS user definitions. Periodically, I have it scan the text files of transcriptions it has previously done for that user so it learns the speech pattern and technical terms that are frequently used by that person.

That sounds like a hell of a feature. Which version are you using? I looked into the Legal version, but $799 seems pretty steep if a cheaper version can learn the legal vocabulary simply by scanning my abundant supply of precise finished documents.

1 Reply

@OCCOM BILL,

Two things.

First, I am almost certain that your phone is not doing the speech recognition. Speech recognition for mobile devices is done "in the cloud". This means that the phone sends your voice over the "internet" to a server farm that sends back the text. This means that processing power is irrelevant. Your phone isn't doing the work. It is just sending your voice to some server somewhere else that is waiting eagerly for work to do.

Second, are we arguing without disagreeing? I said I would find an 85% accuracy rate believable. You are saying the accuracy is plenty good enough for you. These are not contradictory statements. You can measure the accuracy rate you are getting (100 - #errors/#words * 100).
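For anyone who wants to actually measure it, here is a minimal Python sketch of that formula. It assumes you have typed out what you really said (the reference) next to what the recognizer produced, and it counts errors as word-level insertions, deletions and substitutions, which is one common way to count them and not necessarily how any particular product does it. The function names and the sample sentences are made up for illustration.

```python
def word_errors(reference: str, hypothesis: str) -> int:
    """Count word-level edits (substitute, insert, delete) between two texts."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    # prev[j] = edits needed to turn the first i-1 reference words
    # into the first j hypothesis words (classic edit-distance table row).
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # word missing from the output
                            curr[j - 1] + 1,      # extra word in the output
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def accuracy_percent(errors: int, words: int) -> float:
    """The formula from the post: 100 - #errors/#words * 100."""
    if words <= 0:
        raise ValueError("words must be positive")
    return 100 - errors / words * 100


# Hypothetical example: what was said vs. what came back.
said = "take the phone to a busy place"
got = "take the phone to a dizzy place"
errors = word_errors(said, got)
print(errors, accuracy_percent(errors, len(said.split())))
```

Dictate a few hundred words this way and the number gets a lot more meaningful than a single sentence.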

1 Reply

@OCCOM BILL,

This is the one I bought in 2006 (and still use) for under $100. They're up to version 11 now.

You could probably improve the quality with your cell phone if you are able to use a headset/microphone with it. This is what I use with my PC version except mine has a USB plug.

1 Reply

@Butrflynet,

The DNS version 11 is only $80.

http://www.amazon.com/Dragon-NaturallySpeaking-Home-Version-11/dp/B003VNCRNQ/ref=dp_ob_title_sw

0 Replies

@maxdancona,

maxdancona wrote:

Two things.

First, I am almost certain that your phone is not doing the speech recognition. Speech recognition for mobile devices is done "in the cloud". This means that the phone sends your voice over the "internet" to a server farm that sends back the text. This means that processing power is irrelevant. Your phone isn't doing the work. It is just sending your voice to some server somewhere else that is waiting eagerly for work to do.

I'll be damned. I just turned off the radio to verify this, and that is correct. Neat.

maxdancona wrote:

Second, are we arguing without disagreeing? I said I would find an 85% accuracy rate believable. You are saying the accuracy is plenty good enough for you. These are not contradictory statements. You can measure the accuracy rate you are getting (100 - #errors/#words * 100).

Sort of… You seemed to be hypothesizing that my results would be worse than they actually had already been demonstrated to be in practice. I mean, you flat out doubted it could do what it had already done. I couldn’t know what standard results are, as you seem to, but I’m pretty confident in what I see with my own eyes.

1 Reply

@Butrflynet,

Butrflynet wrote:

This is the one I bought in 2006 (and still use) for under $100. They're up to version 11 now.

You could probably improve the quality with your cell phone if you are able to use a headset/microphone with it. This is what I use with my PC version except mine has a USB plug.

I see the Standard 11 is for sale at only $79. I wonder if Max could weigh in on the expected differences between 9, 11, and the specialty Legal version? (We generate 40 hours of Legal Dictation a week now, and will be doubling staff in November when we move to a larger office.)

(I’ve always been a big fan of Plantronics headsets… they’re worth every extra penny)

0 Replies

@sozobe,

sozobe wrote:

What I'm waiting for... and can't be far away... is the iPhone/ iPod app that allows you to hold the device towards a person and whatever they say instantly appears as text. I always have a moment after I've watched a captioned movie where I am very annoyed that the world isn't captioned. (I'm deaf, I read lips well but it's definitely an effort.)

The other side of that though is that I love how text-based the world has become, and worry about everything becoming voice-based instead. I've been worried for a while, and things have stayed text-based, so fingers crossed.

Now, that prompts a kind of irrelevant question; can you also read the lips of people who mumble? I'm really interested.

2 Replies

@OCCOM BILL,

I think I have been consistent in my skepticism level. Getting a 15% error rate in good conditions for a speaker-independent model is completely believable. Whether this is good enough is somewhat subjective and certainly depends on the use of the recognition. We did a lot of research about what error rate we needed to get below to actually improve productivity over straight typing for our application.

My skeptic flag was originally raised over the idea that this technology could be useful for a deaf person understanding another person talking in a normal environment. Unfortunately, given the multiple challenges in this problem (background noise, large vocabulary, unknown speaker) I don't think this type of application will be realistic for a very long time.

1 Reply

@roger,

It's harder for sure. But possible.

Consistency is the big thing. If I figure out their speech patterns I can kind of lock that in. (Lipreading: not at all dissimilar from voice recognition software.)

Appreciating your commentary, Max. And I understand what you're saying.

85% accuracy would be quite useful though. As Bill indicates, I already do a lot of guesswork -- lipreading probably provides far less than 85% of the total data, I have to go with what I can see (maybe 20%?) and do a whole lot of filling in the blanks from context, patterns, etc.

In fact, a closer analogy I can think of is when live captioning on TV started, and it was awful. I had to do a lot of decoding then, got used to it.

Definitely tempted to get a microphone attachment and see what happens with my iPod...

1 Reply

@roger,

You'd have to hold it in front of their faces when they are driving.

Anyone follow the cartoonist DIFFEE? I think of myself as a deffee. Soz has been very clear on how the deaf community distinguishes the hard of hearing from the profoundly deaf, which I agree with, but some of us have lite versions of hearing issues. Such a dream device has appeal, if not any immediate likelihood of usefulness for the hearing impaired.

0 Replies

@maxdancona,

Max, do you know the Dragon product specifically enough to weigh in on this question? What differences would you expect between 9, 11, and the specialty Legal version? Also, would there be an accuracy difference between the home, premium and pro versions? (looks like fluffy extra stuff that wouldn't really affect the reliability of the translation?)

2 Replies

Can it be used for someone with a bit of a speech impediment?

0 Replies

@OCCOM BILL,

No, I don't know Dragon very well. I worked on a back-end product, where we maintained a server farm where all the speech recognition was done in some warehouse full of computers. This is very different than Dragon, where the work is all done on one local machine.

I played with Dragon, but I don't know very much about the specifics. That being said (here comes pure speculation), a lot of the intellectual property for speech recognition comes with speech model data. It is possible that one version could use much better speech models (both amount of data, and types of data used) that would make the end result much more accurate. I know very little about the "bells and whistles" (i.e. how you would interact with the thing) other than a few hours of playing (which was awfully fun).

I am sure that the Legal version has a specialized language model. This means it comes with an idea of how likely a possible phrase is.
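To make "how likely a possible phrase is" a bit more concrete, here is a toy Python sketch of the general idea: a bigram model with add-one smoothing, trained on a couple of made-up legal-sounding sentences. This is only an illustration of what a language model does in principle. It says nothing about how Dragon's Legal version is actually built, and the function names and training text are hypothetical.

```python
from collections import Counter


def train_bigram(sentences):
    """Count how often each word follows each other word in the training text."""
    context_counts = Counter()  # how often a word appears with a word after it
    bigram_counts = Counter()   # how often a specific pair of words appears
    vocab = set()
    for sentence in sentences:
        words = ["<s>"] + sentence.lower().split()  # <s> marks sentence start
        vocab.update(words)
        context_counts.update(words[:-1])
        bigram_counts.update(zip(words, words[1:]))
    return context_counts, bigram_counts, len(vocab)


def phrase_probability(phrase, context_counts, bigram_counts, vocab_size):
    """Multiply smoothed P(word | previous word) across the phrase."""
    words = ["<s>"] + phrase.lower().split()
    prob = 1.0
    for prev, word in zip(words, words[1:]):
        # Add-one smoothing so unseen word pairs get a small, non-zero chance.
        prob *= (bigram_counts[(prev, word)] + 1) / (context_counts[prev] + vocab_size)
    return prob


# Hypothetical training text standing in for a pile of legal documents.
legal_text = [
    "the party of the first part",
    "the party of the second part",
]
model = train_bigram(legal_text)

likely = phrase_probability("the party of the first part", *model)
unlikely = phrase_probability("the potty of the first part", *model)
print(likely > unlikely)
```

When two candidate phrases sound alike, the recognizer can lean toward the one the model scores higher, which is why a model built from legal text should help with legal dictation.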

1 Reply

@OCCOM BILL,

Jespah probably could, she used to work for the company and also has the legal background.

0 Replies

@maxdancona,

Thank you for your candor.

I wonder how much different the Legal version could really be. Outside of a few Latin words and phrases here and there, it's just English written in a different style. For instance, I might go two pages starting every single sentence with the word "that". Microsoft Word doesn't like it much, but I don't know why the Dragon would care... unless it can't get its head around sentence fragments. Butrflynet says I can let it scan documents; so I could let it scan a veritable mountain of topic-relevant documents, written by the very person it would be interpreting. I'm wondering if the Legal version isn't essentially the same thing; with a great deal of boilerplate docs already scanned? Custom Dragon-training could conceivably be even better.

Maybe Jespah will stumble upon the thread and enlighten me. Right now I’m thinking dictation would probably improve in both speed and quality if the employee was simply correcting errors as (s)he listened, rather than having to type it on the fly.

2 Replies
