Butrflynet
 
  1  
Reply Tue 21 Sep, 2010 08:20 pm
@OCCOM BILL,
Quote:
Maybe Jespah will stumble upon the thread and enlighten me.


Earlier, I sent her a PM with your question and a link.
Butrflynet
 
  1  
Reply Tue 21 Sep, 2010 08:41 pm
@Butrflynet,
http://en.wikipedia.org/wiki/Dragon_NaturallySpeaking

This is a thorough review and description of version 8 professional.

http://reviews.cnet.com/voice-recognition/dragon-naturallyspeaking-8-professional/1707-3528_7-31228939.html

and version 10 professional. It also has the full specs for each:

http://reviews.cnet.com/voice-recognition/dragon-naturally-speaking-10/4505-3528_7-33227360.html?tag=mncol;lst

This is a comparison of features for the three versions of DNS 10 professional, preferred and standard:

http://reviews.cnet.com/4504-4_7-0.html?id=33227360&id=33227363&id=33227364&id=34007353&tag=compare

And here's their website where you can compare the features of all the various flavors:

http://www.nuance.com/for-individuals/by-product/dragon-for-pc/home-version/index.htm


Here is Amazon's detailed list of features for DNS version 9 Legal:

http://www.amazon.com/Nuance-Communications-A509A-X00-9-0-Dragon-NaturallySpeaking/dp/B000GZYK9G
maxdancona
 
  2  
Reply Tue 21 Sep, 2010 09:42 pm
@OCCOM BILL,
Quote:
I wonder how much different the Legal version could really be. Outside of a few Latin words and phrases here and there, it's just English written in a different style. For instance, I might go two pages starting every single sentence with the word that. Microsoft Word doesn't like it much, but I don't know why the Dragon would care... unless it can't get its head around sentence fragments.


The language model is really important, and you would certainly want to use a different language model for legal documents than you would for other types of documents.

I think you are oversimplifying how this works. A language model is not a dictionary. It is much more than a phrase book. A language model incorporates a large amount of statistical data about how often words are used and in which order. The goal is to predict how likely each possible phrase is.
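To make "statistical data about how often words are used in which order" concrete, here is a minimal sketch of the simplest kind of language model, a bigram model that counts which word follows which. The tiny corpus and the function names are invented for illustration; a commercial recognizer uses far more data and more elaborate models.

```python
from collections import Counter, defaultdict

def train_bigrams(sentences):
    """Count how often each word follows each other word."""
    follows = defaultdict(Counter)
    for sentence in sentences:
        # "<s>" marks the start of a sentence so the first word has a context.
        words = ["<s>"] + sentence.lower().split()
        for prev, nxt in zip(words, words[1:]):
            follows[prev][nxt] += 1
    return follows

def next_word_probability(follows, prev, nxt):
    """Estimate P(nxt | prev) from raw counts (no smoothing)."""
    total = sum(follows[prev].values())
    return follows[prev][nxt] / total if total else 0.0

# Invented three-sentence corpus standing in for a pile of medical reports.
corpus = [
    "the patient was admitted to the ward",
    "the patient was discharged",
    "the doctor was paged",
]
model = train_bigrams(corpus)

print(next_word_probability(model, "patient", "was"))
print(next_word_probability(model, "was", "admitted"))
```

In this toy corpus "patient" is always followed by "was", while "was" is followed by three different words equally often, so the model is much more confident about the first prediction than the second.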

I worked in the medical field. We didn't just have a separate language model for each doctor. We had specific models for each type of document that each doctor might write (i.e. one doctor might have 5 or 6 different models). A report on a surgical procedure used a completely different model than a report on an office procedure or patient correspondence.

Having separate language models for each type of document greatly improves the accuracy.

I am pretty sure that the legal version of Dragon has a different language model, and I am quite sure that it makes a big difference in the results.
JTT
 
  1  
Reply Tue 21 Sep, 2010 09:53 pm
Addressed to anyone with knowledge of these products.

If the reader had a fair knowledge of the context/ideas/jargon what would the level of difficulty be in comprehending a piece of text with a 15% error rate?
OCCOM BILL
 
  1  
Reply Tue 21 Sep, 2010 09:55 pm
@Butrflynet,
Thank you. One of your links provided:
Quote:
Improved - Higher Accuracy
Although all editions of Dragon NaturallySpeaking use the same core engine, the ability to create multiple custom vocabularies in Professional, Medical and Legal can increase accuracy by as much as 5%.
I can't imagine why I'd need "multiple custom vocabularies", so I'm not seeing the point. For $500, I think I'll teach it the extra words myself.

The "preferred" seems to add more commands you can use your voice for, like surfing the web. I doubt that would increase my supporting staff's efficiency. I think I'll try the Standard and see if I feel like we're missing out on something. I'd rather pay $80 for a program that may or may not prove useful than $600.

Thanks for pushing me into looking into it and providing the headstart! I'll report back my findings... and I'm looking forward to hearing Soz's review of the portable stuff, if she decides to try it out.
JTT
 
  1  
Reply Tue 21 Sep, 2010 09:58 pm
@sozobe,
Quote:
Consistency is the big thing. If I figure out their speech patterns I can kind of lock that in. (Lipreading: not at all dissimilar from voice recognition software.)


My guess is that a one to one conversation is much easier than a round table discussion, Soz, even if there's a clear line of sight to each person's lips.

In a one on one, what do you figure your word recognition is? Do you actively, umm, drop, mentally, the filler words, such as articles, personal pronouns, ums and ahs, etc.? That is, do you focus on major conversational context words?
maxdancona
 
  1  
Reply Tue 21 Sep, 2010 10:07 pm
@JTT,
Let's see.

Pick a topic you know about. I will find a Wikipedia text and randomly change 15% of the words to things that kind of sound the same.
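A rough sketch of how that experiment could be set up. The sound-alike table below is a hypothetical stand-in (a real test would need a much larger list of acoustically confusable words), and the sample sentence is invented rather than taken from Wikipedia.

```python
import random

# Hypothetical sound-alike substitutions, invented for illustration.
SOUND_ALIKES = {
    "there": "their", "to": "two", "for": "four", "than": "that",
    "write": "right", "here": "hear", "know": "no", "by": "buy",
}

def corrupt(text, rate=0.15, seed=0):
    """Swap about `rate` of the words for sound-alikes, where the table allows."""
    rng = random.Random(seed)  # fixed seed so the result is repeatable
    words = text.split()
    eligible = [i for i, w in enumerate(words) if w.lower() in SOUND_ALIKES]
    # Cannot corrupt more words than the table covers.
    target = min(len(eligible), round(rate * len(words)))
    for i in rng.sample(eligible, target):
        words[i] = SOUND_ALIKES[words[i].lower()]
    return " ".join(words)

original = ("I know there is more to write here than I can fit "
            "by the end of this page for now")
print(corrupt(original))
```

With a 20-word sentence, a 15% rate changes three words. Reading the output is a quick way to judge how much comprehension suffers when the reader already knows the context.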
maxdancona
 
  1  
Reply Tue 21 Sep, 2010 10:10 pm
@JTT,
A key point. The way computers "recognize" speech is completely different from the way that humans understand speech.

Humans attach a meaning to each word. Our brains set up a context that we understand, and we anticipate what points might be coming. Computers use pure statistical tricks based on what words are likely to be strung together. Computers don't even try to figure out the meaning.

Comparing Sozobe to a computer is an invalid way to think about what is going on.
OCCOM BILL
 
  1  
Reply Tue 21 Sep, 2010 10:18 pm
@maxdancona,
I'm not 100% sure I'm understanding you correctly insofar as different language models for different reports. I didn't see that as even an option in any of the versions. It appears that each license is only good for one person, however, so each attorney will be getting his own dragon trained to his style regardless (providing the dragon works for the guinea pig).

Apart from some archaic sentence beginnings like "Now comes", "That", "Whereas", and "Wherefore", etc., and the ability to understand a keycite (which would need to be verified anyway, and can be cut/pasted at that time), the vast majority of the legal writing we do here isn't really any different than any other kind of writing. In many cases it's much simpler, in fact. For instance, if I don't want my questions ducked in a list of interrogatories, only one detail of anything will be asked per question... to the point that it seems tedious and almost redundant.

We don't write contracts designed to confuse the layman here; our work is predominantly intended to be informative and persuasive to opposing counsel, D.A.'s and Judges. Deviations from customary writing would only serve to lessen the impact... and believe you me; not everyone with a J.D. is any smarter for having achieved it. Wink
maxdancona
 
  1  
Reply Tue 21 Sep, 2010 10:45 pm
@OCCOM BILL,
Bill,

You are thinking about it wrong. The language model isn't working on the level of vocabulary or idioms; it is trying to calculate the likelihood of sequences of words. My point above about how computer speech recognition is completely different from how you and I understand speech is very important. The computer is literally calculating statistics based on possible sequences of words and then choosing the most likely.

Quote:
The vast majority of the legal writing we do here isn't really any different than any other kind of writing


When you understand that computer speech recognition is a process of finding the most likely sequence of words, then yes... legal writing is quite different than a journalist's account of war, or a science text or this discussion on Able2Know.

The phrase "the defendant was remanded to the custody..." is much more likely in legal text than in a science text, and with a language model, I can tell you precisely how likely this phrase is in either type of writing.
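As a hedged illustration of that comparison, the sketch below trains one tiny bigram model on a few invented "legal" sentences and another on a few invented "science" sentences, then scores the same phrase under each. The corpora, the function names, and the choice of add-one smoothing are all assumptions made for the example; real domain models are trained on vastly more text.

```python
import math
from collections import Counter

def train(sentences):
    """Return (context counts, bigram counts) for a list of sentences."""
    contexts, bigrams = Counter(), Counter()
    for s in sentences:
        words = ["<s>"] + s.lower().split()
        contexts.update(words[:-1])            # every word that has a successor
        bigrams.update(zip(words, words[1:]))  # adjacent word pairs
    return contexts, bigrams

def log_prob(phrase, model, vocab_size):
    """Log-probability of a phrase under a bigram model with add-one smoothing."""
    contexts, bigrams = model
    words = ["<s>"] + phrase.lower().split()
    total = 0.0
    for prev, nxt in zip(words, words[1:]):
        total += math.log((bigrams[(prev, nxt)] + 1) / (contexts[prev] + vocab_size))
    return total

# Invented miniature corpora, for illustration only.
legal_text = [
    "the defendant was remanded to the custody of the sheriff",
    "the court remanded the defendant to custody",
    "the defendant was released on bail",
]
science_text = [
    "the sample was heated to the boiling point",
    "the cell was exposed to the reagent",
    "the reaction was observed under the microscope",
]

vocab = {w for s in legal_text + science_text for w in s.lower().split()} | {"<s>"}
legal_model = train(legal_text)
science_model = train(science_text)

phrase = "the defendant was remanded to the custody"
legal_score = log_prob(phrase, legal_model, len(vocab))
science_score = log_prob(phrase, science_model, len(vocab))
print("legal model:  ", legal_score)
print("science model:", science_score)
print("more likely under:", "legal" if legal_score > science_score else "science")
```

Both models know every word in the phrase, yet the legal model rates the sequence as far more probable, which is the sense in which a domain model sharpens the statistics rather than merely adding vocabulary.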

Humans have understanding; all computers have is statistics. Precise statistics are quite important for accuracy, and having a different language model for legal writing (or doctors' notes or business plans) means that the statistics predicting how likely a sequence of words is are much more precise.

This is not a theoretical discussion, it has been backed up by tons of research. Companies spend millions of dollars on developing specific language models for different domains (which are then tweaked by speaker or organization). They do this because it has been shown to have a significant effect on accuracy.

There is a question about how much accuracy is worth to you. But I can say with near certainty that a language model specifically for legal writing will lead to a large improvement in accuracy, not because of words or phrases, but because the statistical calculations behind the process of speech recognition will be more precise.
Butrflynet
 
  1  
Reply Tue 21 Sep, 2010 10:47 pm
@JTT,
For my work, it is an annoyance because the transcripts have to be near perfect. This means that my routine is to download the audio/video file, use an online conversion site to split the audio from the video and convert the audio to the file format needed for DNS. Then, I let DNS churn on the audio to text conversion overnight (I have it set to high quality at the sacrifice of speed for this so it is slower and needs all night to do the hour-long audio files).

The next day, I spend an additional two or three hours listening to the audio, reviewing and correcting the DNS text, researching spellings of unusual names, locations and words on the internet, and then applying all the transcription and captioning timing sequence codes required by the client. Without DNS doing the major portion of the work, it would take me that same 8 or 9 hours of typing, plus the additional 2 - 3 hours to review it again. On average, it takes about an hour and a half of typing and review for every 10 minutes of audio.
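Putting the figures from this post into a quick calculation for a one-hour audio file. This is only a back-of-the-envelope sketch: it assumes the overnight DNS run costs no hands-on time and takes the "hour and a half per 10 minutes" figure as the fully manual rate.

```python
AUDIO_MINUTES = 60               # one hour-long audio file
MANUAL_HOURS_PER_10_MIN = 1.5    # typing and review by hand, per the post
REVIEW_HOURS_WITH_DNS = (2, 3)   # hands-on correction time after the overnight run

# Fully manual transcription time for the whole file.
manual_hours = AUDIO_MINUTES / 10 * MANUAL_HOURS_PER_10_MIN

# Hands-on hours saved, from the slowest review (3 h) to the fastest (2 h).
saved_hours = (manual_hours - REVIEW_HOURS_WITH_DNS[1],
               manual_hours - REVIEW_HOURS_WITH_DNS[0])

print(f"manual: {manual_hours} h, saved with DNS: {saved_hours[0]} to {saved_hours[1]} h")
```

On these assumptions the manual route comes to 9 hours per hour of audio, so DNS saves roughly 6 to 7 hands-on hours per file.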

I usually accumulate converted DNS text files for a few days and then do all the review and editing on weekends so I can deliver all the work to the client on Mondays. DNS saves my hands and wrists (and my sore butt from sitting so long) from a tremendous amount of abuse due to all that prolonged typing.


Someone using DNS for dictation would also have an easier time of it because they'd insert punctuation where needed while dictating. That's the big gap in my transcript work: nobody dictates the punctuation, so DNS has to guess at what it should be and where it should go. The error rate is high for that and I spend a lot of time correcting it.
OCCOM BILL
 
  1  
Reply Tue 21 Sep, 2010 11:06 pm
@maxdancona,
I'm not trying to argue with you, Max. Really I'm not. I just don't understand what you think it is learning differently in this case. For testing purposes; I read your proposed legal line to my phone and the result was:
Quote:
"The defendant was remanded to the custody of the sheriff" is much more likely in legal text that in a science text.
The only thing it messed up was that in place of than... and that seems like an error that should be corrected by every model, if any at all.

I'm sure you are knowledgeable and correct about how this stuff works... but I'm starting to think the guys at Dragon have made significant advancements in the technology that you may not be aware of. They're claiming up to 99% accurate, which I'd agree is probably a stretch, but it seems to work pretty damn well. And I have to assume their state of the art desktop stuff works as well as, if not better than, the free stuff I downloaded to my phone yesterday.
Butrflynet
 
  1  
Reply Tue 21 Sep, 2010 11:12 pm
@maxdancona,
Here's a paste from the DNS help file that addresses what Max is describing:

Quote:
A vocabulary in Dragon NaturallySpeaking is a body of information that includes a word list and a language model. The word list includes information about all the words that the program can recognize. The language model contains usage information about those words. Dragon NaturallySpeaking uses a vocabulary to recognize words correctly based not only on the sound of the words, but on their context. When you create a new user, you select the vocabulary on which to base the user.

Vocabulary Type
You can select from among the following vocabularies:

General (US English)
A large vocabulary providing excellent recognition accuracy for general, business, and professional dictation.
Teens (US English only)
A large vocabulary containing words selected for a student population and providing excellent recognition accuracy for higher pitched voices, for example ages 11 through 18.
If you have the Professional Medical edition, the following vocabularies are also available: Surgery, Radiology, Pediatrics, Pathology, Orthopedics, Oncology, Obstetrics/Gynecology, Neurology, Mental Health, Medical Dictation, General Practice, Emergency, Gastroenterology, and Cardiology

Vocabulary Size
When you create a set of user files, Dragon NaturallySpeaking recommends the vocabulary that best fits your computer's speed and memory. If you click the Advanced button, you can specify a different vocabulary size from among the following choices:

Large—designed for computers with at least 512 MB of RAM.
Empty Dictation—a vocabulary with a language model but no words. This vocabulary is designed for use by value-added resellers who want to create specialized vocabularies from scratch.
Advanced
On the Create User screen of the New User Wizard, you can click the Advanced button to display the Advanced dialog box. On this dialog box you can choose a different speech model and vocabulary size. Dragon NaturallySpeaking automatically determines the best speech model and vocabulary size for your computer when you create a user, so you do not generally need to change these options.

Note
Some Dragon NaturallySpeaking editions or add-on products may install additional vocabularies.



Quote:
When you create a set of user files, you select a "speech model" on which to base the new user. A speech model includes an acoustic model that you adapt for your particular voice.

When you create a set of user files, Dragon NaturallySpeaking recommends the adult speech model that best fits your computer's speed and memory. If you want to use the Teens model, you must choose US English as the language.

Notes
In Dragon NaturallySpeaking, you can choose between the following speech models:

BestMatch III, which has the greatest recognition accuracy for English speakers.
BestMatch Array, which provides the greatest recognition accuracy for dictation using an array microphone.
BlueTooth 8Khz, which has the greatest recognition accuracy for speakers using a BlueTooth device.


Quote:
An acoustic model is a statistical representation of the sound patterns that make up individual words.

Dragon NaturallySpeaking modifies the acoustic model to reflect your particular voice during initial product training (General Training), when you make corrections, when you train individual words, and when you do supplemental training.


And while I'm there, here's the help file on the vocabulary function:

Quote:
Use this dialog box to view and customize your active vocabulary. You can add words and phrases or view and train dictation commands, such as commands that "press" any sequence of keys on the keyboard or run complex scripts (a series of computer instructions).

The following describes the different dialog box items:

Written form
Specifies the word or words that Dragon NaturallySpeaking enters.

To find and select words in the Written form box, type the word letter by letter. As you enter letters, the list of words changes. You can also:

Scroll the list of words.
Click words to select them.
Use the Ctrl and Shift keys to select multiple words.
Type into the word list box to search for words.
Note: Words with a written form that begin with a number, such as "99th," appear in the list before words that begin with the letter "a." To see these words, you must scroll the list up.

Spoken form
Specifies the word or words that you say. If the written and spoken forms are the same, you can leave this box empty.

For initials spoken letter by letter without periods, such as JFK, no spoken form is needed. If the initials should have following periods, such as W. Bush, enter the spoken form without the period, for example "W Bush".
If a spoken form requires a single letter, enter the letter with a period, for example "A." or "B.". For repeated letters, such as AAA, enter the spoken form as "triple A."
For numbers, write out the numbers in the spoken form. For example, if the written form is A340, enter the spoken form as "A. three forty".
List of words
Lists all words currently in the active vocabulary.

Dictation commands, such as "backslash," are located at the top of the list.
Words stored only in the backup dictionary are not listed.
A red star next to a word indicates that it is a custom word that has been added to the vocabulary.
A blue star next to a word indicates that the properties of the word have been changed from the default properties.
A green star next to a word indicates that a word was moved from the backup dictionary to the active vocabulary due to a correction. For more information, see the "Automatically add words to the vocabulary" option on the Correction tab of the Options dialog box.
Display
Allows you to select the type of words you want to appear in the list. The choices are:

All words: displays all the words in the current vocabulary.
Custom words only: displays only words added to the vocabulary by the user.
Words with spoken forms only: displays only words with spoken forms that are different from their written forms.
Words with formatting properties only: displays only words with special formatting properties that influence how they appear in the transcription of dictated text.
Words containing spaces: displays only words containing spaces. For example, "anorexia nervosa."
Words containing digits: displays only words containing digits. For example, "B12."
Words containing punctuation: displays only words containing punctuation. For example, "X-ray."
Words containing capital letters: displays only words containing capital letters. For example, "Zyrtec."
Temporary words: When you reply to a message while dictating in Outlook and Outlook Express, NaturallySpeaking finds new words from the original email message and recognizes them. When you use these words in replying to an email, these words are added as temporary words. For example, if you reply to an email that contains the drug name "Zomaril" and you use "Zomaril" in your reply, "Zomaril" is added as a Temporary word. When you send an email using temporary words, those words are permanently added to the vocabulary.


Quote:
The Add Words From Documents wizard allows you to customize a vocabulary by adding new words and by optimizing the language model. The wizard determines which words in the documents you select are not in the current vocabulary and analyses the frequency and order in which words appear to better understand your writing style.

You open the Add Words from Documents wizard by clicking "Add words from your documents to the vocabulary" in the Accuracy Center.
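The help text above describes two jobs: finding words that are not yet in the vocabulary, and analysing word frequency and order. The sketch below shows one plausible way to do both. It is not Dragon's actual implementation; the function name, the sample vocabulary, and the sample documents are all invented.

```python
import re
from collections import Counter

def analyze_documents(documents, vocabulary):
    """Return (counts of out-of-vocabulary words, counts of adjacent word pairs)."""
    new_words = Counter()
    word_pairs = Counter()
    for doc in documents:
        # Crude tokenizer: lowercase runs of letters and apostrophes.
        words = re.findall(r"[a-z']+", doc.lower())
        new_words.update(w for w in words if w not in vocabulary)
        word_pairs.update(zip(words, words[1:]))
    return new_words, word_pairs

# Hypothetical starting vocabulary and documents.
vocabulary = {"the", "court", "granted", "motion", "to", "a", "for"}
documents = [
    "The court granted the motion to compel.",
    "A motion for certiorari.",
]

new_words, word_pairs = analyze_documents(documents, vocabulary)
print(sorted(new_words))          # candidate words to add to the vocabulary
print(word_pairs.most_common(3))  # word-order statistics for the language model
```

The first result is the list of candidate additions, and the second is the kind of word-order count that a language model can be tuned with.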

JTT
 
  1  
Reply Tue 21 Sep, 2010 11:12 pm
@maxdancona,
Thanks, Max. I wasn't comparing Sozobe to a computer. I was just asking her a separate question about how she perceives speech via lip reading.

Quote:
Let's see.

Pick a topic you know about. I will find a Wikipedia text and randomly change 15% of the words to things that kind of sound the same.


Is that an example or an offer?


As I've never had a wiki article with 15% of words "mixed up", I really can't quite grasp just what the difficulty would be, no sorry, how difficult it would be.

Because I'm a human and I know the context, would these similar sounding words be like reading an ESL/EFL student's writing, where strange direct translations produce odd word choices?

Put another way, if you were doing these postings with Dragon Dictate, with no editing, would everyone here pretty much breeze through it?
Butrflynet
 
  1  
Reply Tue 21 Sep, 2010 11:16 pm
@OCCOM BILL,
Basically, in the Legal version, a lot of that analytical and statistical work has already been done for the jargon of the legal profession so the accuracy meter has a large head start on anyone trying to duplicate it with the Standard version.

I'm also not sure if there is a size limit on the various vocabulary and statistical modules in the Standard version that is not in the Professional (Legal) version. Haven't looked into that.
JTT
  Selected Answer
 
  2  
Reply Tue 21 Sep, 2010 11:19 pm
@Butrflynet,
Thanks to you too, BFN.

I guess even VRT [voice recognition technology] has to watch out for the grammar marm.

What I'm driving at is, could interoffice memos be cranked out with no editing and comprehension would be pretty much unaffected? Could my 90 year old mother whose arthritic hands make a lot of typing errors send out DD emails that everyone would get?

Could rough drafts of a paper be submitted strictly on a comprehension level or would a more advanced discussion be compromised?
maxdancona
 
  1  
Reply Tue 21 Sep, 2010 11:21 pm
@OCCOM BILL,
Bill,

The "up to 99%" accuracy is almost certainly for a model trained for a specific speaker (probably speaking in one specific domain). The more training you do on the model specifically for you, the more accurate it will be. The more similar the documents you use to train and dictate are to each other, the more accurate the model will be.

Ironically, you are using Dragon as an example to argue against my point that having a domain-specific language model (i.e. a version just for legal text) will help accuracy. Yet the reason we are having this discussion is that Dragon offers a version just for legal text. It sounds like the Dragon guys are doing exactly what I am suggesting they are doing.

I think the problem is that you don't understand the process of turning sound into text. It is not nearly as straightforward as you suggest. The big problem isn't the difference between "than" and "that". You haven't understood that the process is a statistical one, and the statistics make all the difference.

Here's an interesting experiment: read the following into your speech recognizer.

"The deafened ant was remanded into the custody of the sheriff".

I suspect if you put an unnatural exaggerated pause in the correct place, you might get the desired sentence. The reason is that in your language model, the phrase "deafened ant" is not very statistically likely, whereas the word defendant is rather likely to be near the word remanded.
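Here is a small sketch of why the recognizer would settle on "defendant". It assumes the two candidate transcriptions sound identical, so only the language model decides between them. Every count below is invented for illustration and does not come from any real model.

```python
import math

# Hypothetical bigram and context counts, invented for illustration only.
BIGRAM_COUNTS = {
    ("the", "defendant"): 50,
    ("defendant", "was"): 30,
    ("was", "remanded"): 20,
    ("the", "deafened"): 1,
    ("ant", "was"): 2,
}
CONTEXT_COUNTS = {"the": 1000, "defendant": 60, "was": 500, "deafened": 3, "ant": 10}
VOCAB_SIZE = 10_000  # assumed vocabulary size, used for add-one smoothing

def score(words):
    """Sum of smoothed log-probabilities of each word given the previous word."""
    total = 0.0
    for prev, nxt in zip(words, words[1:]):
        count = BIGRAM_COUNTS.get((prev, nxt), 0)
        context = CONTEXT_COUNTS.get(prev, 0)
        total += math.log((count + 1) / (context + VOCAB_SIZE))
    return total

# Two transcriptions that are assumed to be acoustically indistinguishable.
candidates = [
    "the defendant was remanded".split(),
    "the deafened ant was remanded".split(),
]
best = max(candidates, key=score)
print(" ".join(best))  # prints "the defendant was remanded"
```

The pair "deafened ant" was never seen in the hypothetical counts, so that candidate is heavily penalized and the recognizer goes with "defendant" unless the audio strongly says otherwise.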






JTT
 
  1  
Reply Tue 21 Sep, 2010 11:22 pm
@Butrflynet,
Plus Bill, 600 or 1000 bucks doesn't seem like a ton of money for a law office. Law offices make their money by cranking out paper.
Butrflynet
 
  1  
Reply Tue 21 Sep, 2010 11:26 pm
@JTT,
With a lot of time and effort invested in training the DNS software so it easily recognizes her voice, style of speaking and enunciation patterns, her emails could be understood fairly well by most people.

However, just out of the box with just the initial 10 minute training session to get her started, no. There would be a lot of correction needed and it would be difficult to understand context. Many words sound similar acoustically to the untrained software, but have very different meaning and context.
Butrflynet
 
  1  
Reply Tue 21 Sep, 2010 11:40 pm
@Butrflynet,
Here's what the initial voice training of the Standard 9.0 version covers:

It tests the volume of your microphone by having you read this out loud:

Quote:
In this step the computer listens to the sound of your voice and adjusts the volume setting of your microphone. When the computer has finished adjusting the volume, it beeps to signal that the process is complete. If you reach the end of this text but you have not heard a beep, start reading the text again from the beginning. You should only have to read for about ten to fifteen seconds.


It then tests the quality of the input from the microphone by having you read this out loud:

Quote:
In this step the computer checks the audio input from your sound system. Having high-quality audio input is very important for good speech recognition. Poor audio input will make it difficult or impossible for the program to recognize your speech accurately. When the computer has finished checking the audio quality, it beeps to signal that the test is complete. If you reach the end of this text but you have not heard a beep, start reading the text again from the beginning. You should only have to read for about fifteen seconds.



Then you have to choose between a selection of text passages to read out loud ranging from easy, general reading, technical, historical, or humorous. You then read aloud for about 15 minutes while it adjusts the acoustic files for your voice's idiosyncrasies. The more passages you read out loud to it, the better it gets at recognizing your speech.

As you dictate to it, it continues to learn and periodically you run the vocabulary optimization utility so it can apply the statistical analysis to it.