@OCCOM BILL,
Bill,
You are thinking about it wrong. The language model isn't working at the level of vocabulary or idioms; it is trying to calculate the likelihood of sequences of words. My point above about how computer speech recognition is completely different from how you and I understand speech is very important. The computer is literally calculating statistics based on possible sequences of words and then choosing the most likely one.
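To make "choosing the most likely" concrete, here is a minimal sketch of the idea, not how any commercial recognizer is actually built. It assumes a toy bigram model with add-one smoothing, and the training sentences, the candidate transcriptions, and every function name are made up for illustration. A real system would also weigh in acoustic scores; this only shows the language model half.

```python
import math
from collections import Counter

def train_bigram(sentences):
    """Count bigrams and their left-hand contexts in a list of sentences."""
    context_counts, bigram_counts, vocab = Counter(), Counter(), set()
    for sentence in sentences:
        words = ["<s>"] + sentence.lower().split() + ["</s>"]
        context_counts.update(words[:-1])            # every word that has a successor
        bigram_counts.update(zip(words, words[1:]))  # adjacent word pairs
        vocab.update(words[1:])                      # every word that can be predicted
    return context_counts, bigram_counts, vocab

def sequence_log_prob(sentence, model, alpha=1.0):
    """Log-probability of a word sequence under add-alpha smoothing."""
    context_counts, bigram_counts, vocab = model
    words = ["<s>"] + sentence.lower().split() + ["</s>"]
    total = 0.0
    for prev, cur in zip(words, words[1:]):
        numerator = bigram_counts[(prev, cur)] + alpha
        denominator = context_counts[prev] + alpha * len(vocab)
        total += math.log(numerator / denominator)
    return total

def pick_most_likely(candidates, model):
    """Return the candidate word sequence the model scores highest."""
    return max(candidates, key=lambda c: sequence_log_prob(c, model))

# Hypothetical training text standing in for a large legal corpus.
training_text = [
    "the defendant was remanded to the custody of the sheriff",
    "the defendant was released on bail",
    "the court remanded the case",
]
model = train_bigram(training_text)

# Two similar-sounding guesses at what was said (invented for this example).
candidates = [
    "the defendant was remanded to the custody",
    "the defendant was demanded to the custody",
]
best = pick_most_likely(candidates, model)
print(best)
```

The model has no idea what "remanded" means. It picks that candidate only because "was remanded" and "remanded to" showed up in the training text and "was demanded" did not.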
Quote: The vast majority of the legal writing we do here isn't really any different than any other kind of writing
When you understand that computer speech recognition is a process of finding the most likely sequence of words, then yes... legal writing is quite different from a journalist's account of war, or a science text, or this discussion on Able2Know.
The phrase "the defendant was remanded to the custody..." is much more likely in legal text than in a science text, and with a language model, I can tell you precisely how likely this phrase is in either type of writing.
Humans have understanding; all computers have is statistics. Precise statistics are quite important for accuracy, and having a different language model for legal writing (or doctors' notes or business plans) means that the statistics predicting how likely a sequence of words is are much more precise.
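Here is a rough sketch of what "how likely this phrase is in either type of writing" looks like in practice. It assumes a simple smoothed bigram model, and the two tiny corpora are invented stand-ins for real legal and science text, so the numbers it prints mean nothing beyond the toy data. The function name and parameters are my own, not from any particular toolkit.

```python
import math
from collections import Counter

def bigram_log_prob(phrase, corpus, vocab_size, alpha=1.0):
    """Smoothed log-probability of the words in `phrase` after its first word,
    using bigram counts taken from `corpus` (a list of sentences)."""
    contexts, bigrams = Counter(), Counter()
    for sentence in corpus:
        words = sentence.lower().split()
        contexts.update(words[:-1])              # words that have a successor
        bigrams.update(zip(words, words[1:]))    # adjacent word pairs
    words = phrase.lower().split()
    total = 0.0
    for prev, cur in zip(words, words[1:]):
        total += math.log((bigrams[(prev, cur)] + alpha) /
                          (contexts[prev] + alpha * vocab_size))
    return total

# Hypothetical miniature corpora; a real model would use millions of words.
legal = [
    "the defendant was remanded to the custody of the sheriff",
    "the court ordered that the defendant was to appear",
    "the witness was remanded to the custody of the marshal",
]
science = [
    "the sample was heated to the boiling point",
    "the cell was exposed to the light source",
    "the reaction was observed under the microscope",
]
phrase = "the defendant was remanded to the custody"

# Shared vocabulary so both models are smoothed over the same set of words.
vocab = {w for s in legal + science + [phrase] for w in s.lower().split()}

legal_lp = bigram_log_prob(phrase, legal, len(vocab))
science_lp = bigram_log_prob(phrase, science, len(vocab))
ratio = math.exp(legal_lp - science_lp)

print(f"legal model log-prob:   {legal_lp:.2f}")
print(f"science model log-prob: {science_lp:.2f}")
print(f"the legal model rates the phrase {ratio:.0f}x more likely")
```

Same phrase, same arithmetic, different counts underneath. That difference in counts is the whole reason a domain-specific model helps.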
This is not a theoretical discussion; it has been backed up by tons of research. Companies spend millions of dollars on developing specific language models for different domains (which are then tweaked by speaker or organization). They do this because it has been shown to have a significant effect on accuracy.
There is a question about how much accuracy is worth to you. But I can say with near certainty that a language model specifically for legal writing will lead to a large improvement in accuracy, not because of words or phrases, but because the statistical calculations behind the process of speech recognition will be more precise.