VOICE TYPE RECOGNITION SOFTWARE RE-VISITED
My Thoughts on Speech Recognition:
I am excited to join in the "debate" about Speech Recognition, as I have been using it for some time now. In this article, I will address certain questions that have been presented to the group.
First, let me "say" something about my use of definitions. I use the terms Voice Recognition (VR), VTR (Voice Type Recognition), and Speech Recognition to mean the same thing. Dragon refers to Dragon Dictate (Dragon Systems, Inc.), and Via Voice (VV) refers to any of the Via Voice programs by IBM. Typhoon refers to a modification that a company by the name of Typhoon has made to the IBM Via Voice program. I use many of these terms interchangeably.
Next, let me "say" something about the "horsepower" required, in my opinion, to use Speech Recognition programs successfully and, perhaps more importantly, "profitably". I tried Voice Recognition on several machines. It was not until I upgraded my NT Workstation to 196 Meg of RAM and a Pentium II 266 with MMX that I was able to get the results I wanted. This was about 18 months ago, when that was a big deal. Today, that clock speed is no great shakes.
I don't know what part the NT operating system plays in this success, but anyone who uses an NT O/S knows how steady it is, and how seldom it "crashes". Nothing else worked for me, in that the product was not dependable, i.e. it would be fine one day and horrible the next. I now get consistent results. Not perfect, but consistently good.
I also tried at least 4 different headsets. I have settled on a Parrot VXI (a headset with noise canceling ability) with a hookup to our office phones so I don't have to keep switching headsets when going from one mode (VR) to the other (Telephone). The operator hits a button to dictate or talk on the phone. Works particularly well for me. The "voice" playbacks come from external speakers, which the headset accommodates. That solves the dilemma of not wanting to wear the headset when not using the phone or computer, but still wanting to hear the computer for email prompts and the like. From time to time, I re-train the headset microphone. I do this when the system starts "slipping". You can tell, as soon as you get a feel for it.
The use of VTR and subject matter for my part of the debate:
1. Can you draft and compose documents more efficiently using speech recognition software or does the correction and training process actually consume more time?
Yep, I can draft and compose many times more efficiently using Speech Recognition than without it. The machine will keep up with me no matter how fast I yap at it, and I am a pretty fast dictator. Between the days of discrete speech recognition and today's continuous speech recognition, there is no comparison.
It should come as no surprise that using VTR successfully takes an investment of time. But hey, I can recall a similar investment of time in learning how to use WordPerfect 5.x.
After the training of the voice, there is the input of text materials (another kind of training) which benefits the "program" by allowing the Recognition Engines to "digest" your speech patterns from a text standpoint. This process speeds up the training. I know that I "inputted" at least 25 or so lengthy documents into the Via Voice program before I started using the program in a regular manner.
I also trained the program using the "stories" provided by the program for the "Full Monty" of exercises allowed for training. The training for Dragon and VV requires reading "stories" into the computer while watching a "bouncing ball" system of words appear for enunciation. If the words light up one way, you have done it right. If they light up another way, you gotta go back and do it again to achieve the maximum effectiveness. This was not fun, and was exceptionally boring (even worse in the discrete days than in the continuous speech engine days), but I consider it essential for a workable system.
2. Are the current Speech Recognition products in the market ready for prime time? Are they stable? Accurate? Easy to use?
I have experimented with Dragon and Via Voice. As noted, I started when the programs required the user to say "one word at a time", and now have progressed to "continuous" speech.
I use Via Voice primarily because it was what I had started with about three years ago, having purchased it at the ABA Legal Tech show on a promotion deal. I have since gone through all of the upgrades (Simply Speaking, Via Voice, ViaVoice Gold) except the most recent, and have now layered the Typhoon Interface on top of the VV program.
The Typhoon interface seems to have improved the output. I don't know why, but in my opinion it is obviously something the Typhoon people did with the software. Yet while the recognition got better, one gave up the ease of correction that the Via Voice system had, because the Typhoon add-on replaced it with a lesser correction system. But, with accuracy being paramount, and the correction system workable in either mode, the system now works. I use it every day. I tell people that "It is like dictating to a secretary with poor typing skills". There are errors, but one can live with them and easily correct them in less time than I could turn out a product "in the old days". My work product before voice recognition had errors too. In other words, in our office it works for me.
Here is how I use VTR. I dictate all drafts directly into the VV program. If I have the time, and am so inclined, I do the corrections. You just put your cursor on an incorrect word, click "Correct" and the program will "say" the word you recorded to allow you to hear it. You can select from a list of words for correction and click on the correct word, or simply type in the correct word. You can't voice dictate in "correct mode" using Typhoon, but as noted, you can select words from a list. So, the correction mode is whatever works for you. After I am done with the document, I send it out over the network to my secretary of 20 years (experience with me, not age). I stress that her experience with my work habits and dictation style is very important to the success of our work product.
Here is an example of how we put our letters together. Our office uses ACT 4.0 for Docketing and Case Management. ACT contains a link with WordPerfect 8.0, and just a couple of mouse clicks from the "contact page" of an addressee (Contact) formats a letter on our Letterhead (Style) with the name, address and salutation "typed" on the letter, even before the writing begins.
Once ACT and WP do their magic, I place the cursor at the point where I want to start the "typing", and I start talking. I save the file, and email it to Debbie. She proofs it, and puts it in my In Box at her desk for signature. Alternatively, she will email it back to me in final form for more work or review, depending upon what we are doing. For example, the preparation of legal documents is more back and forth than pure letters.
3. Aside from turning speech into text, what other uses, if any, does speech recognition software have?
I have experimented with "Command and Control" of my computer. After the initial novelty wore off, I have determined that I use Speech Recognition 99% for putting words on paper as fast as I can talk. I talk fast.
As an attorney who started with oral dictation in 1967, I find that voice recognition is particularly helpful to me as it fits in with the way I was trained. Except for rare circumstances, I have pretty well stopped the face-to-face verbal dictation with Debbie, who breathed a sigh of relief when I made that decision.
4. Do superior alternatives to Speech Recognition software solutions exist (e.g., Boomerang, CyberTranscriber, Mavis Beacon Teaches Typing)?
If there are superior alternatives, I don't know about them, or I would be diddling with them. I am a very fast keyboard guy, but my mouth is much faster, and this is the way I speak.
I know that because the Court Reporters are forever telling me to "slow down". I don't mind the mouse, and that is why I don't use Voice Recognition for "Command and Control", but I hate sitting at the keyboard pounding it hour after hour. In fact, I am hating this email, because I am keyboarding it at home without Voice Recognition which I do not have on my Laptop. Because of horsepower reasons (see above), I only have Voice Recognition capability at the office with my workstation.
5. What skills must one learn to become a good dictator? (No, we're not looking for answers like "Make the trains run on time and don't stockpile weapons of mass destruction.")
Via Voice requires lots of "training". The Speech Engine has to learn the sounds of the user's voice relating to the words spoken. This takes time and experience. Those who expect to take the product from the box and use it with 95% accuracy, like the advertisements claim, need to re-think the concept. I did not have that result. I have been to several "shows" where VTR was being demonstrated with excellent results. Be advised, however, that demos can fool the observers unless, for example, the demonstrators "speak new stuff" to the computer.
I have seen demonstrators repeat the same speech over and over and "wow" the spectators with their prowess. Dictating "Mr. Wright writes right handed" brings "oohs and aahs" all the time and is impressive, but after saying this to the computer twice, it knows all of the tricks. In fact, I suspect it comes from the factory with that sequence in its brain. New stuff is the most difficult, and this calls for a learning curve.
Can I do Macro Stuff? Both Dragon and Via Voice do Macro stuff. You can "program" your macros in either program, just like you used to do in WordPerfect. (I can't speak for Word for obvious reasons, but I assume the macro system is similar.) After a simple programming job, you "name" the macro, and by saying the name in the Dictate Mode, the macro does its thing. I experimented with macro usage and it worked for me. If our office did not use ACT as a case manager with the ability to link to WordPerfect and format letters, faxes, memos and stuff, I would use more of the macro features in Via Voice. Because I have a program that does it quickly, there is little sense in duplication.
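For readers who like to see the idea spelled out, here is a minimal sketch of a "named macro" in Python. It is only an illustration of the concept described above (say the name, get the stored text). It is not the Dragon or Via Voice macro facility, and the names define_macro, handle_utterance and the "closing block" macro are all made up for this example.

```python
# Hypothetical sketch of voice macros: NOT the Dragon or Via Voice API.
from typing import Callable, Dict

# Maps a spoken macro name (lower-cased) to an action that returns text.
MACROS: Dict[str, Callable[[], str]] = {}


def define_macro(name: str, action: Callable[[], str]) -> None:
    """'Program' a macro and give it a spoken name."""
    MACROS[name.strip().lower()] = action


def handle_utterance(utterance: str) -> str:
    """Return macro output if the phrase is a macro name, else the phrase itself."""
    action = MACROS.get(utterance.strip().lower())
    if action is not None:
        return action()   # the macro "does its thing"
    return utterance      # ordinary dictation passes through unchanged


# An invented example macro that inserts a stock closing.
define_macro("closing block", lambda: "Very truly yours,")
```

With this in place, handle_utterance("Closing Block") gives back the stored closing, while any other phrase is treated as plain dictation.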
In conclusion, there is no doubt that if you have the proper "equipment", and the willingness to put in the time to train your Speech Engine, you can make VTR work for you. It is not perfect. But, the speed offered to many of us who think faster than we can type makes it a strong contender for daily office use.
Update - May 2007
It has been about five years since I wrote the article "Voice Type Recognition Software Re-Visited". In those five years, Dragon Dictate has undergone an amazing transformation. Rather than rewrite the article in its entirety, I will focus on the five areas of interest. A substantial amount of the previous article still pertains.
It is no longer necessary to have quite as powerful a computer as was recommended five or more years ago. The engineers at Dragon have figured out a way to allow the program to run on a modestly powered computer.
Having said that, I still recommend the most powerful computer one can afford so that one can multitask. In the old days, one would have to shut down other running programs in order to use voice recognition programs. Today, one can dictate to one's heart's content and keep other programs, such as e-mail and office management programs, up and running, all at the same time.
At the time I dictated the last article, our office was using an ACT database for tracking appointments, clients, and other management aspects. At the time of this writing, we have upgraded to a legal specific program for office management called "Time Matters". There are many features in Time Matters which can be used with voice recognition programs. Suffice it to say that voice recognition and your office management program should work together for convenience and efficiency. I use macros which bring up formatted shell letterhead when I say "Take a letter", and then allow me to fill in the content. The addressee, Re: area, and signature block are already formatted and ready for mailing.
The feature that makes the Dragon program "bulletproof" is the addition of a voice track, instituted in 2006 or so. This feature enables the user to dictate the text and save the user's voice, which is then synchronized to the text. Since my job is to put words on paper, I leave it to others to format my documents. If the person who is transcribing, proofreading and formatting my document wishes to hear my voice as it enunciates and dictates words in real time, he or she can do so. The typist highlights the words of interest with the mouse, right clicks, and hits "playback" on the list that appears. The typist can then actually hear what I dictated in my own voice, and watch the words as they appear highlighted on the screen.
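To picture how a voice track can be "synchronized to the text", here is a small Python sketch. It assumes that each recognized word is stored along with the time at which it starts and ends in the recording, so that a highlighted run of words maps to one stretch of audio. This is only an assumption for illustration; I do not know how Dragon actually stores its voice track, and TimedWord and audio_span are invented names.

```python
# Hypothetical sketch: how highlighted words might map to a stretch of audio.
# This is an assumed data layout, not Dragon's actual file format.
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class TimedWord:
    text: str
    start: float  # seconds into the recording where the word begins
    end: float    # seconds into the recording where the word ends


def audio_span(words: List[TimedWord], first: int, last: int) -> Tuple[float, float]:
    """Return (start, end) seconds of audio covering words[first..last]."""
    if not (0 <= first <= last < len(words)):
        raise IndexError("selection is outside the document")
    return words[first].start, words[last].end


# Invented timings for a short dictated phrase.
transcript = [
    TimedWord("Mr.", 0.0, 0.3),
    TimedWord("Wright", 0.3, 0.8),
    TimedWord("writes", 0.8, 1.3),
    TimedWord("right", 1.3, 1.6),
    TimedWord("handed", 1.6, 2.1),
]
```

Highlighting "Wright writes" (words 1 through 2) would then call for the audio between the start of the first selected word and the end of the last, which a player could replay while the words are highlighted.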
Training - little or none:
Some five or so years ago, in order to achieve efficiency with Dragon Dictate or any other voice recognition program, the user was required to spend hours and hours training the program before it could even be used. In the newer versions, these hours and hours of training have been supplanted by about 15 minutes of training to get the program up and running. This is truly an amazing change in the manner in which voice recognition can be initialized.
Consider the text regarding Via Voice out of date:
When I dictated this article some 5 or 6 years ago, I still had familiarity with IBM's program, then called Via Voice. Because of my devotion to Dragon, I have set aside any use or sampling of the ViaVoice programs. Therefore, all of my comments pertaining to ViaVoice in the last article should be disregarded, as I am certain that with the passing of time substantial changes have occurred to make my comments out of date.
Suffice it to say that at this point in time there is no longer any question as to whether voice recognition is "ready for prime time". It is unquestionably ready for any type of use in order to get words on paper. You will be happy with it, and of that, I am certain.