Speech-Recognition Software: Finally, a computer that understands its owner
Several times over the years, I’ve experimented with speech-recognition software. The experience has always started with high expectations and ended with disappointment. But this time the technology works. It’s cranking out sentences. Subjects. Verbs. Predicates. Punctuation. Be it prose or drivel, it’s appearing on the screen as I dictate to my computer. A computer that understands spoken words. Intergalactic travel. A robot that will bring me a beer while I watch football. This is the stuff of science fiction and now one piece has come to pass.
I was rooting for the robot to get here first, but you have to admit that a computer that understands speech is pretty cool. Right now, it’s just a dictation tool, but imagine when someone figures out how to link this software to Google and other key tools.
The software I used to dictate the first draft of this column, Dragon NaturallySpeaking 9, marks a milestone in speech-recognition software. Many people who spend hours typing every day will be excited to learn that major progress is being made in the field. Throughout my newspaper career, I’ve seen colleagues crippled by too many hours at the keyboard. Some have left work on disability to rest or for surgery, while others have left the field completely.
Speech-recognition software that actually works offers hope for relief from these repetitive-stress injuries, with one caveat: The software is not yet ready for many workplaces. NaturallySpeaking 9 still requires quiet surroundings and would probably struggle amid the babble and hubbub of a newsroom or other lively cubicle farm. When I used it at home, it paused and produced a little box filled with question marks every time my dogs barked in the background.
Nuance, maker of NaturallySpeaking, says the application will work without training the computer, a process in which a user reads a standard text and the computer begins to learn how the user’s voice varies from standard pronunciation. The installation process offers a choice of a short training session or launching the program without training. Frankly, I didn’t feel like writing another “speech recognition not ready for prime time” story. So I did the training, which entailed reading a couple of dozen paragraphs. After a short tutorial, I wrote the first sentence. Less than an hour had passed since I opened the box. The results were pretty remarkable. The software was very accurate, and when it did get something wrong, it was easy to go back and fix things.
I remember working with an earlier version of Dragon NaturallySpeaking: I had to read Jack London out loud for about an hour. And then, when I tried to use the software, the results were atrocious. Mistakes were difficult to fix without using a keyboard—and it made a lot of them. I never used that earlier version a second time. This time, my plan was to write a few paragraphs using the software. But it went so well that I wrote most of the first draft of this story via dictation. Using a keyboard and mouse to move the cursor around is more intuitive for me, so I edited and revised using my fingers.
While Nuance gets much of the credit for the progress, personal computer hardware manufacturers get some, too. Speech recognition requires heavy lifting by computer processors. NaturallySpeaking 9 requires a processor of at least 1 gigahertz and at least 512 megabytes of memory. As computer processing speed continues to increase, future versions from Nuance and competitors should only get better.