Speech recognition has long been the holy grail of computer data input. Or rather, we have long wanted to control our computers by voice – just watch episodes of Star Trek from the 1960s. The problem has always been that what we want to do with our computers does not necessarily lend itself to voice interaction. That is not to say that it cannot be done. The Mac has long had voice control, and the current incarnation in macOS 10.15 Catalina is pretty good for those who rely on it. The simple fact is, however, that modern computer interfaces are designed to be navigated and manipulated with a pointing device and a keyboard.
More interesting is dictation, where you create text by talking to your device instead of typing on a keyboard. (And yes, I dictated the first draft of this article.) Dictation is a skill, but it is one that many lawyers and executives used to be able to pick up. More recently, we have become accustomed to dictating short text messages using the dictation features of iOS.
Dictation in iOS is far from perfect, but when the alternative is typing on a small virtual keyboard, even imperfect voice input is welcome. Most frustrating is that you cannot fix errors with your voice while dictating, so you either have to tolerate errors in the text or use clumsy iOS editing techniques. By the time you have edited your text, you may as well have typed it from scratch.
macOS has also had dictation features for years, but they have been even less successful and less commonly used than the iOS feature, in part because they require so much more setup than just pressing a button on a virtual keyboard.
With iOS 1
What’s wrong and right with iOS and macOS dictation
The big problem with dictation in iOS and macOS is that when it makes mistakes, there is no way to fix them. But there are other problems. To start dictating, you must tap the microphone button on the keyboard (iOS) or press a key on the keyboard twice (Mac, set in System Preferences > Keyboard > Dictation). This makes sense, of course, but it does mean touching the keyboard every time you want to dictate a new message. And that in turn means that you cannot just continue a conversation in Messages, say, without constant finger interaction, which defeats the purpose.
Another problem with dictation in iOS and macOS is that it only works for a certain amount of time – about 60 seconds (iOS) or 40 seconds (macOS) in my testing. As a result, you cannot dictate a document, or even more than a paragraph or two, without having to restart dictation by pressing the microphone button.
But the inability to edit spoken text is the real problem. There is little more frustrating than seeing a mistake made before your eyes and knowing that there is no way to fix it before you stop dictating. And once you’ve stopped, fixing a mistake is tedious at best, even now that you can drag the insertion point directly in iOS. iOS is just not built for text editing. Of course, it’s much easier to edit on a Mac, but you cannot so much as click the mouse while dictating without stopping dictation.
On the plus side, it seems that dictation in iOS and macOS can adjust its recognition based on subsequent words you speak. You may even see it do this occasionally, changing a word back and forth between two possibilities as you continue to speak. Other times, no changes are made until you press the microphone button to stop or your dictation time expires. Either way, it’s good – if a little weird – to see Apple adjust words based on context instead of relying on brute-force recognition.
What’s right and wrong with Voice Control’s dictation
The dictation capabilities built into Apple’s new Voice Control system (abbreviated VCD here) are quite different. First, instead of navigating to Settings > Accessibility > Voice Control (iOS) or System Preferences > Accessibility > Voice Control (macOS), you can enable Voice Control via Siri – just say “Hey Siri, turn on Voice Control.” Once it is on, whenever a text field or text area has an insertion point, you can simply speak to dictate text at that location. Of course, you can also speak commands, but that takes more getting used to.
Unlike standard dictation, however, VCD stays on indefinitely. You keep talking, and it keeps typing what you say into the document.
The most important benefit, however, is that you can fix the errors that VCD makes. For example, in the previous sentence, it originally capitalized the word “However.” (It has a bad habit of capitalizing words that follow commas.) By simply saying the words “lowercase however,” I was able to fix the problem. The astute among you will notice that the word “however” has appeared several times in this article. How does Voice Control know which one to fix? It prompts you by displaying numbers next to each occurrence of the word; you then speak the number of the one you want to change. It is slow but effective.
There is also another approach, although it works best on the Mac. If you select text, which you can do with a finger or a keyboard on an iPhone or iPad, or with a mouse or trackpad on a Mac, you can then direct Voice Control to act on that text. For example, in the previous sentence, VCD did not initially capitalize the words “Voice Control.” That was not a mistake; I capitalize those words because I am talking about a specific feature, but they would not usually be capitalized. Nevertheless, I can select the two words with the mouse and say “capitalize that” to achieve the desired effect. This is a surprisingly effective way to edit. It is easy and intuitive to select with the mouse and then make a change with your voice without having to move your hands back to the keyboard.
Some errors are easily fixed. When I said above, “it prompts you,” VCD gave me the word “improvised.” All I had to do was say, “change improvised to it prompts you,” and Voice Control immediately fixed the error. When it works, it feels like magic, especially in iOS. When using a Mac, I prefer to select with the mouse and replace with my voice.
Of course, there are situations where voice editing fails completely. Several times while dictating this article, I wanted the word “by.” VCD interpreted it as the word “I” most of the time, and no matter how I tried to edit it with my voice, the best I could do was say the word “bye” and then use the “delete previous character” command. Or when I wanted the word “effect” above, I ended up with “affect.” It was probably my fault for not pronouncing the word clearly enough. But when I tried to “change affect to effect,” Voice Control gave me “eat facts” the first time and “ethernet fact” the second time. Insane! This is strange, because if I just say the word “effect” alone while emphasizing the “ee” sound at the beginning, it works well.
There are other irritations. As with all dictation, of course, you have to speak punctuation aloud, which is awkward and requires retraining your brain a bit. If VCD interprets a word as a plural instead of a possessive, you can move the insertion point in front of the “s” and say “apostrophe,” but VCD will put a space in front of the apostrophe, requiring even more commands to fix the word. And just try to get VCD to type the word “apostrophe” or “colon” or “period” instead of the punctuation mark.
Another problem that affects all dictation systems is homonyms. Without context, there is simply no way to distinguish between “would” and “wood,” or “its” and “it’s,” or “there,” “their,” and “they’re,” by sound alone. VCD has no advantage here; standard dictation may actually do better because it can use context.
Careful elocution is important for recognition success when working with VCD (not that it ever recognizes the word “elocution” correctly). It’s probably a good habit to get into. Many of us – myself included – slur our words together as we speak. It’s amazing that speech recognition works at all, given how sloppily we talk.
Unfortunately, VCD does not work everywhere. On the Mac, I cannot get it to work in BBEdit or in Google Docs in a browser. In iOS it has fewer issues, although I’m sure I’ve run into some. I have not attempted a comprehensive survey of where it works and where it does not, so suffice it to say that it may not always work when you want it to.
Another problem, primarily in iOS, is that leaving VCD on all the time is a recipe for confusion because it will pick up other people talking as well, or even music or other sound playing in the background. Fortunately, you can always ask Siri to “turn off Voice Control” to disable it. Leaving VCD on all the time will also adversely affect battery life.
Why can we not have the best of both worlds?
It does not seem that Apple would have that much work to do to bring the best of VCD’s features to the standard dictation capabilities in iOS and macOS. All that is needed is for the company to stop seeing VCD as purely an accessibility feature and start seeing it as something that can benefit everyone.
The most important change would be to make it possible to invoke dictation easily and have it stay on indefinitely. In iOS, I could imagine pressing the microphone button twice, much like pressing the Shift key twice to turn on Caps Lock. On the Mac, pressing the dictation key three times could lock it on until you turn it off again. This would allow you to dictate longer pieces of text without having to leave Voice Control on all the time or rely on Siri to turn it on and off.
Next, all of VCD’s voice editing features should migrate to standard dictation. I see no reason why Apple has made VCD so much more capable in this way, and it should not be difficult to reuse the same code.
Finally, you should be able to move the insertion point around and select words as you dictate. It’s ridiculous that such an action stops dictation in iOS and macOS now.
If it sounds like I’m suggesting that Apple replace standard dictation with a form of VCD that is easier to turn on and off, that’s fine. Apart from improved word recognition by context as you continue to speak, standard dictation simply does not match VCD in almost any way.
Unfortunately, as far as I can tell in the current betas of iOS 14 and macOS 11 Big Sur, Apple has not made any significant changes in either standard dictation or VCD. So we will probably have to wait a year or more before such an improvement can see the light of day.