Speech Input

Input Panel has speech input capabilities, which allow both dictation, in which your spoken words are converted to text, and voice commands, in which your words control menus and on-screen buttons and can switch between programs. The dictation function works similarly to the on-screen keyboard and writing pad. Place the cursor where you want the text to go, activate the dictation mode, and start speaking. After a short delay, the text appears in your document, where you can correct and edit it. If the text preview pane is open, the converted text appears in that pane and is inserted in your document after you tap Send Text. The voice command function is a bit different and is discussed separately later in this section. The speech input controls appear in the speech bar below the Input Panel title bar, as shown in Figure 2-18.

Figure 2-18. The speech bar appears below the Input Panel title bar and shows whether the speech recognizer is listening for dictation or voice commands or is not listening at all.

Hardware Considerations

For speech input to work even passably well, you need a boom microphone on a headset. Make sure you get one that is designed for use on a personal computer so that it has separate plugs for the headphones and the microphone. Two other features worth the investment are a noise-canceling microphone, which reduces interference from noise in the room, and a microphone on/off switch mounted on the wire running to the computer, which lets you easily prevent accidental voice input when you answer the phone or speak to someone else in the room. At the time of this writing, Radio Shack sold a comfortable microphone with all of these features for about $20.

TIP
If the microphone jacks are in an annoying location on your tablet in portrait orientation, try using the secondary portrait orientation.

Training the System

Before you can input a single word to the tablet, you must train the system to learn your personal speech patterns. Unlike handwriting recognition, which learns only your vocabulary and not your handwriting, speech recognition actually learns how you speak and continues to improve over time. The more time you spend teaching the system, the better it will be, but the initial training is required. The information is stored with your login profile, so you must log in as the same user each time you use speech input.

How Speech Recognition Works

In many ways speech recognition is more complex than handwriting recognition because there is an even greater variety in how we speak than there is in how we write. When the speech recognition system tries to understand you, it does something called pattern matching. It measures the speed, phonetics, pitch, spacing, punctuation, accent, and other aspects of your spoken words and compares them with samples in its database. As you train the speech recognition system, it narrows down the list of patterns it uses to include only the ones that closely match how you speak. This is why speech training is so important to getting good results from speech input on your tablet. Without training the system, someone with a Midwest United States accent may see a speech recognition accuracy of over 95%, but the average American will get about 88% accuracy. Many native New Yorkers, Georgians, or anyone else with a stronger regional accent may see results that are much lower. With enough training of the system, and practice speaking without “ums,” “uhs,” and excessive inflection, most people can get accuracies of 98%.
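To put those accuracy figures in perspective, it can help to turn them into the number of corrections you would expect to make. The short Python sketch below does only that arithmetic. The function name expected_errors and the 500-word sample length are assumptions made for illustration; the percentages are the ones quoted above.

```python
def expected_errors(word_count: int, accuracy_percent: int) -> int:
    """Return the expected number of misrecognized words, rounded to a whole word."""
    if not 0 <= accuracy_percent <= 100:
        raise ValueError("accuracy_percent must be between 0 and 100")
    return round(word_count * (100 - accuracy_percent) / 100)


# Accuracy figures quoted in the text; 500 words is an assumed sample length.
scenarios = [
    ("untrained, average speaker", 88),
    ("untrained, favorable accent", 95),
    ("well trained", 98),
]

for label, accuracy in scenarios:
    print(f"{label}: about {expected_errors(500, accuracy)} corrections per 500 words")
```

Under these assumptions, moving from 88% to 98% accuracy cuts the expected corrections in a 500-word passage from about 60 to about 10.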

To open the speech bar on Input Panel, check Speech on the Input Panel Tools menu. The first time you do this, the mandatory Speech Training Wizard opens and guides you through adjusting your microphone and practicing speech input. The tutorial is very clear, but it takes 10 to 20 minutes to complete, so make sure you have at least half an hour for the wizard and for trying speech out. Make sure the microphone is comfortable before you begin because changing the position of the microphone later could affect the quality of the voice recognition. After you have been guided through the microphone adjustment, you will read several paragraphs aloud. The words will be highlighted as they are recognized by the system, as shown in Figure 2-19. The highlighting will lag behind what you are actually reading. This is normal. If the highlighting stops, however, you must go back and start reading again beginning at the first unhighlighted word. Speak normally, but without much inflection, for the best results. The text of the first speech training discusses ways to improve the quality of speech recognition and often feels more like a sales pitch than helpful information. A more interesting list of items to read is available when performing further training. Speech recognition uses the same dictionary as handwriting recognition, so it learns your vocabulary as well as your voice.

CAUTION
Don’t let friends try out the speech function on your login profile. The system will add their voice characteristics to your profile, resulting in degraded speech recognition for you. Create a guest login profile, and let them try speech that way.

Figure 2-19. When you first open speech input, you enter a required speech training session. Optional additional sessions improve the recognition.

TIP
Position the microphone close enough to your lips so that if you pucker them in a kissing motion, you can just barely touch it. Most microphones work better if they are slightly below your lips rather than right in front of them.

Dictation

Once you complete at least the first voice training, you can input text using speech. Open a document and place the cursor where you want to input text. Next, open Input Panel and make the speech bar visible. Once you tap the Dictation button, the system will show that it is listening, and everything you say, or something resembling what you say, will appear in your document. If the text preview pane is open, the text will appear there instead. Text that has been heard and is being converted appears as highlighted dots. There’s no need to wait for the dots to disappear before you continue speaking. The memory buffer holding the text is quite large, and unless you speak very quickly and never take a breath, it will eventually catch up. As you speak, voice bars appear on Input Panel showing the strength and variation in your voice and indicating whether your speech is too loud, soft, or fast for optimal recognition. Figure 2-20 shows some sample dictation.

TIP
Turn off your microphone between inputs to prevent accidental text input. If you don’t have a switch on your microphone, tap the Dictation button a second time to turn it off or say “microphone,” and it will turn off.

Figure 2-20. Dictation converts your speech and inserts it in the document or text preview pane. The speech bar provides feedback on the quality of the voice input.

Dictation Control Commands

Several words may or may not be converted directly into text, depending on when and how you say them. For example, if the word “microphone” is said as part of a sentence, it will be converted to text, but if said as a single word, it will turn off the microphone. Similarly, “voice command” said as part of a sentence will appear as text, but said alone, it will switch speech input to voice command mode. Two other crucial commands you can use within dictation mode are “new paragraph,” which is the equivalent of the Enter key, and “new line,” which is the equivalent of Shift+Enter on your keyboard.
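The rule for “microphone” and “voice command” comes down to one test: does the phrase make up the whole utterance between pauses, or is it part of something longer? The Python sketch below illustrates that idea only. It is not how the recognizer is actually implemented, and the names STANDALONE_COMMANDS and interpret, along with the action labels, are hypothetical.

```python
# Hypothetical table of phrases that act as commands only when spoken alone.
STANDALONE_COMMANDS = {
    "microphone": "TURN_OFF_MICROPHONE",
    "voice command": "SWITCH_TO_VOICE_COMMAND_MODE",
}


def interpret(utterance: str) -> tuple:
    """Classify one utterance (the speech between two pauses).

    Returns ("command", action) when the whole utterance is a command phrase,
    otherwise ("text", utterance) so the words are inserted as dictation.
    """
    # Normalize case and spacing so "Voice  Command" still matches.
    phrase = " ".join(utterance.lower().split())
    if phrase in STANDALONE_COMMANDS:
        return ("command", STANDALONE_COMMANDS[phrase])
    return ("text", utterance.strip())


print(interpret("microphone"))
print(interpret("Please plug in the microphone"))
```

Because the match is against the entire utterance, the same word is treated as ordinary text as soon as anything else is spoken with it.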

Spelling Out Words

If you want to spell out a word, say “spell it” and then immediately begin spelling the word. The letters will appear as you say them until you pause, at which point the system will revert to normal dictation. There is a second option called spelling mode that differs only in execution. If you say “spelling mode,” you must pause before spelling the word. When you start speaking again, the word is spelled out, and when you stop, the system returns to normal dictation. If you want to select a word that has already been converted to text, you can say “spell that.” “Spell that” selects whatever word is touching the cursor at that moment and lets you replace it by spelling it out. This is handy if you see a word converted incorrectly and you think the problem is that the word is not in the dictionary rather than that it was simply misunderstood.

Saying Punctuation, Numbers, and Symbols

You must say all punctuation as you go, so the sentence “He said: ‘With this device, I can rule the world!’” would be spoken “He said colon quote with this device comma I can rule the world exclamation point quote.” This system works well except for the odd sentence like: “That’s my final offer, period!” which comes out “That’s my final offer, . !” In these cases, the best thing to do is use the “spell it” command and spell out the word or correct it later. The system also automatically adds a space after a comma or a period and capitalizes the first word of a new sentence.
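As a rough illustration of the behavior just described, the Python sketch below turns spoken punctuation names into marks, attaches each mark to the preceding word, and capitalizes the first word of each new sentence. It is a minimal sketch under stated assumptions, not the recognizer's actual logic. The names SPOKEN_MARKS, SENTENCE_ENDERS, and render_dictation are hypothetical, only a handful of marks are covered, and quotation marks are left out.

```python
# Hypothetical mapping from spoken names (as word tuples) to punctuation marks.
SPOKEN_MARKS = {
    ("comma",): ",",
    ("period",): ".",
    ("colon",): ":",
    ("question", "mark"): "?",
    ("exclamation", "point"): "!",
}
SENTENCE_ENDERS = {".", "?", "!"}


def render_dictation(spoken: str) -> str:
    """Convert a string of spoken words into punctuated, capitalized text."""
    words = spoken.split()
    pieces = []
    capitalize_next = True  # the very first word starts a sentence
    i = 0
    while i < len(words):
        pair = tuple(w.lower() for w in words[i:i + 2])
        single = pair[:1]
        # Prefer two-word names such as "exclamation point" over one-word names.
        if len(pair) == 2 and pair in SPOKEN_MARKS:
            mark, step = SPOKEN_MARKS[pair], 2
        elif single in SPOKEN_MARKS:
            mark, step = SPOKEN_MARKS[single], 1
        else:
            mark, step = None, 1

        if mark is not None:
            # Attach the mark to the previous word so no space precedes it.
            if pieces:
                pieces[-1] += mark
            else:
                pieces.append(mark)
            if mark in SENTENCE_ENDERS:
                capitalize_next = True
        else:
            word = words[i]
            if capitalize_next:
                word = word[0].upper() + word[1:]
                capitalize_next = False
            pieces.append(word)
        i += step
    # Joining with spaces supplies the space that follows each mark.
    return " ".join(pieces)


print(render_dictation("is it ready question mark yes exclamation point"))
```

Like the real system, this sketch cannot tell the word “period” from the punctuation mark, which is why a sentence such as the “final offer” example needs to be spelled out or corrected afterward.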

Numbers follow some special rules. The numbers zero through twenty will be spelled out. Numbers higher than that will appear as numeric values. If you want a number less than 21 written numerically, say “force num” and then the number. You do not need to spell out very large numbers, but you do need to say “point” or “decimal” for the decimal place. The speech “One million two hundred thirty-five thousand and thirteen point five” will appear as “1,235,013.5.”
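The number rules can be summarized in a few lines of Python. This is only a sketch of the rules as stated above, not the recognizer's own code. The function render_number, its force_num parameter (standing in for the spoken “force num” command), and the SMALL_NUMBER_WORDS list are all hypothetical names.

```python
# Words for zero through twenty, the range the text says is spelled out.
SMALL_NUMBER_WORDS = [
    "zero", "one", "two", "three", "four", "five", "six", "seven", "eight",
    "nine", "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen",
    "sixteen", "seventeen", "eighteen", "nineteen", "twenty",
]


def render_number(value, force_num=False):
    """Return the text that would appear for a dictated number.

    Whole numbers from 0 through 20 are spelled out unless force_num is True;
    everything else is written as digits with thousands separators.
    """
    is_small_whole = float(value).is_integer() and 0 <= value <= 20
    if is_small_whole and not force_num:
        return SMALL_NUMBER_WORDS[int(value)]
    # The "," format option inserts thousands separators.
    return f"{value:,}"


print(render_number(7))                  # spelled out
print(render_number(7, force_num=True))  # digits, as after saying "force num"
print(render_number(1235013.5))          # large number with a decimal place
```

The last call reproduces the example from the text, giving 1,235,013.5.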

Most symbols can simply be spoken, and they will appear correctly. The beginning of a Web address is said “http colon slash slash.” Some symbols have special ways of being said that help you get exactly what you want. The dollar sign symbol ($) is best said “dollarsign” as if it were one word. Saying “dollar…sign,” with an exaggerated pause, will result in the words “dollar sign.” Table 2-4 lists some common symbols and how to say them:

Table 2-4. How to dictate some common symbols.
@	“at sign”	#	“pound sign”	$	“dollar sign”
^	“caret”	&	“ampersand”	%	“percent sign”
*	“asterisk”	--	“dash”	_	“underscore”
[	“open bracket”	(	“open parenthesis”	<	“less than”
]	“close bracket”	)	“close parenthesis”	>	“greater than”

Voice Commands

If you pause for a moment in dictation mode and say “voice command,” the system will switch to voice command mode. Voice commands have two main functions: to correct the text you input through speech and to control the tablet without a pen or a mouse.

Making Corrections

You can correct converted text with voice command as well as with Write Anywhere, the writing pad, or the Input Panel keyboard. Voice command lets you perform the Input Panel correction functions and provides some additional capabilities. Figure 2-21 shows the sentence “The voice command mode allows you to make changes using nothing but your voice” with an obvious error. While in voice command mode, saying “select Lloyd’s” would select the incorrect text. Saying “correct Lloyd’s” both selects the text and opens the alternate word list, as shown in Figure 2-22. To replace the text with an alternate word from the list, say “select” and the number on the list of the correct word. To delete or respeak the text, read the name of the desired command aloud. Saying “unselect that” would cancel the selection. Voice command also allows you to select all visible text, select after certain words in the text, select text beginning at one word and ending on another, and insert the cursor before or after a specific word. Once you have made a selection, you can also change capitalization in the sentence with your voice.

Figure 2-21. Saying or tapping “voice command” will switch you to voice command mode.

Figure 2-22. Voice command allows you to select, or select and correct, text in a single step. Input Panel displays the command it heard as it carries out the action.

What Can I Say?

Voice command does much more than give you verbal control of Input Panel. With voice command, you can open and close files, switch between applications, access the Start menu, navigate a document, and more. There are too many commands to describe them all here. To see all the commands available by voice, say “What can I say?” while in voice command mode, and a list similar to the one in Figure 2-23 appears. What Can I Say is context-sensitive, so the exact content of the list depends on which application you are in and even what you are doing in that application. This feature is invaluable if you use voice for anything more than dictation.

TIP
When you control a program with speech, every menu command is available even if it is not on the What Can I Say list. To control menus with speech, say the name of the menu and then the full name of the command. To see print preview, the voice command would be “File…Print Preview.” If the menu doesn’t show all the options right away, just pause longer after saying the menu name, and the rest of the menu will appear.

Figure 2-23. What Can I Say is a context-sensitive list of all the commands available in a given situation.

Speech Bar

The speech bar has its own options menu, shown in Figure 2-24, providing access to What Can I Say, speech help, and several speech input settings. If you’re having speech recognition problems, the Microphone Adjustment option is quick and can help, especially if you are in a different room than you normally use for speech input. The Voice Training option is very helpful, but each session takes fifteen to twenty minutes. If you like speech input, it’s worth your time to go through several of these sessions. You’ll get the time back in the form of fewer corrections later on.

Figure 2-24. What Can I Say, microphone settings, voice training, and custom word pronunciation are available on the speech bar.

The speech recognizer understands the pronunciations of words based on spelling and grammar rules. Sometimes what it thinks the word should sound like does not match the real word very well, especially for unusual words you added to the dictionary. You have some control over this. Choosing the Add Pronunciation For A Word option opens the window shown in Figure 2-25, containing words you added. After selecting a word, you will hear the speech recognizer’s top choice for the pronunciation. If it isn’t correct, there is an option to record the correct pronunciation. The Record Pronunciation function is a bit misleading in that it does not record your pronunciation and associate it with the word. Instead, Record Pronunciation listens to what you say and matches your pronunciation to its list of possible pronunciations based on spelling and grammar rules. If one of its alternate pronunciations is close to yours, the association is changed and the word will probably be recognized correctly in the future. If your pronunciation doesn’t match one of the alternates, then nothing happens. In my case, the correct pronunciation of Hayabasa (Hi-ya-ba-sa) doesn’t match any alternate, so if I say it during dictation it always converts as “high above the.” If I mispronounce it to match the way the recognizer thinks it should be said (Hi-yab-a-sa), it converts it to text correctly every time.

If you find it annoying to keep switching between dictation and voice command mode, you can make groups of voice commands available in dictation mode. The benefit is you don’t have to switch modes to select and edit words. The downside is your dictation may be misunderstood as a command. To add commands to dictation mode, select Voice Command Configuration, select Working With Text, as shown in Figure 2-26, and tap Details. Check only the groups of commands you want, and check the Enable During Dictation check box.

Figure 2-25. Adding a pronunciation will sometimes fix recognition errors on words you add to the dictionary.

Figure 2-26. You can enable many of the voice commands so that they are available during dictation.

Other Speech Tips

Here are a few more tips to get the most out of speech input:

Often a combination of speech, writing, and pen in place of the mouse works best for maximum productivity.
Different room acoustics, low-level background noise, and even your having a cold can interfere with speech recognition.
If you hide Input Panel, the speech bar is still visible.
Double-check the choice of homonyms such as “there,” “their,” and “they’re.”
If you’re trying to control an application with speech and it isn’t working, it probably isn’t in the foreground. Tap the title bar of the application window to ensure it is the foremost application.
If you regularly use speech input in a variety of noise environments, you can create separate speech profiles for each one using the Speech control panel. This will improve recognition overall, but requires training each profile separately in the appropriate environment.

Even for a good typist, retyping a document can be tedious. Reading it and having Input Panel do the typing is much more fun and usually works pretty well. Find a printed document that you would like a copy of on your computer. Open a blank word processor document, place the cursor, and activate speech input in dictation mode. Read all the paragraphs, saying “new paragraph” when needed, without correcting any mistakes. Next switch to voice command mode, and save the document with voice commands, switching to dictation briefly to give it a name. Finally, try correcting the recognition mistakes with a combination of voice commands, pen taps, and writing pad inputs.