ispell

NAME

ispell - Interactive spelling checking

SYNOPSIS

ispell [-d file | -p file | -w chars | -Wn | -t | -n | -x | -b 
	 | -S | -B | -C | -P | -m | -L context | -M | -N 
	 | -T type | -V] file .....
ispell [-d file | -p file | -w chars | -Wn | -t | -n | -T type] -l
ispell [-d file | -p file | -f file | -Wn | -t | -n 
	 | -s | -B | -C | -P | -m | -T type] {-a | -A}
ispell [-d file] [-w chars | -Wn] -c
ispell [-d file] [-w chars] -e[1-4]
ispell [-d file] [-w chars] -D
ispell -v

DESCRIPTION

The ispell(1) utility is fashioned after the spell program from ITS (called ispell(1) on Twenex systems.) The most common usage is "ispell filename". In this case, ispell(1) will display each word that does not appear in the dictionary at the top of the screen and allow you to change it. If there are "near misses" in the dictionary (words that differ by only a single letter, a missing or extra letter, a pair of transposed letters, or a missing space or hyphen), they are also displayed on following lines. In addition to "near misses," ispell can display other guesses at ways to make the word from a known root, with each guess preceded by question marks. Finally, the line containing the word and the previous line are printed at the bottom of the screen. If your terminal can display in reverse video, the word itself is highlighted. You have the option of replacing the word completely, or choosing one of the suggested words. Commands are single characters as follows (case is ignored):

R
Replace the misspelled word completely.
Space
Accept the word this time only.
A
Accept the word for the rest of this ispell(1) session.
I
Accept the word, capitalized as it is in the file, and update private dictionary.
U
Accept the word, and add an uncapitalized (actually, all lowercase) version to the private dictionary.
0-n
Replace with one of the suggested words.
L
Look up words in system dictionary (controlled by the WORDS compilation option).
X
Write the rest of this file, ignoring misspellings, and start next file.
Q
Exit immediately and leave the file unchanged.
!
Shell escape.
^L
Redraw screen.
^Z
Suspend ispell.
?
Give help screen.

"Normal" mode, as well as the -l, -a, and -A options (discussed later in this topic) also accept the following "common" flags on the command line: -B, -b, -C, -d, -m, -n, -p, -S, -T, -t, -W, -w, and -x. These options are described in more detail after the option list.

The ispell(1) command takes the following options; when there are contradictory options, the last one on the command line takes precedence.

-A
Read from standard input, usually a pipe, just like -a, except that if a line begins with the string "&Include_File&", the rest of the line is taken as the name of a file to read for further words. Input returns to the original file when the include file is exhausted. Up to five levels of inclusion are allowed. If the environment variable INCLUDE_STRING is set, its value is used as the key string instead. (The ampersands, if any, must be included.)
-a
Read from standard input, usually a pipe. In this mode, ispell(1) prints a one-line identification message and then reads lines of input. For each word in the line, ispell(1) writes to standard output:
*
The word was found in a dictionary.
+ root
The word was found by affix removal.
-
The word was found by compound formation (see -C).
& word n c : nearmisses guesses
The word was not in the dictionary, but there are near misses: n is the number of near misses, c is the number of characters between the beginning of the line and the misspelled word, and nearmisses is the list of near misses. Near misses are capitalized in the same manner as word unless that capitalization is not valid. If the word could be made by adding illegal derivations to a root, there can also be a list of those guesses. See the next entry for the format of guesses.
? word 0 c : guesses
There are no near misses, but there are possible derivations. Each of these guesses is displayed as:
[prefix+]root[-prefix][-suffix][+suffix]
For example, refries would display as re+fry-y+ies.
# word c
The word is misspelled, there are no near misses or derivations, and the misspelled word occurs c characters from the beginning of the line.
-B
Report run-together words with missing blanks as spelling errors. Contradicts -C.
-b
Create a back-up file by appending ".bak" to the name of the input file. Contradicts -x.
-C
Consider run-together words as legal compounds. Contradicts -B.
-d file
Specify an alternate dictionary file. For example, use -d deutsch to choose a German dictionary in a German installation.
-e [level]
Expands affix flags to produce a list of words. The optional level argument is an integer between 1 and 4 and controls how the expansion is displayed. See "Other options" later in this topic for more information.
-f outfile
Write results to a given outfile rather than to standard output. Valid only with the -a or -A options.
-L num
Use num as the number of lines of context to be shown at the bottom of the screen; the maximum value for num is subject to a system-imposed limit.
-l
Produce a list of misspelled words from the standard input.
-M
Display a one-line mini-menu at the bottom of the screen. Contradicts -N.
-m
Make possible root/affix combinations that are not in the dictionary. Contradicts -P.
-N
Display no mini-menu at the bottom of the screen.
-n
The input file is in nroff/troff format. Contradicts -t.
-P
Do not generate extra root/affix combinations. Contradicts -m.
-p file
Specify an alternate personal dictionary.
-t
The input file is in TeX or LaTeX format. Unless the -n or -T option is given, TeX mode is automatically used if the file being checked has a .tex extension. Contradicts -n.
-S
Suppress the ispell(1) utility's normal behavior of sorting the list of possible replacement words. Some users might prefer this because it slightly increases the probability that the correct word will be low-numbered.
-T type
Assume a given formatter type for all files, overriding the value determined from the file type. The type can be either one of the unique names defined in the dictionary's affix file or a file suffix, including the dot; for English, the recognized values are nroff, tex, .mm, .ms, .me, .man, .NeXT, .tex, and .bib. If no -T option appears and no type can be determined from the file name, the default string character type declared in the language affix file will be used.
-V
Display characters not in the seven-bit ANSI printable character set so they are visible. Control characters print as ^X for control-X; the delete character (octal 0177) prints as ^?. This is the format used by cat -v. Without this switch, ispell(1) displays eight-bit characters "as is" if they have been defined as string characters for the chosen file type.
-v[v]
Print version identification on the standard output and exit. If the switch is doubled, ispell(1) prints the options that it was compiled with.
-W n
Specify length of words that are always legal. The default value for n is 1. If you want all words to be checked against the dictionary, regardless of length, you might want to specify "-W 0". If your document contains many three-letter acronyms, however, you would specify "-W 3" to accept all words of three letters or fewer. Regardless of the setting of this option, ispell(1) will only generate words that are in the dictionary as suggested replacements for words. This prevents the list from becoming too long.

This option can miss short misspellings. If you use this option frequently, it is recommended that you check the spelling yourself on a final pass without this option before you publish your document to protect yourself from possible errors.

-w chars
Specify additional characters that can be part of a word.
-x
Do not create a back-up file. Contradicts -b; the last one on the command line takes precedence.
-z
Print to standard output all words not literally in the dictionary file. This is an Interix extension.

Nroff and TeX modes

The -n and -t options select whether ispell(1) runs in nroff/troff (-n) or TeX/LaTeX (-t) input mode. The default is controlled by the DEFTEXFLAG installation option.

In TeX/LaTeX mode, whenever a backslash (\) is found, ispell(1) will skip to the next white space or TeX/LaTeX delimiter. Certain commands contain arguments that should not be checked, such as labels and reference keys like those found in the \cite command, because they contain arbitrary, non-word arguments. Spell checking is also suppressed when in math mode. For example, given:

\chapter {This is a Ckapter}
\cite{SCH86}

ispell(1) will find "Ckapter" but not "SCH". The -t option does not recognize the TeX comment character "%", so comments are also spell-checked. It also assumes correct LaTeX syntax. Arguments to infrequently used commands and some optional arguments are sometimes checked unnecessarily. The bibliography will not be checked if ispell(1) was compiled with IGNOREBIB defined. Otherwise, the bibliography will be checked, but the reference key will not.

TeX/LaTeX mode is also automatically selected if an input file has the extension ".tex", unless overridden by the -n switch.

References for the tib(1) bibliography system (that is, text between a "[." or "<." and the matching ".]" or ".>") will always be ignored in TeX/LaTeX mode.

Backup files

The -b and -x options control whether ispell(1) leaves a backup (.bak) file for each input file. The .bak file contains the precorrected text. If there are file opening / writing errors, the .bak file may be left for recovery purposes even with the -x option. The default for this option is controlled by the DEFNOBACKUPFLAG installation option.

Run-together words

The -B and -C options control how ispell(1) handles run-together words, such as "notthe" for "not the". If -B is specified, such words will be considered as errors, and ispell(1) will list variations with an inserted blank or hyphen as possible replacements. If -C is specified, run-together words will be considered to be acceptable compounds, provided that both components are in the dictionary, and each component is at least as long as a language-dependent minimum (three characters, by default). This is useful for languages such as German and Norwegian, where many compound words are formed by concatenation. (Note that compounds formed from three or more root words will still be considered errors.) The default for this option is language-dependent; in a multi-lingual installation, the default might vary depending on which dictionary you choose.

Suggested spellings (guesses)

The -P and -m options control when ispell(1) automatically generates suggested root/affix combinations for possible addition to your personal dictionary. (These are the entries in the "guess" list which are preceded by question marks.) If -P is specified, such guesses are displayed only if ispell(1) cannot generate any possibilities that match the current dictionary. If -m is specified, such guesses are always displayed. This can be useful if the dictionary has a limited word list or a word list with few suffixes. You should be careful when using this option, however, as it can generate guesses that produce words that are not valid. The default for this option is controlled by the dictionary file used.

Specifying alternative dictionary files

The -p option is used to specify an alternate personal dictionary file. A personal dictionary file is simply a sorted list of words, one word to a line. If the file name does not begin with "/", the value of HOME is prefixed. Also, the shell variable WORDLIST can be set, which renames the personal dictionary in the same manner. The command line overrides any WORDLIST setting. If neither the -p switch nor the WORDLIST environment variable is given, ispell(1) will search for a personal dictionary in both the current directory and $HOME, creating one in $HOME if none is found. The preferred name is constructed by appending ".ispell_" to the base name of the hash file. For example, if you use the English dictionary, your personal dictionary would be named ".ispell_english". However, if the file ".ispell_words" exists, it will be used as the personal dictionary regardless of the language hash file chosen. This feature is included primarily for backwards compatibility.

If the -p option is not specified, ispell(1) will look for personal dictionaries in both the current directory and the home directory. If dictionaries exist in both places, they will be merged. If any words are added to the personal dictionary, they will be written to the current directory if a dictionary already existed in that place. Otherwise, they will be written to the dictionary in the home directory.
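The naming rules above can be sketched in shell. This is only an illustration: the function names are hypothetical, it assumes that the hash file ends in ".hash" and that a "/" separates HOME from a relative name, and it does not reproduce the search and merge behavior of ispell(1) itself.

```sh
#!/bin/sh
# Sketch of the personal-dictionary naming rules (hypothetical helpers).

# Preferred personal dictionary name for a given hash file:
# ".ispell_" followed by the base name of the hash file.
personal_dict_name() {
    printf '.ispell_%s\n' "$(basename "$1" .hash)"
}

# Resolve a -p or WORDLIST value: a name that does not begin
# with "/" is taken relative to $HOME (assumed "/" separator).
resolve_personal_dict() {
    case "$1" in
        /*) printf '%s\n' "$1" ;;
        *)  printf '%s/%s\n' "$HOME" "$1" ;;
    esac
}

personal_dict_name /usr/share/spell/english.hash   # prints .ispell_english
```
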

Specifying non-alphabetic characters

The -w option can be used to specify characters other than alphabetics that can also appear in words. For instance, -w "&" will allow "AT&T" to be picked up. Underscores are useful in many technical documents. There is an admittedly crude provision in this option for eight-bit international characters. Non-printing characters can be specified in the usual way by inserting a backslash (\) followed by the octal character code; for example, "\014" for a form feed. Alternatively, if "n" appears in the character string, the (up to) three characters following are a DECIMAL code 0-255, for the character. For example, to include bells and form feeds in your words, you would use:

n007n012

Numeric digits other than the three following "n" are simply numeric characters. The use of "n" does not conflict with anything, because listing an actual alphabetic would have no effect: alphabetics are already accepted. The ispell(1) utility will typically be used with input from a file, so preserving parity for possible eight-bit characters from the input text is acceptable. Specifying the -l option and typing text from the terminal can create problems if your stty settings preserve parity.
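As a rough aid for building such strings, the following shell sketch formats decimal character codes in both notations described above. The helper names are hypothetical; pass the codes without leading zeros, because printf(1) would otherwise read them as octal.

```sh
#!/bin/sh
# Build a -w argument in the "n" + three-digit decimal notation.
w_decimal() {
    for code in "$@"; do
        printf 'n%03d' "$code"
    done
    printf '\n'
}

# Build the same argument in the backslash + octal notation.
w_octal() {
    for code in "$@"; do
        printf '\\%03o' "$code"
    done
    printf '\n'
}

w_decimal 7 12   # bell and form feed: prints n007n012
w_octal 7 12     # same characters:    prints \007\014
```

The second call shows why the form feed is written "\014" in octal but "n012" in decimal.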

Piping into ispell

The -a and -A options are intended to be used from other programs through a pipe. This mode is also suitable for interactive use when you want to figure out the spelling of a single word.

In this mode, ispell(1) prints a one-line version-identification message, and then begins reading lines of input. For each input line, a single line is written to the standard output for each word checked for spelling on the line. If the word was found in the main dictionary or your personal dictionary, the line contains only an asterisk (*). If the word was found through affix removal, the line contains a plus sign (+), a space, and the root word. If the word was found through compound formation (concatenation of two words, controlled by the -C option), the line contains only a hyphen (-).

If the word is not in the dictionary, but there are near misses, the line contains an ampersand (&), a space, the misspelled word, a space, the number of near misses, the number of characters between the beginning of the line and the beginning of the misspelled word, a colon (:), another space, and a list of the near misses separated by commas and spaces. Following the near misses (and identified only by the count of near misses), if the word could be formed by adding (illegal) affixes to a known root, is a list of suggested derivations, again separated by commas and spaces. If there are no near misses, the line format is the same, except that the ampersand (&) is replaced by a question mark (?) (and the near-miss count is always zero). The suggested derivations following the near misses are in the form:

[prefix+] root [-prefix] [-suffix] [+suffix]

(for example, "re+fry-y+ies" to get "refries") where each optional prefix and suffix is a string. Also, each near miss or guess is capitalized the same as the input word unless such capitalization is illegal. In the latter case, each near miss is capitalized correctly according to the dictionary.

If the word does not appear in the dictionary, and there are no near misses, the line contains a number-sign character (#), a space, the misspelled word, a space, and the character offset from the beginning of the line. Each sentence of text input is terminated with an additional blank line, indicating that ispell(1) has completed processing the input line.

These output lines can be summarized as follows:

OK:        *
Root:      + <root>
Compound:  -
Miss:      & <original> <count> <offset>: <miss>, <miss>, ..., <guess>, ...
Guess:     ? <original> 0 <offset>: <guess>, <guess>, ...
None:      # <original> <offset>

For example, a dummy dictionary containing the words "fray," "Frey," "fry," and "refried" might produce the following response to the command:

$ echo 'frqy refries' | ispell -a -m -d ./test.hash
(#) International Ispell Version 3.0.05 (beta), 08/10/91
& frqy 3 0: fray, Frey, fry
& refries 1 5: refried, re+fry-y+ies
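A driving program usually dispatches on the first character of each output line. The following shell sketch shows one way to do that; the function name and the labels it prints are hypothetical, and it reads lines from standard input so it can be tried on canned output without running ispell(1).

```sh
#!/bin/sh
# Classify lines of "ispell -a" output by their leading marker.
classify_ispell_output() {
    while IFS= read -r line; do
        case "$line" in
            '')    ;;   # blank line: end of results for one input line
            '*')   printf 'ok\n' ;;
            '+ '*) printf 'root: %s\n' "${line#??}" ;;
            '-')   printf 'compound\n' ;;
            '& '*|'? '*)
                   rest=${line#??}        # drop the marker and the space
                   word=${rest%% *}       # first field is the original word
                   printf 'miss: %s -> %s\n' "$word" "${line#*: }" ;;
            '# '*) rest=${line#??}
                   printf 'none: %s\n' "${rest%% *}" ;;
            *)     printf 'other: %s\n' "$line" ;;   # for example, the version line
        esac
    done
}
```

It could be placed at the end of a pipeline such as: echo 'frqy refries' | ispell -a -m -d ./test.hash | classify_ispell_output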

When in the -a and -A modes, ispell(1) will also accept lines of single words prefixed with any of '*', '&', '@', '+', '-', '~', '#', '!', '%', or '^'. A line starting with '*' tells ispell(1) to insert the word into the user's dictionary (similar to the I command). A line starting with '&' tells ispell(1) to insert an all-lowercase version of the word into the user's dictionary (similar to the U command). A line starting with '@' causes ispell(1) to accept this word in the future (similar to the A command).

A line starting with '+', followed immediately by tex or nroff, will cause ispell(1) to parse future input according to the syntax of that formatter. A line consisting solely of a '+' will place ispell(1) in TeX/LaTeX mode (similar to the -t option) and '-' returns ispell(1) to nroff/troff mode (but these commands are obsolete). However, string character type is not changed; the '~' command must be used to do this. A line starting with '~' causes ispell(1) to set internal parameters (in particular, the default string character type) based on the file name given in the rest of the line. (A file suffix is sufficient, but the period must be included. Instead of a file name or suffix, a unique name, as listed in the language affix file, can be specified.) However, the formatter parsing is not changed; the '+' command must be used to change the formatter.

A line prefixed with '#' will cause the personal dictionary to be saved. A line prefixed with '!' will turn on terse mode (see below), and a line prefixed with '%' will return ispell(1) to normal (non-terse) mode. Any input following the prefix characters '+', '-', '#', '!', or '%' is ignored, as is any input following the file name on a '~' line. To allow spell-checking of lines beginning with these characters, a line starting with '^' has that character removed before it is passed to the spell-checking code. It is recommended that programmatic interfaces prefix every data line with a caret ('^') to protect against future changes in ispell(1).

To summarize these:

*    Add to personal dictionary
&    Add all-lowercase version to personal dictionary
@    Accept word, but leave out of dictionary
#    Save current personal dictionary
~    Set parameters based on file name
+    Enter TeX mode
-    Exit TeX mode
!    Enter terse mode
%    Exit terse mode
^    Spell-check rest of line

In terse mode, ispell(1) will not print lines beginning with '*', '+', or '-', all of which indicate correct words. This significantly improves running speed when the driving program is going to ignore correct words anyway.
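A driving program can combine terse mode with the recommended '^' prefix. The following shell sketch (the function name is hypothetical) turns plain text on standard input into a stream suitable for "ispell -a": a '!' line to request terse mode, followed by every data line protected with '^'.

```sh
#!/bin/sh
# Prepare text for "ispell -a": request terse mode, then prefix
# each data line with "^" so that a line beginning with a command
# character such as "*" or "#" is spell-checked, not executed.
to_ispell_input() {
    printf '!\n'
    sed 's/^/^/'
}
```

For example, a file could be checked with: to_ispell_input < file | ispell -a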

The -s option is only valid in conjunction with the -a or -A options, and only on BSD-derived systems. If specified, ispell(1) will stop itself with a SIGTSTP signal after each line of input. It will not read more input until it receives a SIGCONT signal. This can be useful for handshaking with certain text editors.

The -f option is only valid in conjunction with the -a or -A options.

Other options

The -c, -e[1-4], and -D options of ispell(1) are intended primarily for use by the munchlist(1) shell script. The -c switch causes a list of words to be read from the standard input. For each word, a list of possible root words and affixes will be written to the standard output. Some of the root words will be illegal and must be filtered from the output by other means; the munchlist(1) script does this. For example:

$ echo BOTHER | ispell -c
BOTHER BOTHE/R BOTH/R

The -e switch is the reverse of -c; it expands affix flags to produce a list of words, as in the command:

$ echo BOTH/R | ispell -e
BOTH BOTHER

An optional expansion level can also be specified. A level of 1 is the same as -e alone. A level of 2 causes the original root/affix combination to be prepended to the line:

BOTH/R BOTH BOTHER

A level of 3 causes multiple lines to be output, one for each generated word, with the original root/affix combination followed by the word it creates:

BOTH/R BOTH
BOTH/R BOTHER

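A level of 4 causes a floating-point number to be appended to each of the level-3 lines, giving the ratio of the total length of all generated words (including the root) to the length of the root. In the following example, the generated words total 4 + 6 = 10 characters and the root has 4, so the ratio is 2.5: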

BOTH/R BOTH 2.500000
BOTH/R BOTHER 2.500000
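The figure in this sample can be reproduced by hand. The following shell sketch assumes, based only on the sample output above, that the number is the combined length of the generated words divided by the length of the root; the function name is hypothetical.

```sh
#!/bin/sh
# expansion_ratio ROOT WORD...
# Divide the combined length of the generated words by the root length.
expansion_ratio() {
    root=$1
    shift
    total=0
    for w in "$@"; do
        total=$((total + ${#w}))
    done
    awk -v t="$total" -v r="${#root}" 'BEGIN { printf "%.6f\n", t / r }'
}

expansion_ratio BOTH BOTH BOTHER   # (4 + 6) / 4, prints 2.500000
```
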

The -D flag causes the affix tables from the dictionary file to be dumped to standard output.

Capitalization

Unless your system administrator has suppressed the feature to save space, ispell(1) is aware of the correct capitalizations of words in the dictionary and in your personal dictionary. As well as recognizing words that must be capitalized (such as proper names) and words that must be all uppercase (such as acronyms), it can also handle words with unusual capitalization (such as "ITCorp" or "TeX"). If a word is capitalized incorrectly, the list of possibilities will include all acceptable capitalizations. (More than one capitalization might be acceptable; for example, my dictionary lists both "ITCorp" and "ITcorp.")

Although this feature is usually quite predictable, there is something of which you should be aware. If you use "I" to add a word to your dictionary that is at the beginning of a sentence (such as the first word of this paragraph if "although" were not in the dictionary), it will be marked as "capitalization required". A subsequent usage of this word without capitalization (that is, the quoted word in the previous sentence) will be considered a misspelling by ispell(1), and it will suggest the capitalized version. You must then compare the actual spellings by eye, and then type "I" to add the uncapitalized variant to your personal dictionary. You can prevent this problem, however, by using "U" to add the original word, rather than "I".

The rules for capitalization are as follows:

  1. Any word can appear in all uppercase letters, as in headings.
  2. Any word that is in the dictionary in all-lowercase form can appear either in lowercase or with only the first letter capitalized (as at the beginning of a sentence).
  3. Any word that has unusual capitalization (that is, it contains both cases, and there is an uppercase letter in addition to the first) must appear exactly as it does in the dictionary, except as permitted by rule 1. If the word is acceptable in all-lowercase, it must appear thus in a dictionary entry.
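
The three rules can be expressed as a small shell check. This is a simplified sketch and not the algorithm used by ispell(1): the function name is hypothetical, it compares a word against a single dictionary entry, and it treats an entry with only an initial capital the same way as rule 3 (exact match or all uppercase).

```sh
#!/bin/sh
# cap_ok DICT_FORM WORD
# Succeed if WORD is an acceptable capitalization of DICT_FORM.
cap_ok() {
    dict=$1
    word=$2
    upper=$(printf '%s' "$dict" | tr '[:lower:]' '[:upper:]')
    lower=$(printf '%s' "$dict" | tr '[:upper:]' '[:lower:]')

    # Rule 1: any word can appear in all uppercase.
    if [ "$word" = "$upper" ]; then
        return 0
    fi

    # Rule 2: an all-lowercase entry can appear as is, or with
    # only the first letter capitalized.
    if [ "$dict" = "$lower" ]; then
        first=$(printf '%s' "$dict" | cut -c1 | tr '[:lower:]' '[:upper:]')
        rest=$(printf '%s' "$dict" | cut -c2-)
        if [ "$word" = "$dict" ] || [ "$word" = "$first$rest" ]; then
            return 0
        fi
        return 1
    fi

    # Rule 3: any other entry must match exactly.
    [ "$word" = "$dict" ]
}
```

With this sketch, "cap_ok TeX TEX" and "cap_ok fry Fry" succeed, while "cap_ok TeX Tex" and "cap_ok Frey frey" fail.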

ENVIRONMENT

DICTIONARY
Default dictionary to use if no -d flag is given.
WORDLIST
Personal dictionary file name.
INCLUDE_STRING
Code for file inclusion under the -A option.

FILES

/usr/share/spell/british.hash
Hashed dictionary for British English.
/usr/share/spell/english.hash
Hashed dictionary for American English.
$HOME/.ispell_file
User's private dictionary
.ispell_file
Directory-specific private dictionary

NOTES

The version of ispell(1) supplied with Interix does not include the scripts and tools for building dictionary files.

The original reference page lists these as bugs:

It takes several to many seconds for ispell(1) to read in the hash table, depending on size.

When all options are enabled, ispell(1) might take several seconds to generate all the guesses at corrections for a misspelled word; on slower computers this time is long enough to be annoying.

The hash table is stored as a quarter-megabyte (or larger) array, so a PDP-11 or 286 version does not seem likely.

The ispell(1) utility should understand more troff(1) syntax, and deal more intelligently with contractions.

Although small personal dictionaries are sorted before they are written out, the order of capitalizations of the same word is somewhat random.

When the -x flag is specified, ispell(1) will unlink any existing .bak file.

There are too many flags, and many of them have nonmnemonic names.

AUTHOR

Pace Willisson (pace@mit-vax), 1983, based on the PDP-10 assembly version. That version was written by R. E. Gorin in 1971, and later revised by W. E. Matson (1974) and W. B. Ackerman (1978).

Collected, revised, and enhanced for the Usenet by Walt Buehring, 1987.

Table-driven multi-lingual version by Geoff Kuenning, 1987-88.

Large dictionaries provided by Bob Devine (vianet!devine).

A complete list of contributors is too large to list here, but is distributed with the ispell sources in the file "Contributors."

VERSION

The version of ispell described in this topic is International Ispell Version 3.1.00, 10/08/93.

SEE ALSO

spell(1)

egrep(1)

join(1)

sort(1)