Important:
This is retired content. This content is outdated and is no longer being maintained. It is provided as a courtesy for individuals who are still using these technologies. This content may contain URLs that were valid when originally published, but now link to sites or pages that no longer exist.
4/8/2010

The Unicode standard defines codes for characters in most major languages written today. Scripts include Latin, Greek, Cyrillic, Armenian, Hebrew, Arabic, Devanagari, Gurmukhi, Gujarati, Tamil, Telugu, Kannada, Thai, Georgian, Tibetan, Japanese Kana, the complete set of modern Korean Hangul, and a unified set of Chinese/Japanese/Korean (CJK) ideographs. Several other scripts have recently been added, including Ethiopic, Canadian Syllabics, Cherokee, Sinhala, Syriac, Burmese, Khmer, and Braille.

The Unicode standard also includes punctuation marks, diacritics, mathematical symbols, technical symbols, arrows, and dingbats. Diacritics are modifying marks, such as the tilde (~), that are used in conjunction with base characters to encode accented or vocalized letters. In all, the Unicode standard provides codes for nearly 39,000 characters from the world's alphabets, ideograph sets, and symbol collections.
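As a rough illustration of how a diacritic combines with a base character, the sketch below uses Python's standard `unicodedata` module. The helper name `compose` is hypothetical and not part of the original topic, and the example assumes the combining form of the tilde (U+0303) rather than the spacing tilde character (~, U+007E).

```python
import unicodedata

# Assumption for illustration: U+0303 is the combining tilde, the
# diacritic form that attaches to the preceding base character.
COMBINING_TILDE = "\u0303"


def compose(base, mark):
    """Join a base character and a combining mark, then normalize to the
    precomposed (NFC) form when the Unicode standard defines one."""
    return unicodedata.normalize("NFC", base + mark)


# Two code values: the base letter followed by the diacritic.
decomposed = "n" + COMBINING_TILDE

# One code value: the precomposed accented letter.
precomposed = compose("n", COMBINING_TILDE)

# A nonzero combining class marks a character as a combining mark.
tilde_is_combining = unicodedata.combining(COMBINING_TILDE) != 0
```

Both strings are intended to display as the same accented letter; they differ only in how many code values encode it.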

In addition, there are approximately 18,000 code values reserved for future use. The Unicode standard also contains 6,400 code values that software and hardware developers can assign internally for their own characters and symbols.
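The 6,400 privately assignable code values can be checked with a short sketch. It assumes the range U+E000 through U+F8FF, which is the Private Use Area of the Unicode standard; the original topic does not name the range, and the function name `is_private_use` is hypothetical.

```python
import unicodedata

# Assumed bounds of the Private Use Area (not stated in the topic itself).
PUA_FIRST = 0xE000
PUA_LAST = 0xF8FF


def is_private_use(code_value):
    """Report whether a 16-bit code value falls in the private use range."""
    return PUA_FIRST <= code_value <= PUA_LAST


# Number of code values available for private assignment.
pua_size = PUA_LAST - PUA_FIRST + 1

# The general category "Co" is how the character database labels
# private use code values.
first_category = unicodedata.category(chr(PUA_FIRST))
```

Because the meaning of these code values is private, text that uses them is only interpretable by software that agrees on the same assignments.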

Note that surrogates are not supported. A surrogate is a pair of 16-bit Unicode code values that together represent a single character.
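Because surrogate pairs are not supported, it can help to screen text for them before handing it to the platform. The following is a minimal sketch, not a platform API: the function names are hypothetical, and the surrogate ranges (U+D800 through U+DBFF for the first value of a pair, U+DC00 through U+DFFF for the second) are taken from the Unicode standard rather than from this topic.

```python
import struct

# Assumed surrogate ranges from the Unicode standard.
HIGH_SURROGATE_FIRST = 0xD800  # first value of a pair starts here
LOW_SURROGATE_LAST = 0xDFFF    # second value of a pair ends here


def utf16_code_units(text):
    """Return the list of 16-bit code values that encode text as UTF-16."""
    data = text.encode("utf-16-le")
    return list(struct.unpack("<%dH" % (len(data) // 2), data))


def contains_surrogates(code_units):
    """Report whether any 16-bit code value lies in the surrogate range."""
    return any(HIGH_SURROGATE_FIRST <= unit <= LOW_SURROGATE_LAST
               for unit in code_units)


# Characters that each fit in a single 16-bit code value.
plain = utf16_code_units("A\u00f1")

# U+1D11E lies beyond the 16-bit range, so it needs a surrogate pair.
paired = utf16_code_units("\U0001D11E")
```

A caller could reject or replace any string for which `contains_surrogates` returns `True`; which policy is appropriate depends on the application.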

In This Section

Defining a Character Set

Describes the layout of code values within the Unicode standard, how those code values relate to hexadecimal and decimal values, and how they relate to glyphs, which are the visual representations of characters. Also explains composite characters.

Related Sections

Programming with Unicode and NLS

Describes the Unicode standard, and shows how to specify locales with National Language Support (NLS).