Unicode

Updated: 05/02/2021 by Computer Hope

Unicode is a worldwide character encoding standard developed to overcome the limitations of ASCII (American Standard Code for Information Interchange). Version 1.0 of the standard was released in October 1991.

With Unicode, each character is assigned a unique number, called a code point, between U+0000 and U+10FFFF. Code points are stored using encodings built on 8-bit, 16-bit, or 32-bit units (UTF-8, UTF-16, and UTF-32). Letters, digits, mathematical notation, popular symbols, and characters from most of the world's writing systems are assigned a code point. For example, U+0041 is the Latin capital letter "A." Below is an example of how "Computer Hope" is written as Unicode code points.

U+0043 U+006F U+006D U+0070 U+0075 U+0074 U+0065 U+0072 U+0020 U+0048 U+006F U+0070 U+0065
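
As a minimal sketch, a list of code points like this can be produced in Python with the built-in ord() function. The helper name code_points is hypothetical and chosen only for this illustration.

```python
def code_points(text):
    """Return each character's Unicode code point in U+XXXX notation."""
    # ord() gives the code point as an integer; 04X pads it to at least
    # four uppercase hexadecimal digits.
    return [f"U+{ord(ch):04X}" for ch in text]


print(" ".join(code_points("Hope")))  # U+0048 U+006F U+0070 U+0065
```

An ordinary space typed on the keyboard is U+0020, so code_points(" ") returns ["U+0020"].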

A common Unicode encoding is UTF-8, which stores each code point as a sequence of one to four 8-bit bytes. It's widely used in Linux environments to encode non-ASCII characters so they display properly when output to a text file.
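
The following sketch shows how many bytes UTF-8 uses for a few sample characters, using Python's built-in str.encode() method. The helper name utf8_hex and the sample characters are assumptions made for this illustration.

```python
def utf8_hex(ch):
    """Encode a character as UTF-8 and return its bytes as spaced hex."""
    return ch.encode("utf-8").hex(" ")


# "A", e with acute accent, euro sign, and a grinning face emoji.
for ch in ["A", "\u00e9", "\u20ac", "\U0001F600"]:
    print(f"U+{ord(ch):04X} -> {utf8_hex(ch)}")
# U+0041 -> 41
# U+00E9 -> c3 a9
# U+20AC -> e2 82 ac
# U+1F600 -> f0 9f 98 80
```

ASCII characters such as "A" take one byte in UTF-8, while characters with higher code points take two, three, or four bytes.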

Tip

Microsoft Windows users can look up Unicode code points with the Character Map utility.

Tip

In Microsoft Word, if you highlight a character and press the Alt+X keyboard shortcut, the character is replaced with its hexadecimal Unicode code point. Pressing Alt+X again converts the code back to the character.

ASCII, BOM, Character, Code page, Software terms, UTF