UTF-8

From DPCanadaWiki

 To be created/developed and updated

UTF-8 is a way to encode Unicode text as a sequence of octets (bytes, that is, numbers between 0 and 255, inclusive) in such a manner that pure ASCII text is encoded unchanged, one byte per character.

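Byte values from 0 to 127, inclusive, represent the usual ASCII characters. Byte values from 128 to 191, inclusive, are continuation bytes: each carries a block of 6 bits from a larger Unicode code number. Byte values from 192 upward are used as prefixes: each both determines how many 6-bit blocks follow and contains the first few bits of the code number. Prefixes from 192 to 223 are followed by one continuation byte and hold 5 initial bits, prefixes from 224 to 239 are followed by two and hold 4 initial bits, and prefixes from 240 to 247 are followed by three and hold 3 initial bits. The values 192, 193 and 245 to 255 never occur in valid UTF-8.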
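The byte ranges described above can be illustrated with a minimal Python sketch that encodes a single Unicode code number by hand. The function name utf8_encode_code_point is made up for this example and is not part of any library; in practice Python's built-in str.encode("utf-8") does the same job.

```python
def utf8_encode_code_point(cp):
    """Encode one Unicode code number (an int) as a list of UTF-8 byte values.

    Hypothetical helper written for illustration only.
    """
    # Reject values outside Unicode and the surrogate range, which UTF-8 forbids.
    if cp < 0 or cp > 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
        raise ValueError("not an encodable Unicode code number: %r" % (cp,))
    if cp < 0x80:
        # ASCII: a single byte, 0..127, unchanged.
        return [cp]
    if cp < 0x800:
        # Prefix 192..223 holds 5 bits, then one 6-bit block.
        return [0xC0 | (cp >> 6),
                0x80 | (cp & 0x3F)]
    if cp < 0x10000:
        # Prefix 224..239 holds 4 bits, then two 6-bit blocks.
        return [0xE0 | (cp >> 12),
                0x80 | ((cp >> 6) & 0x3F),
                0x80 | (cp & 0x3F)]
    # Prefix 240..244 holds 3 bits, then three 6-bit blocks.
    return [0xF0 | (cp >> 18),
            0x80 | ((cp >> 12) & 0x3F),
            0x80 | ((cp >> 6) & 0x3F),
            0x80 | (cp & 0x3F)]


print(utf8_encode_code_point(0x41))    # 'A'        -> [65]
print(utf8_encode_code_point(0xE9))    # e-acute    -> [195, 169]
print(utf8_encode_code_point(0x20AC))  # euro sign  -> [226, 130, 172]
```

Every byte after the first falls in the range 128 to 191, so a decoder can always tell a prefix from a continuation byte by its value alone.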

Incidentally, Latin-1 characters with Unicode numbers from 128 to 191, inclusive, are encoded as a byte with value 194 followed by (the code of) the character itself; Latin-1 characters from 192 to 255 are encoded as 195 followed by the character code minus 64. For example, é (233) is encoded as 195 followed by 169.
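The Latin-1 shortcut can be checked against Python's built-in encoder. The sketch below is only an illustration: latin1_to_utf8 is a hypothetical helper name, and the prefix bytes 194 and 195 it uses are the ones that str.encode("utf-8") actually produces for this range.

```python
def latin1_to_utf8(code):
    """Return the UTF-8 byte values for a Latin-1 character code (0..255).

    Hypothetical helper written for illustration only.
    """
    if not 0 <= code <= 255:
        raise ValueError("not a Latin-1 code: %r" % (code,))
    if code < 128:
        # ASCII part of Latin-1: one byte, unchanged.
        return [code]
    if code < 192:
        # 128..191: prefix 194, then the character code itself.
        return [194, code]
    # 192..255: prefix 195, then the character code minus 64.
    return [195, code - 64]


# Compare the shortcut with the built-in encoder for every Latin-1 code.
mismatches = [c for c in range(256)
              if latin1_to_utf8(c) != list(chr(c).encode("utf-8"))]
print(mismatches)  # [] when the shortcut agrees everywhere
```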
