UTF-8
From DPCanadaWiki
To be created/developed and updated
UTF-8 is a way to encode Unicode text as a sequence of octets (bytes, that is, numbers between 0 and 255, inclusive) in such a manner that pure ASCII text is encoded unchanged, one byte per character.
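As a small illustration of the ASCII compatibility described above, the following Python sketch compares the UTF-8 and ASCII encodings of a string. The helper name `same_as_ascii` is made up for this example; only the built-in `str.encode` is relied upon.

```python
def same_as_ascii(text: str) -> bool:
    """Return True if `text` is pure ASCII and its UTF-8 bytes equal its ASCII bytes.

    Hypothetical helper for illustration only.
    """
    try:
        ascii_bytes = text.encode("ascii")
    except UnicodeEncodeError:
        # The text contains a non-ASCII character, so there is no ASCII form.
        return False
    return text.encode("utf-8") == ascii_bytes


print(same_as_ascii("Hello, world!"))  # pure ASCII text
print(same_as_ascii("café"))           # contains a non-ASCII character
```

For any pure ASCII string the two byte sequences are identical, which is why existing ASCII files are already valid UTF-8.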
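Byte values from 0 to 127, inclusive, represent the usual ASCII characters. Byte values from 128 to 191, inclusive, are continuation bytes, each carrying a block of 6 bits from a larger Unicode code number. Byte values 192 and above are used as prefixes (lead bytes), both determining how many 6-bit blocks follow and containing the initial bits of the code number: 5, 4 or 3 bits for sequences of 2, 3 or 4 bytes, respectively. The values 192, 193 and 245 to 255 never occur in valid UTF-8.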
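The byte layout can be sketched in Python as follows. This is a minimal illustration, not a replacement for the built-in encoder; the function name `utf8_encode_code_point` is an assumption made for this example.

```python
def utf8_encode_code_point(cp: int) -> bytes:
    """Encode one Unicode code number as UTF-8 bytes (illustrative sketch)."""
    if cp < 0 or cp > 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
        raise ValueError("not an encodable Unicode code number")
    if cp < 0x80:
        # 0..127: a single byte, identical to ASCII.
        return bytes([cp])
    if cp < 0x800:
        # Lead byte 110xxxxx carries 5 bits, then one 6-bit block.
        return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
    if cp < 0x10000:
        # Lead byte 1110xxxx carries 4 bits, then two 6-bit blocks.
        return bytes([
            0xE0 | (cp >> 12),
            0x80 | ((cp >> 6) & 0x3F),
            0x80 | (cp & 0x3F),
        ])
    # Lead byte 11110xxx carries 3 bits, then three 6-bit blocks.
    return bytes([
        0xF0 | (cp >> 18),
        0x80 | ((cp >> 12) & 0x3F),
        0x80 | ((cp >> 6) & 0x3F),
        0x80 | (cp & 0x3F),
    ])


# Compare with Python's built-in encoder for a few sample characters.
for ch in "A", "é", "€", "😀":
    assert utf8_encode_code_point(ord(ch)) == ch.encode("utf-8")
    print(ch, list(utf8_encode_code_point(ord(ch))))
```

Every byte after the first in a multi-byte sequence falls in the range 128 to 191, so a decoder can tell lead bytes and continuation bytes apart by value alone.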
Incidentally, Latin-1 characters with Unicode numbers from 128 to 191, inclusive, are encoded as a byte with value 194 followed by (the code of) the character itself; Latin-1 characters from 192 to 255 are encoded as 195 followed by the character code minus 64.
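The two-byte pattern for the upper half of Latin-1 can be checked against Python's built-in encoder. In this sketch the lead byte comes out as 194 for codes 128 to 191 and as 195 for codes 192 to 255; the helper name `latin1_high_to_utf8` is made up for the example.

```python
def latin1_high_to_utf8(code: int) -> bytes:
    """UTF-8 bytes for a Latin-1 character code in 128..255 (illustrative sketch)."""
    if not 128 <= code <= 255:
        raise ValueError("expected a Latin-1 code from 128 to 255")
    if code <= 191:
        # Lead byte 194, then the character code itself.
        return bytes([194, code])
    # Lead byte 195, then the character code minus 64.
    return bytes([195, code - 64])


# Verify the shortcut against the built-in encoder for the whole range.
assert all(
    latin1_high_to_utf8(c) == chr(c).encode("utf-8") for c in range(128, 256)
)
print(list(latin1_high_to_utf8(233)))  # é (code 233)
```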
