Character encoding in XML

Character encoding is the way document content is represented in binary form. An encoding is also loosely referred to as a character set, character map, code set, or code page. Because XML supports many character sets, an XML document should name its encoding in the XML declaration at the very beginning of the document; the declaration can be left out only when the document is in UTF-8 or UTF-16.
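As a minimal sketch of how the declared encoding is used, the Python snippet below parses a small document stored as ISO-8859-1 bytes. The `<note>` element and its content are made up for illustration; the parser reads the `encoding` attribute from the XML declaration to decode the bytes.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document: the XML declaration names the encoding,
# and the bytes are actually stored in that encoding.
xml_text = '<?xml version="1.0" encoding="ISO-8859-1"?><note>caf\u00e9</note>'
xml_bytes = xml_text.encode("iso-8859-1")

# In ISO-8859-1 the e-acute character is the single byte 0xE9.
assert b"\xe9" in xml_bytes

# The parser uses the declared encoding to turn the bytes back into text.
root = ET.fromstring(xml_bytes)
print(root.text)
```

If the declaration named a different encoding than the one the bytes were really written in, the parser would either fail or produce the wrong characters, which is why the declaration and the stored bytes have to agree.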

The default character encodings are UTF-8 and UTF-16. UTF-8 uses 1 to 4 bytes to represent each character and is popular in web applications; UTF-16 uses 2 or 4 bytes per character.
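The byte counts can be checked directly. The following Python sketch encodes a few sample characters (chosen here only for illustration) and records how many bytes each one takes in UTF-8 and in UTF-16. The `utf-16-be` codec is used so that no byte order mark is added to the count.

```python
# Sample characters picked for illustration, written as escapes so the
# snippet does not depend on the encoding of the source file itself.
samples = {
    "A": "A",
    "e-acute": "\u00e9",
    "euro sign": "\u20ac",
    "emoji": "\U0001F600",
}

byte_lengths = {}
for name, ch in samples.items():
    utf8 = ch.encode("utf-8")
    utf16 = ch.encode("utf-16-be")  # big-endian variant, no byte order mark
    byte_lengths[name] = (len(utf8), len(utf16))
    print(f"{name:10} U+{ord(ch):04X}  UTF-8: {len(utf8)}  UTF-16: {len(utf16)}")
```

ASCII characters take one byte in UTF-8, while characters outside the Basic Multilingual Plane (such as the emoji) take four bytes in both encodings.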

Other character sets commonly used with XML include ISO-8859-1 through ISO-8859-9, ISO-2022-JP, Shift_JIS, and EUC-JP. These sets cover Latin-script Western European languages, Greek, Arabic, Japanese, and others. Many of these character sets are also known by alias names. Using the standard names helps make a document widely portable.
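To illustrate what an alias is, the sketch below uses Python's `codecs.lookup` to show that several spellings resolve to the same codec. Note the assumption: this is Python's own alias table, not the official registry of encoding names, so an alias accepted here is not guaranteed to be understood by every XML parser.

```python
import codecs

# Each pair is (standard name, alias). The aliases are ones that Python's
# codec registry accepts; other tools may not recognize them.
pairs = [
    ("ISO-8859-1", "latin1"),
    ("Shift_JIS", "sjis"),
    ("EUC-JP", "ujis"),
]

same_codec = {}
for standard, alias in pairs:
    canonical_standard = codecs.lookup(standard).name
    canonical_alias = codecs.lookup(alias).name
    same_codec[standard] = canonical_standard == canonical_alias
    print(f"{standard} and {alias} resolve to the same codec: {same_codec[standard]}")
```

Because alias support differs from tool to tool, writing the standard name in the XML declaration is the safer choice.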

Apart from these language-oriented encodings, there are platform-dependent encodings. CP1252 is one such Windows-specific character set; it is the default on American and Western European PCs. The Mac has traditionally used MacRoman, which is a superset of ASCII.
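The practical consequence of platform-dependent encodings is that the same character is stored as different bytes on different platforms. A small Python sketch, using the e-acute character as an arbitrary example:

```python
ch = "\u00e9"  # e-acute, chosen only as an example

# The same character becomes a different byte in each platform encoding.
win_bytes = ch.encode("cp1252")
mac_bytes = ch.encode("mac_roman")
print(win_bytes.hex(), mac_bytes.hex())

# Reading Mac-encoded bytes as if they were CP1252 yields a different character.
misread = mac_bytes.decode("cp1252")
print(misread == ch)

# Plain ASCII text is encoded identically in both, since both extend ASCII.
ascii_same = "XML".encode("cp1252") == "XML".encode("mac_roman")
print(ascii_same)
```

This is why a file that looks fine on the machine where it was written can show wrong accented characters elsewhere unless its encoding is declared.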

Of all these character sets, XML processors are required to support only UTF-8 and UTF-16. Conversions between encodings can be done in most XML or HTML editors, which offer an option to select the character set in which the file is saved.
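The same conversion can also be scripted. The sketch below is one possible approach in Python, not the only one: it parses the document and writes it back out in the target encoding with a matching declaration. The function name `convert_xml_encoding` and the sample `<note>` document are hypothetical. A caveat of this approach is that the document type declaration and any comments or processing instructions outside the root element are not preserved.

```python
import io
import xml.etree.ElementTree as ET


def convert_xml_encoding(xml_bytes: bytes, target: str = "utf-8") -> bytes:
    """Re-serialize an XML document in the target encoding (illustrative helper)."""
    root = ET.fromstring(xml_bytes)  # decoded using the declared encoding
    buf = io.BytesIO()
    # Write the tree back with a declaration that names the new encoding.
    ET.ElementTree(root).write(buf, encoding=target, xml_declaration=True)
    return buf.getvalue()


# Hypothetical input: a document saved as ISO-8859-1.
latin1_doc = '<?xml version="1.0" encoding="ISO-8859-1"?><note>caf\u00e9</note>'.encode(
    "iso-8859-1"
)
utf8_doc = convert_xml_encoding(latin1_doc)
print(utf8_doc)
```

After conversion, both the declaration and the stored bytes refer to the new encoding, so the document stays consistent.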
