HTML5 documents: browsers encoding precedence
How browsers determine the character encoding of your HTML5 document
Browsers determine the character encoding of an HTML5 document by checking the following sources, in this order of precedence:
(source: http://blog.whatwg.org/the-road-to-html-5-character-encoding)
- User override
- An HTTP "charset" parameter in a "Content-Type" field
- A Byte Order Mark before any other data in the HTML document itself
- A META declaration with a "charset" attribute
- A META declaration with an "http-equiv" attribute set to "Content-Type" and a value set for "charset"
- Unspecified heuristic analysis
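
The ordering above can be sketched as a first-match lookup. This is only an illustration of the list, not how any browser is actually implemented; the function name `pick_encoding` and the dictionary keys are hypothetical names chosen for this example.

```python
from typing import Optional, Tuple

# Sources in the precedence order listed above, highest first.
# The key names are made up for this sketch.
PRECEDENCE = (
    "user_override",
    "http_charset",
    "bom",
    "meta_charset",
    "meta_http_equiv",
)


def pick_encoding(declarations: dict) -> Tuple[Optional[str], str]:
    """Return (encoding, source) for the highest-precedence declaration present."""
    for source in PRECEDENCE:
        value = declarations.get(source)
        if value:
            return value, source
    # Nothing was declared: a real browser would fall back to heuristic analysis.
    return None, "heuristic"


# The HTTP header says ISO-8859-1 while the meta tag says UTF-8.
print(pick_encoding({"http_charset": "ISO-8859-1", "meta_charset": "UTF-8"}))
# → ('ISO-8859-1', 'http_charset')
```

With both declarations present, the HTTP header wins because it appears earlier in the list than the META declaration.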
Video demonstration: browsers encoding precedence
min | video details |
---|---|
00:09 | ASCII versus non-ASCII characters |
00:15 | 1. User override: regardless of the encoding declared at document level (none in our case) or at server level (UTF-8 in our case), the user can override it by selecting a different encoding in the browser |
00:30 | 2. An HTTP "charset" parameter in a "Content-Type" field: e.g. the response header reads: Content-Type: text/html; charset=UTF-8 |
00:35 | checking the HTTP header of the file using the W3C Internationalization Checker (http://validator.w3.org/i18n-checker/) |
00:43 | highlighting the charset parameter in the Content-Type field of the file's HTTP header |
00:53 | since the header declares charset=ISO-8859-1, the browser picked it up as Western (ISO Latin 1 = ISO 8859-1) |
00:57 | the browser displays the file as ISO-8859-1 rather than as Unicode, because the HTTP header declares ISO-8859-1 and has a higher precedence than the UTF-8 charset declared in the document through the meta charset attribute |
00:59 | the character set sent in the HTTP header (ISO-8859-1) overrides the character set specified inside the document through the meta tag (UTF-8) |
01:10 | the character set sent in the HTTP header is the result of the Apache directive AddCharset ISO-8859-1 .html added to the .htaccess file on the server |
01:34 | 3. A Byte Order Mark before any other data in the HTML document itself: the BOM is the Unicode signature, i.e. the bytes representing the Unicode code point U+FEFF, placed at the very beginning of a page that uses a Unicode character encoding; it is not visible in the page but can be checked using http://validator.w3.org/i18n-checker/ |
01:41 | example of an HTML file encoded as UTF-8 with BOM |
01:48 | test |
01:56 | test result ok: BOM UTF-8 |
02:01 | example of an HTML file encoded as UTF-8 without BOM |
02:10 | test |
02:15 | test result ok: no BOM detected |
02:24 | 4. A META declaration with a "charset" attribute: e.g. <meta charset="UTF-8"> |
02:34 | test result ok: the file is Unicode-encoded and declared as such at document level, and the browser decodes it as Unicode |
02:40 | 5. A META declaration with an "http-equiv" attribute set to "Content-Type" and a value set for "charset": e.g. <meta http-equiv="content-type" content="text/html; charset=UTF-8"> |
02:55 | test result ok: the file is Unicode-encoded and declared as such at document level, and the browser decodes it as Unicode |
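
The checks shown in the video (HTTP header, BOM, META declarations) can be roughly reproduced offline. The sketch below is a simplified approximation, not the algorithm used by browsers or by the W3C checker: the helper names are hypothetical, the META lookup is a plain regular expression over the first 1024 bytes rather than a real HTML parser, and it does not distinguish between the `charset` and `http-equiv` forms.

```python
import codecs
import re
from typing import Optional


def detect_bom(data: bytes) -> Optional[str]:
    """Return the encoding signalled by a leading Byte Order Mark, if any."""
    if data.startswith(codecs.BOM_UTF8):
        return "UTF-8"
    if data.startswith(codecs.BOM_UTF16_LE):
        return "UTF-16LE"
    if data.startswith(codecs.BOM_UTF16_BE):
        return "UTF-16BE"
    return None


def charset_from_content_type(value: str) -> Optional[str]:
    """Extract the charset parameter from a Content-Type header value."""
    match = re.search(r'charset\s*=\s*"?([^\s;"]+)', value, re.IGNORECASE)
    return match.group(1) if match else None


def charset_from_meta(data: bytes) -> Optional[str]:
    """Find a charset declared in a META tag near the start of the document.

    Simplification: one regex covers both <meta charset="..."> and the
    http-equiv form, and only the first 1024 bytes are scanned.
    """
    match = re.search(
        rb'<meta[^>]*charset\s*=\s*["\']?([A-Za-z0-9_.:-]+)',
        data[:1024],
        re.IGNORECASE,
    )
    return match.group(1).decode("ascii") if match else None


print(charset_from_content_type("text/html; charset=ISO-8859-1"))  # → ISO-8859-1
print(detect_bom(codecs.BOM_UTF8 + b"<!DOCTYPE html>"))            # → UTF-8
print(charset_from_meta(b'<meta charset="UTF-8">'))                # → UTF-8
```

Each helper returns `None` when its source declares nothing, so the results can be compared side by side to see which declaration a page actually carries.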