
HTML5 Tutorial

HTML5 documents: browsers encoding precedence

How browsers determine the character encoding of your HTML5 document

Browsers determine the character encoding of an HTML5 document by checking the following sources in order, highest precedence first:
(source: http://blog.whatwg.org/the-road-to-html-5-character-encoding)

  1. User override
  2. An HTTP "charset" parameter in a "Content-Type" field
  3. A Byte Order Mark before any other data in the HTML document itself
  4. A META declaration with a "charset" attribute
  5. A META declaration with an "http-equiv" attribute set to "Content-Type" and a value set for "charset"
  6. Unspecified heuristic analysis
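The precedence order listed above can be sketched as a simple "first declared source wins" lookup. This is only an illustration of the ordering, not how any browser is actually implemented; the function name pick_encoding and the source keys are hypothetical.

```python
from typing import Optional, Tuple

# Sources in the order given in the list above, highest precedence first.
# The key names are hypothetical labels chosen for this sketch.
PRECEDENCE = (
    "user_override",
    "http_charset",
    "bom",
    "meta_charset",
    "meta_http_equiv",
)


def pick_encoding(sources: dict) -> Optional[Tuple[str, str]]:
    """Return (source, encoding) for the highest-precedence source that is set."""
    for name in PRECEDENCE:
        value = sources.get(name)
        if value:
            return name, value
    # Nothing was declared: a browser would fall back to heuristic analysis.
    return None


# The HTTP header outranks the meta declaration inside the document.
print(pick_encoding({"http_charset": "ISO-8859-1", "meta_charset": "UTF-8"}))
```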

Video demonstration Browsers encoding precedence

HTML5 Browsers encoding precedence Tutorial

min video details
00:09 ASCII versus non-ASCII characters
00:15 1. User override:
regardless of the encoding set at document level (none in our case) or at server level (UTF-8 in our case), the user can override it by selecting a different encoding in the browser
00:30 2. An HTTP "charset" parameter in a "Content-Type" field:
e.g. the HTTP response header: Content-Type: text/html; charset=UTF-8
00:35 checking the file's HTTP header with the W3C Internationalization Checker:
http://validator.w3.org/i18n-checker/
00:43 highlighting the charset parameter in the Content-Type field of the file's HTTP header
00:53 since the header says charset=ISO-8859-1, the browser picks it up as Western (ISO Latin 1 = ISO-8859-1)
00:57 the browser displays the file as Western (ISO-8859-1) because the HTTP header delivers the ISO-8859-1 encoding, which has a higher precedence than the UTF-8 charset specified in the document through the meta charset attribute
00:59 the character set sent in the HTTP header (ISO-8859-1) overrides the character set specified inside the document through the meta tag (UTF-8)
01:10 the character set sent in the HTTP header is the result of the Apache directive AddCharset ISO-8859-1 .html, added to the .htaccess file on the server
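As a rough illustration of step 2, the sketch below extracts the charset parameter from a Content-Type value and then decodes UTF-8 bytes with it, which is what produces garbled non-ASCII characters when the header and the file disagree. The helper name charset_from_content_type and the sample text are assumptions made for this example; the header parsing relies on Python's standard email.message module.

```python
from email.message import Message


def charset_from_content_type(header_value: str):
    """Return the lowercased charset parameter of a Content-Type value, or None."""
    msg = Message()
    msg["Content-Type"] = header_value
    return msg.get_content_charset()


# A page saved as UTF-8 (sample content, not taken from the video).
page_bytes = "café".encode("utf-8")

# The server announces ISO-8859-1, as in the AddCharset example above.
http_charset = charset_from_content_type("text/html; charset=ISO-8859-1")
print(http_charset)                     # iso-8859-1

# Decoding UTF-8 bytes as ISO-8859-1 garbles the non-ASCII character.
print(page_bytes.decode(http_charset))  # cafÃ©
```

ASCII characters survive the mismatch because both encodings represent them with the same bytes; only the non-ASCII characters are affected.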
01:34 3. A Byte Order Mark before any other data in the HTML document itself:
BOM = the Unicode signature: the bytes that represent the Unicode code point U+FEFF, added at the beginning of a page that uses a Unicode character encoding. It is not visible in the page, but its presence can be checked using
http://validator.w3.org/i18n-checker/
01:41 example of an HTML file encoded as UTF-8 with BOM
01:48 test
01:56 test result ok: BOM UTF-8
02:01 example of an HTML file encoded as UTF-8 without BOM
02:10 test
02:15 test result ok: no BOM detected
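The BOM check from step 3 can also be approximated locally. The sketch below only looks at the first bytes of a file for the UTF-8 and UTF-16 signatures defined in Python's codecs module; the function name detect_bom and the sample markup are hypothetical, and this is not what the W3C checker itself runs.

```python
import codecs

# Byte sequences that encode U+FEFF in each Unicode encoding form.
BOMS = (
    (codecs.BOM_UTF8, "UTF-8"),         # EF BB BF
    (codecs.BOM_UTF16_LE, "UTF-16LE"),  # FF FE
    (codecs.BOM_UTF16_BE, "UTF-16BE"),  # FE FF
)


def detect_bom(data: bytes):
    """Return the encoding signalled by a leading BOM, or None if there is none."""
    for bom, name in BOMS:
        if data.startswith(bom):
            return name
    return None


with_bom = codecs.BOM_UTF8 + b"<!DOCTYPE html><title>test</title>"
without_bom = b"<!DOCTYPE html><title>test</title>"

print(detect_bom(with_bom))     # UTF-8
print(detect_bom(without_bom))  # None
```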
02:24 4. A META declaration with a "charset" attribute:
e.g. <meta charset="UTF-8">
02:34 test result ok: the file is encoded as Unicode (UTF-8) and declared as such at document level, so the browser decodes it as Unicode
02:40 5. A META declaration with an "http-equiv" attribute set to "Content-Type" and a value set for "charset":
e.g. <meta http-equiv="content-type" content="text/html; charset=UTF-8">
02:55 test result ok: the file is encoded as Unicode (UTF-8) and declared as such at document level, so the browser decodes it as Unicode
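Both document-level declarations from steps 4 and 5 can be read with Python's standard html.parser module. The sketch below is a simplification that returns the first declaration it finds; real browsers apply a stricter prescan algorithm, and the names MetaCharsetFinder and meta_charset are hypothetical.

```python
import re
from html.parser import HTMLParser


class MetaCharsetFinder(HTMLParser):
    """Record the first charset declared by a <meta> element."""

    def __init__(self):
        super().__init__()
        self.charset = None

    def handle_starttag(self, tag, attrs):
        if tag != "meta" or self.charset is not None:
            return
        attributes = dict(attrs)  # attribute names arrive lowercased
        if attributes.get("charset"):
            # Step 4: <meta charset="...">
            self.charset = attributes["charset"]
        elif (attributes.get("http-equiv") or "").lower() == "content-type":
            # Step 5: <meta http-equiv="content-type" content="...; charset=...">
            match = re.search(r"charset\s*=\s*([\w.:-]+)",
                              attributes.get("content") or "", re.IGNORECASE)
            if match:
                self.charset = match.group(1)


def meta_charset(html: str):
    finder = MetaCharsetFinder()
    finder.feed(html)
    return finder.charset


print(meta_charset('<meta charset="UTF-8">'))
print(meta_charset('<meta http-equiv="content-type" content="text/html; charset=UTF-8">'))
```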