Understanding HTML and MIME

The World Wide Web initiative (W3) links information throughout the world. To accomplish this, W3 uses the Internet Hypertext Transfer Protocol (HTTP), which allows transfer representations to be negotiated between client and server. Results are returned in a MIME body part.

HTML is one of the representations used by W3 and is proposed as a MIME content type. The HTML Content-Type is text/html, and it has three optional parameters:

Level

The level parameter specifies the feature set used in the document. The level is an integer; a document at a given level may contain any feature of the same or a lower level. Levels are defined by this specification.

Version

To help avoid future compatibility problems, the version parameter may be used to give the version number of the specification to which the document conforms. The version number appears at the front of this document and within the public identifier for the SGML DTD.

Character sets

The charset parameter is reserved for future use. It will be used to override the base character set of the SGML declaration. Support of character sets other than ISO 8859/1 Latin alphabet No. 1 is not a requirement for conformance with this specification.
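As an illustration of the three parameters described above, the following is a minimal sketch of how a client might read them from a Content-Type header. It uses Python's standard email package. The header value, the version string "2.0", and the function names are assumptions made for the example and are not defined by this specification.

```python
from email.message import Message


def parse_html_content_type(header_value):
    """Split a Content-Type header value into its type and optional parameters."""
    msg = Message()
    msg["Content-Type"] = header_value
    info = {
        "type": msg.get_content_type(),
        "level": msg.get_param("level"),      # None when the parameter is absent
        "version": msg.get_param("version"),
        "charset": msg.get_param("charset"),  # reserved for future use
    }
    if info["level"] is not None:
        info["level"] = int(info["level"])    # the level is an integer
    return info


def feature_allowed(document_level, feature_level):
    """A feature may appear if its level is the same as or lower than the document's."""
    return feature_level <= document_level


# Hypothetical header value, chosen only for illustration.
info = parse_html_content_type("text/html; level=2; version=2.0")
```

Here info["level"] is the integer 2, and feature_allowed(info["level"], 1) is true because a level 2 document may also contain level 1 features.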

The SGML declaration specifies the base character set for HTML as ISO 8859/1, also known as Latin-1. This is the set referred to by any numeric character references. The actual character set used in the representation of an HTML document may be ISO 8859/1, or its 7-bit subset, ISO 646 (ASCII).

Since you can represent an HTML document using either 7-bit or 8-bit characters, it is the transport medium that is most likely to pose a constraint. If you are using email to send an HTML document, you should convert the document to 7-bit characters. However, the HTTP access protocol used by W3 always allows 8-bit transfer.
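The transport constraint above can be checked mechanically. The following minimal sketch tests whether a document's bytes are already 7-bit clean and whether they may be sent as-is over a given transport. The function names and the transport labels "email" and "http" are hypothetical and exist only for this example.

```python
def is_seven_bit_clean(data):
    """Return True if every byte lies in the 7-bit range (0 to 127)."""
    return all(byte < 0x80 for byte in data)


def can_send_unchanged(data, transport):
    """Decide whether the bytes can be sent without conversion.

    Assumption for this sketch: "http" always allows 8-bit transfer,
    while "email" is treated as a 7-bit transport.
    """
    if transport == "http":
        return True
    if transport == "email":
        return is_seven_bit_clean(data)
    raise ValueError("unknown transport: %r" % transport)


# A Latin-1 encoded fragment containing one upper-half character (0xE9).
latin1_document = b"<P>caf\xe9</P>"
ascii_document = b"<P>cafe</P>"
```

With these inputs, latin1_document can be sent unchanged over "http" but not over "email", while ascii_document can be sent over either.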

When an HTML document is encoded using 7-bit characters, use the mechanisms of character references and entity references to encode characters in the upper half of the ISO 8859/1 Latin-1 set.
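The conversion described above can be sketched as follows. The function replaces each character in the upper half of ISO 8859/1 with a numeric character reference, or optionally with an entity reference. The function name is hypothetical. The entity names come from Python's html.entities table, which may contain names that this specification does not define, so the numeric form is the default. The input is assumed to be HTML markup already, so markup characters such as "<" and "&" are left untouched.

```python
from html.entities import codepoint2name


def to_seven_bit_html(text, use_entities=False):
    """Rewrite upper-half Latin-1 characters as character or entity references."""
    pieces = []
    for ch in text:
        code = ord(ch)
        if code < 0x80:
            # Already a 7-bit (ISO 646) character; copy it through.
            pieces.append(ch)
        elif code <= 0xFF:
            # Upper half of ISO 8859/1: prefer a named entity only if requested
            # and available, otherwise fall back to a numeric reference.
            name = codepoint2name.get(code) if use_entities else None
            pieces.append("&%s;" % name if name else "&#%d;" % code)
        else:
            raise ValueError("character outside ISO 8859/1: %r" % ch)
    return "".join(pieces)


numeric_form = to_seven_bit_html("<P>caf\u00e9</P>")
entity_form = to_seven_bit_html("<P>caf\u00e9</P>", use_entities=True)
```

Both results contain only 7-bit characters, so they can be encoded as ASCII for transports such as email.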

Implementations of HTML parsers and generators can be found in W3 servers and browsers and public domain W3 code. You can also build HTML parsers and generators using various public domain SGML parsers such as [sgmls].

