Special Characters

Level: 0

The characters between HTML start and end tags represent text in the ISO Latin-1 character set, which is a superset of ASCII. Because certain characters are interpreted as markup, they should be represented by entity or numeric character references.

To indicate special characters, HTML uses the representations shown in the table HTML Character Representations. These representations are always prefixed by an ampersand (&) and should be followed by a semicolon, as shown. They represent particular graphic characters that have special meaning in the markup, or that may not be part of the character set available to the writer.

For example:

When a&lt;b, we can show that...
Brought to you by AT&amp;T

To ensure that a string of characters has no markup, it is sufficient to represent all occurrences of <, >, and & by character or entity references.
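The rule above can be sketched in a few lines of Python. This is a minimal illustration rather than part of the specification; the function name escape_markup is hypothetical.

```python
def escape_markup(text: str) -> str:
    """Replace the three markup-significant characters with entity references.

    Hypothetical helper for illustration only.
    """
    # The ampersand must be handled first; otherwise the "&" introduced by
    # "&lt;" and "&gt;" would itself be escaped a second time.
    text = text.replace("&", "&amp;")
    text = text.replace("<", "&lt;")
    text = text.replace(">", "&gt;")
    return text


print(escape_markup("When a<b, we can show that..."))
print(escape_markup("Brought to you by AT&T"))
```

Python's standard library offers html.escape(text, quote=False), which performs the same three replacements; the hand-written version is shown only to make the ordering of the substitutions explicit.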

There are SGML features (CDATA, RCDATA) to allow most <, >, and & characters to be entered without the use of entity or character references. Because these features tend to be used and implemented inconsistently, and because they require 8-bit characters to represent non-ASCII characters, they are not employed in this version of the HTML DTD.

An earlier HTML specification included an XMP element whose syntax is not expressible in SGML. Inside the XMP, no markup was recognized except the </xmp> end tag. While implementations are encouraged to support this idiom, its use is obsolete.

Table: HTML Character Representations

HTML also allows references to any of the ISO Latin-1 alphabet, using the names in the table ISO Latin-1 Character Representations, which is derived from ISO 8879:1986//ENTITIES Added Latin 1//EN.

Table: ISO Latin-1 Character Representations
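As a rough sketch of how a writer limited to ASCII might produce such references, the following Python function replaces non-ASCII characters with entity or numeric character references. It relies on Python's html.entities.codepoint2name table, which holds the HTML 4 entity names; treating its names for code points 160 to 255 as matching the Added Latin 1 names is an assumption here, and the function name to_entity_references is hypothetical.

```python
from html.entities import codepoint2name


def to_entity_references(text: str) -> str:
    """Rewrite non-ASCII characters as entity or numeric character references.

    Hypothetical helper for illustration only.
    """
    parts = []
    for ch in text:
        cp = ord(ch)
        if 160 <= cp <= 255 and cp in codepoint2name:
            # Graphic Latin-1 character with a known name, e.g. &eacute;
            parts.append(f"&{codepoint2name[cp]};")
        elif cp > 127:
            # No Latin-1 name available: fall back to a numeric reference.
            parts.append(f"&#{cp};")
        else:
            # Plain ASCII passes through unchanged (markup characters such
            # as "<" and "&" are not handled by this sketch).
            parts.append(ch)
    return "".join(parts)


print(to_entity_references("café"))
```

The sketch deliberately leaves <, >, and & alone, so it would be combined with markup escaping when the input is meant as literal text.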
