Authoring tools on MS Windows, in particular MS FrontPage (a "WYSIWYG" HTML editor),
generate invalid Numeric Character References (NCRs) for characters
commonly found in positions 128...159 (0x80...0x9f) in Windows fonts. Although
these are valid codepoints for windows-1252 (and other
windows-xxxx) charsets, valid NCRs always refer to the document character set
in the SGML sense, not to the character encoding scheme (or charset). For HTML,
the SGML document character set is fixed: it is always a subset of Unicode
(or ISO 10646). In Unicode and its iso-8859-1 subset, values 128...159 are
C1 control characters, which must not appear in HTML. Valid NCRs for the
intended characters use Unicode values greater than 255.
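
As a small illustration of the point above, the following Python sketch checks that every code point in 128...159 is classified by Unicode as a control character, and that a character such as the euro sign lives at a much higher code point. The helper name `is_c1_control` is made up for this example; only the standard `unicodedata` module is used.

```python
import unicodedata


def is_c1_control(code_point: int) -> bool:
    """Return True if code_point lies in the C1 control range 0x80...0x9f."""
    return 0x80 <= code_point <= 0x9F


# Every value in 128...159 has the Unicode general category "Cc" (control).
all_controls = all(
    unicodedata.category(chr(cp)) == "Cc" for cp in range(0x80, 0xA0)
)

# The character that windows-1252 stores at 0x80 is U+20AC in Unicode,
# so a valid NCR for it uses the value 8364 (0x20ac), not 128.
euro = "\u20ac"
euro_ncr = f"&#{ord(euro)};"
```

Here `all_controls` reflects Unicode's classification of the whole range, and `euro_ncr` is the kind of reference a document should contain in place of the invalid one.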
Lynx tries to interpret some of the invalid codes by assuming that they are
windows-1252 codepoints.
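
The kind of reinterpretation described above can be sketched in Python. This is not Lynx's actual code, only an illustration under stated assumptions: the function name `fix_c1_ncrs` is hypothetical, and the mapping comes from Python's built-in `cp1252` codec. That codec follows a later revision of windows-1252 that also defines 0x8e and 0x9e, which the table in this test lists as not used; codes the codec leaves undefined (such as 0x81) are passed through unchanged.

```python
import re

# Matches decimal (&#128;) and hexadecimal (&#x80;) character references.
NCR_PATTERN = re.compile(r"&#(?:[xX]([0-9a-fA-F]+)|([0-9]+));")


def fix_c1_ncrs(html: str) -> str:
    """Rewrite NCRs in 128...159 as decimal NCRs for the windows-1252 character."""

    def replace(match: re.Match) -> str:
        hex_digits, dec_digits = match.group(1), match.group(2)
        value = int(hex_digits, 16) if hex_digits else int(dec_digits)
        if not 0x80 <= value <= 0x9F:
            return match.group(0)  # outside the C1 range: leave as is
        try:
            char = bytes([value]).decode("cp1252")
        except UnicodeDecodeError:
            return match.group(0)  # undefined in cp1252: leave as is
        return f"&#{ord(char)};"

    return NCR_PATTERN.sub(replace, html)
```

For example, `fix_c1_ncrs("&#147;quoted&#148;")` rewrites the two invalid references as `&#8220;` and `&#8221;`, the left and right double quotation marks.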
You may want to press '\' to view the source of this test.
Code invalid NCR valid NCR, description normal in ALT
0x80 € #EURO SIGN
0x81 #NOT USED
0x82 ‚ #SINGLE LOW-9 QUOTATION MARK
0x83 ƒ #LATIN SMALL LETTER F WITH HOOK
0x84 „ #DOUBLE LOW-9 QUOTATION MARK
0x85 … #HORIZONTAL ELLIPSIS
0x86 † #DAGGER
0x87 ‡ #DOUBLE DAGGER
0x88 ˆ #MODIFIER LETTER CIRCUMFLEX ACCENT
0x89 ‰ #PER MILLE SIGN
0x8a Š #LATIN CAPITAL LETTER S WITH CARON
0x8b ‹ #SINGLE LEFT-POINTING ANGLE QUOTATION MARK
0x8c Œ #LATIN CAPITAL LIGATURE OE
0x8d #NOT USED
0x8e #NOT USED
0x8f #NOT USED
0x90 #NOT USED
0x91 ‘ #LEFT SINGLE QUOTATION MARK
0x92 ’ #RIGHT SINGLE QUOTATION MARK
0x93 “ #LEFT DOUBLE QUOTATION MARK
0x94 ” #RIGHT DOUBLE QUOTATION MARK
0x95 • #BULLET
0x96 – #EN DASH
0x97 — #EM DASH
0x98 ˜ #SMALL TILDE
0x99 ™ #TRADE MARK SIGN
0x9a š #LATIN SMALL LETTER S WITH CARON
0x9b › #SINGLE RIGHT-POINTING ANGLE QUOTATION MARK
0x9c œ #LATIN SMALL LIGATURE OE
0x9d #NOT USED
0x9e #NOT USED
0x9f Ÿ #LATIN CAPITAL LETTER Y WITH DIAERESIS