4/8/2010
Well-formed HTML, or XHTML, simply means HTML that conforms to
the rules of XML. This means that the same HTML tags are available,
but the stricter XML syntax is required. An XSLT style sheet is
itself XML and any HTML within it must be well formed.
In addition to HTML within an XSLT style sheet, you should
consider authoring well-formed HTML for its own sake. The industry
is moving toward well-formed HTML as a way to make the Web more
robust, while simplifying and accelerating the processing of
well-formed documents and data. Well-formed HTML has great
advantages for authoring tools and can benefit hand authoring by
ensuring that the markup is unambiguous. The industry expectation
is that a future HTML standard will be an XML application.
The price for these benefits is that a less forgiving syntax
must be used.
Writing well-formed HTML is simple. Here are the basic rules to
follow as you author or convert to well-formed HTML.
All tags must be closed
Every start tag needs a matching end tag. An empty element such as
<BR> must be written either as <BR/> or as <BR></BR>.
No overlapping tags are allowed
XML does not allow start and end tags to overlap, but
enforces a strict hierarchy within the document. The following
comparison shows overlapping tags and a well-formed equivalent that
keeps the same formatting.

HTML:
    <B>Well, <I>Hello,</B> Dolly!</I>

Well-formed HTML:
    <B>Well, <I>Hello,</I></B> <I>Dolly!</I>
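One way to see this rule in action is to hand both versions to an XML parser. The following is a minimal sketch that uses Python's standard xml.etree.ElementTree module. The helper name is_well_formed and the wrapping <P> element (added so that each fragment has a single root) are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET


def is_well_formed(markup: str) -> bool:
    """Return True if the markup parses as XML (hypothetical helper)."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


# Overlapping tags: </B> arrives while <I> is still open.
overlapping = "<P><B>Well, <I>Hello,</B> Dolly!</I></P>"
# Strictly nested tags: every element closes inside its parent.
nested = "<P><B>Well, <I>Hello,</I></B> <I>Dolly!</I></P>"

print(is_well_formed(overlapping))  # False
print(is_well_formed(nested))       # True
```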
Case matters
XML is case-sensitive, so each end tag must match the case of its
start tag exactly. Choose a consistent case for start and end tags.
Generally, try to use uppercase for HTML elements. The following
comparison shows how case matching should appear in well-formed
HTML.

HTML:
    <B><i>Hello Dolly!</I></b>

Well-formed HTML:
    <B><I>Hello Dolly!</I></B>
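The case rule can be checked the same way. This sketch again assumes Python's standard xml.etree.ElementTree module. It shows that the parser keeps tag names exactly as written and rejects an end tag whose case differs from its start tag.

```python
import xml.etree.ElementTree as ET

# Matching case: the parser keeps tag names exactly as written.
root = ET.fromstring("<B><I>Hello Dolly!</I></B>")
print(root.tag, root[0].tag)  # B I

# Mismatched case: </I> does not close <i>, so parsing fails.
try:
    ET.fromstring("<B><i>Hello Dolly!</I></b>")
    mismatched_ok = True
except ET.ParseError:
    mismatched_ok = False

print(mismatched_ok)  # False
```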
Quote your attributes
All attribute values must be surrounded by either
single or double quotation marks. The following comparison shows how
to quote attribute values. Because every tag must also be closed, the
well-formed version ends the empty IMG element with a closing slash.

HTML:
    <IMG src=sample.gif width=10 height=20 >

Well-formed HTML:
    <IMG src="sample.gif" width="10" height="20" />
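As a rough check of the quoting rule, the sketch below parses the IMG element with Python's standard xml.etree.ElementTree module (an assumption; any conforming XML parser should behave the same way). The element is written with a closing slash so that it is also properly closed.

```python
import xml.etree.ElementTree as ET

# Quoted values and a self-closing tag: parses, and attributes are readable.
img = ET.fromstring('<IMG src="sample.gif" width="10" height="20" />')
print(img.attrib)  # {'src': 'sample.gif', 'width': '10', 'height': '20'}

# Unquoted values are rejected by the parser.
try:
    ET.fromstring("<IMG src=sample.gif width=10 height=20 />")
    unquoted_ok = True
except ET.ParseError:
    unquoted_ok = False

print(unquoted_ok)  # False
```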
Use a single root
Shortcuts that eliminate the <HTML> element as the single
top-level element are not allowed. The following comparison shows
how to properly include the <HTML> element.

HTML:
    <TITLE>Nonstandard markup</TITLE>
    <BODY>
    <P>Amazing that this HTML works.</P>
    </BODY>

Well-formed HTML:
    <HTML>
    <HEAD>
    <TITLE>Clean markup</TITLE>
    </HEAD>
    <BODY>
    <P>Not nearly so amazing that
    this well-formed HTML works.</P>
    </BODY>
    </HTML>
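The single-root rule can also be confirmed with a parser. This sketch assumes Python's standard xml.etree.ElementTree module, and the shortened paragraph text is a placeholder chosen for brevity.

```python
import xml.etree.ElementTree as ET

rootless = "<TITLE>Nonstandard markup</TITLE><BODY><P>Text</P></BODY>"
rooted = (
    "<HTML>"
    "<HEAD><TITLE>Clean markup</TITLE></HEAD>"
    "<BODY><P>Text</P></BODY>"
    "</HTML>"
)

try:
    ET.fromstring(rootless)
    rootless_ok = True
except ET.ParseError:
    # Two top-level elements: the parser rejects everything after </TITLE>.
    rootless_ok = False

root = ET.fromstring(rooted)
print(rootless_ok)                      # False
print(root.tag, [c.tag for c in root])  # HTML ['HEAD', 'BODY']
```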
Fewer built-in entities
XML defines only the following minimal set of built-in
character entities:
- &lt; (<)
- &gt; (>)
- &amp; (&)
- &quot; (")
- &apos; (')
Numeric character entities are supported.
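In practice this means that a named HTML entity outside this set, such as the no-break space, has to be written as a numeric character reference. The sketch below illustrates the difference, assuming Python's standard xml.etree.ElementTree module and a document with no DTD that declares extra entities.

```python
import xml.etree.ElementTree as ET

# The five built-in entities need no declaration.
p = ET.fromstring("<P>&lt; &gt; &amp; &quot; &apos;</P>")
print(p.text)  # < > & " '

# A numeric character reference stands in for an HTML-only entity
# such as the no-break space (U+00A0).
nbsp = ET.fromstring("<P>a&#160;b</P>")
print(nbsp.text == "a\u00a0b")  # True

# The named HTML entity is undefined in plain XML, so parsing fails.
try:
    ET.fromstring("<P>a&nbsp;b</P>")
    named_ok = True
except ET.ParseError:
    named_ok = False

print(named_ok)  # False
```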
Escape script blocks
Script blocks in HTML can contain characters that
cannot be parsed, such as < and &. These must be escaped in
well-formed HTML by using character entities, or by enclosing the
script block in a CDATA section.
The following comparison shows an HTML script block that
contains both a character that cannot be parsed (<) and JScript
comments. The well-formed script block uses CDATA to encapsulate
the script.

HTML:
    <SCRIPT>
    // checks a number against 7
    function lessThanSeven(n)
    {
        return n < 7;
    }
    </SCRIPT>

Well-formed HTML:
    <SCRIPT>
    <![CDATA[
    // checks a number against 7
    function lessThanSeven(n)
    {
        return n < 7;
    }
    ]]>
    </SCRIPT>
Not all scripts will fail if they are not escaped in
this way; however, Microsoft recommends that you do it as a matter
of habit. This ensures not only that the script works if it
contains comments or characters that need escaping now, but also
that it continues to work if such characters are added in the future.
In addition, Microsoft JScript® (compatible with ECMA
262 language specification) comments terminate at the end of the
line, so preserving the white space within script blocks containing
comments is important. By default, the
xml:space attribute value normalizes white space by
compressing adjacent white space characters into a single space.
This destroys the new line that terminates the JScript comment. Any
JScript following the comment is treated as part of the comment and
ignored, often resulting in script errors. The CDATA notation also
ensures that the white space is preserved.
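The effect of CDATA on both problems can be sketched with a parser. The example assumes Python's standard xml.etree.ElementTree module, a one-line function body, and the function name lessThanSeven, all chosen for illustration. White-space normalization is simulated with an ordinary string operation, not performed by an XSLT processor.

```python
import xml.etree.ElementTree as ET

# Without CDATA, the bare '<' in the script is a parse error.
try:
    ET.fromstring("<SCRIPT>return n < 7;</SCRIPT>")
    bare_ok = True
except ET.ParseError:
    bare_ok = False
print(bare_ok)  # False

script = ET.fromstring(
    "<SCRIPT><![CDATA[\n"
    "// checks a number against 7\n"
    "function lessThanSeven(n) { return n < 7; }\n"
    "]]></SCRIPT>"
)

# The CDATA content arrives as plain text: '<' and the newlines survive.
print("n < 7" in script.text)   # True
print(script.text.count("\n"))  # 3

# Collapsing white space (simulated here) merges the code into the
# '//' comment, which is the failure described above.
collapsed = " ".join(script.text.split())
print(collapsed.startswith("// checks a number against 7 function"))  # True
```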